DTrace Topics iSCSI

From Siwiki

Jump to: navigation, search

Contents

[edit] DTrace Topics: iSCSI

This article is about DTracing iSCSI, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed knowledge, which can be studied from the DTrace Topics: Intro section.

DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems. iSCSI is a network protocol for accessing remote disk devices.

Completion Image:trafficlight_yellow02.png
Difficulty Image:coffeemug01.png Image:coffeemug01.png Image:coffeemug01.png
Audience Sysadmins, storage admins

[edit] iSCSI and DTrace

An iSCSI system involves an iSCSI target server which hosts the disks, and iSCSI clients which connect to the target server. Tracing iSCSI on the target server and the client is performed in different ways.

DTrace providers for iSCSI tracing:

Target DTrace Provider Description
iSCSI target server iscsi trace iSCSI events including data I/O
iSCSI target server pid trace raw activity from iscsitgtd
iSCSI client io trace data I/O events
iSCSI client fbt trace raw iSCSI kernel module activity

[edit] Target Server

An iSCSI target provider has been developed by Adam Leventhal at Sun. Adam posted a summary of the provider to dtrace-discuss, which this summary section is based on.

This provider as not yet available as of 27-Mar-2007. When it is publically available, this sentence will be replaced by instructions on how it can be fetched and installed.

The iscsi provider is a USDT provider built into the iSCSI target daemon (iscsitgtd). As a USDT provider the name is concatenated with the process ID of the daemon (e.g. iscis123).

Since this provider instruments the iSCSI target daemon, DTrace commands and scripts must be run on the iSCSI target server to observe these probes.

[edit] Probes

iSCSI Event Probes
SCSI command/response iscsi*:::scsi-command

iscsi*:::scsi-response

Data out/in/request (rtt) iscsi*:::data-send

iscsi*:::data-receive
iscsi*:::data-request

Login and logout command/response iscsi*:::login-command

iscsi*:::login-response
iscsi*:::logout-command
iscsi*:::logout-response

NOP out/in (pings) iscsi*:::nop-receive

iscsi*:::nop-send

Text and task command/response iscsi*:::task-command

iscsi*:::task-response
iscsi*:::text-command
iscsi*:::text-response

Asynchronous message from target iscsi*:::async-send

[edit] Arguments

Probes Variable Type DescriptionTarget Server
* args[0] conninfo_t * connection info
* args[1] iscsiinfo_t * common iSCSI properties
scsi-command args[2] iscsicmd_t * SCSI opcode

[edit] Types

conninfo_t

typedef struct conninfo {                                                       
        string ci_local;         /* local host IP address */               
        string ci_remote;        /* remote host IP address */     
        string ci_protocol;      /* protocol ("ipv4", "ipv6") */               
} conninfo_t;                                                                   

iscsiinfo_t

typedef struct iscsiinfo {
	string ii_target;		/* target iqn */
	string ii_initiator;		/* initiator iqn */
	uint64_t ii_lun;		/* target logical unit number */

	uint32_t ii_itt;		/* initiator task tag */
	uint32_t ii_ttt;		/* target transfer tag */

	uint32_t ii_cmdsn;		/* command sequence number */
	uint32_t ii_statsn;		/* status sequence number */
	uint32_t ii_datasn;		/* data sequence number */

	uint32_t ii_datalen;		/* length of data payload */
	uint32_t ii_flags;		/* probe-specific flags */
} iscsiinfo_t;                                                          

iscsicmd_t

typedef struct iscsicmd {
        uint64_t ic_len;        /* CDB length */
        uint8_t *ic_cdb;        /* CDB data */
} iscsicmd_t;

If the initiator or target IQNs are invalid (e.g. in login-command), they will be the empy string. If tag fields (ii_itt and ii_ttt) are invalid, they are set to 0xffffffff (the tag reserved by the iSCSI protocol). If the sequence numbers are invalid, they are set to 0.

[edit] Strategies

The following are suggested strategies for analysing iSCSI traffic:

  • Counting the frequency of iSCSI command types
    check for ratios that don't match expected workload (eg, iscsiwho.d).
  • Counting the frequency of iSCSI client IP addresses
    check who is accessing the target (eg, iscsiwho.d).
  • Watching iSCSI events live
    for event by event analysis (eg, iscsisnoop.d).
  • Measuring latencies for iSCSI data transfers
    latencies are crucial for performance analysis (eg, iscsirwlat.d).

[edit] One-Liners

The following one-liners are easy to use and may prove quite useful. For further analysis, see the example scripts.

  • Frequency of iSCSI command types,
dtrace -n 'iscsi*::: { @[probename] = count(); }'
  • Frequency of iSCSI client IP addresses,
dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'
  • Payload bytes by iSCSI command type,
dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'
  • Payload byte distribution by iSCSI command type,
dtrace -n 'iscsi*::: { @[probename] = quantize(args[1]->ii_datalen); }'

[edit] Examples

[edit] iscsiwho.d

This is a simple script to produce a report of the remote IP addresses and a count of iSCSI events. This is intended to provide a quick summary of iSCSI activity when run on the iSCSI target server.

#!/usr/sbin/dtrace -s
/*
 * iscsiwho.d - Report iSCSI client events. Solaris Nevada, DTrace.
 *
 * This traces requested iSCSI events when run on an iSCSI server.
 *
 * USAGE: iscsiwho.d            # Hit Ctrl-C to end
 *
 * FIELDS:
 *              REMOTE IP       IP address of the client
 *              iSCSI EVENT     iSCSI event type.
 *              COUNT           Number of events traced
 */

#pragma ident   "@(#)iscsiwho.d 1.3     07/03/27 SMI"

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

iscsi*:::
{
        @events[args[0]->ci_remote, probename] = count();
}

dtrace:::END
{
        printf("   %-26s %14s %8s\n", "REMOTE IP", "iSCSI EVENT", "COUNT");
        printa("   %-26s %14s %@8d\n", @events);
}

Example output,

# ./iscsiwho.d 
Tracing... Hit Ctrl-C to end.
^C
   REMOTE IP                     iSCSI EVENT    COUNT
   192.168.1.109                 nop-receive        1
   192.168.1.109                    nop-send        1
   192.168.1.109               scsi-response      126
   192.168.1.109                scsi-command      208
   192.168.1.109                data-receive     1207
   192.168.1.109                data-request     1207
   192.168.1.109                   data-send     1290

The above example shows normal traffic. For this simple script, these event names are not translated beyond their iSCSI provider probe names, and require some thought to comprehend (they are from the perspective of the iSCSI target server).

An interpretation for some of these events are:

iSCSI event Interpretation
scsi-command A SCSI command was issued, such as a read or a write. Use other scripts for a breakdown on the SCSI command type.
data-send Data was sent from the iSCSI target server to the client; the client is performing a read.
data-receive Data was received on the iSCSI target server from the client. The client is performing a write.

The iscsisnoop.d script does do some interepretation of event names.

[edit] iscsisnoop.d

This is a simple script to snoop iSCSI events from the target server, live. Some of the iSCSI event names are translated.

#!/usr/sbin/dtrace -s
/*
 * iscsisnoop.d - Snoop iSCSI events. Solaris Nevada, DTrace.
 *
 * This snoops iSCSI events when run on an iSCSI server.
 *
 * USAGE: iscsisnoop.d          # Hit Ctrl-C to end
 *
 * FIELDS:
 *              CPU             CPU event occured on
 *              REMOTE IP       IP address of the client
 *              EVENT           Data I/O event (data-send/data-receive)
 *              BYTES           Data bytes
 *              ITT             Initiator task tag
 *              SCSIOP          SCSI opcode as a description, as hex, or '-'
 *
 * NOTE: On multi-CPU servers output may not be in correct time order
 * (shuffled). A change in the CPU column is a hint that this happened.
 * If this is a problem, print an extra timestamp field and post sort.
 */

#pragma ident   "@(#)iscsisnoop.d       1.2     07/03/27 SMI"

#pragma D option quiet
#pragma D option switchrate=10

dtrace:::BEGIN
{
        printf("%3s  %-26s %-14s %6s %10s  %6s\n", "CPU", "REMOTE IP",
            "EVENT", "BYTES", "ITT", "SCSIOP");

        /*
         * SCSI opcode to string translation hash. This is from
         * /usr/include/sys/scsi/generic/commands.h. If you would
         * rather all hex, comment this out.
         */
        scsiop[0x08] = "read";
        scsiop[0x0a] = "write";
        scsiop[0x0b] = "seek";
        scsiop[0x28] = "read(10)";
        scsiop[0x2a] = "write(10)";
        scsiop[0x2b] = "seek(10)";
}

iscsi*:::data-*,
iscsi*:::login-*,
iscsi*:::logout-*,
iscsi*:::nop-*,
iscsi*:::task-*,
iscsi*:::async-*,
iscsi*:::scsi-response
{
        printf("%3d  %-26s %-14s %6d %10d  -\n", cpu, args[0]->ci_remote,
            probename, args[1]->ii_datalen, args[1]->ii_itt);
}

iscsi*:::scsi-command
/scsiop[args[2]->ic_cdb[0]] != NULL/
{
        printf("%3d  %-26s %-14s %6d %10d  %s\n", cpu, args[0]->ci_remote,
            probename, args[1]->ii_datalen, args[1]->ii_itt,
            scsiop[args[2]->ic_cdb[0]]);
}

iscsi*:::scsi-command
/scsiop[args[2]->ic_cdb[0]] == NULL/
{
        printf("%3d  %-26s %-14s %6d %10d  0x%x\n", cpu, args[0]->ci_remote,
            probename, args[1]->ii_datalen, args[1]->ii_itt,
            args[2]->ic_cdb[0]);
}

Example output,

# ./iscsisnoop.d 
CPU  REMOTE IP                  EVENT           BYTES        ITT  SCSIOP
  0  192.168.1.109              scsi-command        0     229452  read(10)
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              data-send        8192     229452  -
  0  192.168.1.109              scsi-command      512     229453  write(10)
  0  192.168.1.109              scsi-response       0     229453  -
  0  192.168.1.109              scsi-command      512     229454  write(10)
  0  192.168.1.109              scsi-command      512     229455  write(10)
  0  192.168.1.109              scsi-response       0     229454  -
  0  192.168.1.109              scsi-response       0     229455  -
  0  192.168.1.109              scsi-command     8192     229456  write(10)
  0  192.168.1.109              scsi-command     8192     229457  write(10)
  0  192.168.1.109              scsi-command     3072     229458  write(10)
  0  192.168.1.109              scsi-command     3072     229459  write(10)
  0  192.168.1.109              data-request        0     229456  -
  0  192.168.1.109              data-request        0     229457  -
  0  192.168.1.109              data-receive     8192     229456  -
  0  192.168.1.109              data-receive     8192     229457  -
[...]

The above output begins with a client read of a 128 Kbyte file, which can be seen as,

  1. An iSCSI event of type scsi-command fires, which is a read(10) (a read variant).
  2. Numerous data-send iSCSI events fire, as the iSCSI target server transfers data to the client.
    there are 16 of these with a size of 8 Kbytes each, summing to 128 Kbytes - the file size that was being transferred.

A CPU column is printed as a reminder that this output may be shuffled on a multi-CPU server (due to the nature of DTrace consumer CPU buffer reading). If this becomes a problem, edit the script to print a timestamp field and post sort.

[edit] iscsiio.d

This script generates a report iSCSI data I/O statistics by client.

#!/usr/sbin/dtrace -s
/*
 * iscsiio.d - Report iSCSI I/O. Solaris Nevada, DTrace.
 *
 * This traces requested iSCSI data I/O events when run on an iSCSI server.
 * The output reports iSCSI read and write I/O while this script was running.
 *
 * USAGE: iscsiio.d             # Hit Ctrl-C to end
 *
 * FIELDS:
 *              REMOTE IP       IP address of the client
 *              EVENT           Data I/O event (read/write)
 *              COUNT           Number of I/O events
 *              Kbytes          Total data Kbytes transferred
 *              KB/sec          Average data Kbytes/sec transferred
 */

#pragma ident   "@(#)iscsiio.d  1.2     07/03/27 SMI"

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
        start = timestamp;
}

iscsi*:::data-send
{
        @num[args[0]->ci_remote, "read"] = count();
        @bytes[args[0]->ci_remote, "read"] = sum(args[1]->ii_datalen);
        @rate[args[0]->ci_remote, "read"] = sum(args[1]->ii_datalen);
}

iscsi*:::data-receive
{
        @num[args[0]->ci_remote, "write"] = count();
        @bytes[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
        @rate[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
}

iscsi*:::scsi-command
/args[2]->ic_cdb[0] == 0x0a || args[2]->ic_cdb[0] == 0x2a/
{
        /*
         * scsi-command writes also move data. Their codes are in
         * /usr/include/sys/scsi/generic/commands.h .
         */
        @num[args[0]->ci_remote, "write"] = count();
        @bytes[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
        @rate[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
}

dtrace:::END
{
        normalize(@rate, (timestamp - start) * 1024 / 1000000000);
        printf("   %-26s %8s %8s %10s %10s\n", "REMOTE IP", "EVENT", "COUNT",
            "Kbytes", "KB/sec");
        normalize(@bytes, 1024);
        printa("   %-26s %8s %@8d %@10d %@10d\n", @num, @bytes, @rate);
}

Example output,

# ./iscsiio.d 
Tracing... Hit Ctrl-C to end.
^C
   REMOTE IP                     EVENT    COUNT     Kbytes     KB/sec
   192.168.1.109                 write     1323      10386        464
   192.168.1.109                  read     3848      30784       1376

The example outpt shows a single IP address performing reads and writes to our iSCSI target server.

[edit] iscsirwlat.d

This script traces the latency of data reads and writes performed by iSCSI clients.

#!/usr/sbin/dtrace -s
/*
 * iscsirwlat.d - Report iSCSI Read/Write Latency. Solaris Nevada, DTrace.
 *
 * This traces iSCSI data I/O events when run on an iSCSI server, and
 * produces a report of read/write latency in microseconds.
 *
 * USAGE: iscsirwlat.d          # Hit Ctrl-C to end
 *
 * FIELDS:
 *              EVENT           Data I/O event (data-send/data-receive)
 *              REMOTE IP       IP address of the client
 *              COUNT           Number of I/O events
 *              Kbytes          Total data Kbytes transferred
 *
 * NOTE: If you try to sum the read times, they will sometimes add to
 * a time longer than the sample time - the reason is that reads can
 * occur in parallel, and so suming them together will overcount.
 */

#pragma ident   "@(#)iscsirwlat.d       1.3     07/03/28 SMI"

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

iscsi*:::scsi-command
/args[2]->ic_cdb[0] == 0x08 || args[2]->ic_cdb[0] == 0x28/
{
        /*
         * self-> variables can't be used, as one thread receives the
         * scsi command while another receives the reads.
         */
        start_read[args[1]->ii_itt, args[1]->ii_initiator] = timestamp;
}

iscsi*:::scsi-command
/args[2]->ic_cdb[0] == 0x0a || args[2]->ic_cdb[0] == 0x2a/
{
        start_write[args[1]->ii_itt, args[1]->ii_initiator] = timestamp;
}

iscsi*:::data-send
/start_read[args[1]->ii_itt, args[1]->ii_initiator] &&
    (args[1]->ii_flags & ISCSI_FLAG_FINAL)/
{
        @read[args[0]->ci_remote] = quantize(timestamp -
            start_read[args[1]->ii_itt, args[1]->ii_initiator]);
        start_read[args[1]->ii_ttt, args[1]->ii_initiator] = 0;
}

iscsi*:::scsi-response
/start_write[args[1]->ii_itt, args[1]->ii_initiator]/
{
        @write[args[0]->ci_remote] = quantize(timestamp -
            start_write[args[1]->ii_itt, args[1]->ii_initiator]);
        start_write[args[1]->ii_itt, args[1]->ii_initiator] = 0;
}

dtrace:::END
{
        printf("Read Latency (ns),\n");
        printa(@read);
        printf("Write Latency (ns),\n");
        printa(@write);
}

Example output,

# ./iscsirwlat.d 
Tracing... Hit Ctrl-C to end.
^C
Read Latency (ns),

  192.168.1.109                                     
           value  ------------- Distribution ------------- count    
           32768 |                                         0        
           65536 |@@@                                      27       
          131072 |@@                                       17       
          262144 |                                         4        
          524288 |@@@@                                     39       
         1048576 |@@@                                      32       
         2097152 |@@@@@@@@@@@@@@@@@                        164      
         4194304 |@                                        11       
         8388608 |@                                        5        
        16777216 |@                                        10       
        33554432 |@                                        7        
        67108864 |@                                        5        
       134217728 |@@@@@                                    46       
       268435456 |@                                        14       
       536870912 |                                         0        

Write Latency (ns),

  192.168.1.109                                     
           value  ------------- Distribution ------------- count    
           32768 |                                         0        
           65536 |@@                                       14       
          131072 |@@                                       16       
          262144 |@@@                                      20       
          524288 |@@@                                      20       
         1048576 |@@@@@                                    33       
         2097152 |@                                        5        
         4194304 |                                         0        
         8388608 |                                         2        
        16777216 |@                                        10       
        33554432 |@@@                                      18       
        67108864 |@@@@@                                    33       
       134217728 |@@@@@@@@@@@@@                            93       
       268435456 |@@                                       12       
       536870912 |                                         0        

The output includes a breakdown by client IP address. What is interesting is that for both reads and writes, there appears to be two peaks in the distribution plot - perhaps caused by two different types of data I/O. More DTracing will be required to explain this, possibly on the iSCSI client as well as the target server.

Also worth noting is that I/O often performs transactions in parallel, so that if you sum all the latency times you can end up with much more than 1000 ms of latency per second.

[edit] Client

Tracing iSCSI events on the iSCSI client can be performed using both the io and the fbt providers.

Solaris Internals
Personal tools
The Books
The Ads