DTrace Topics iSCSI
From Siwiki
Contents |
[edit] DTrace Topics: iSCSI
This article is about DTracing iSCSI, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed knowledge, which can be studied from the DTrace Topics: Intro section.
DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems. iSCSI is a network protocol for accessing remote disk devices.
Completion
Difficulty
Audience Sysadmins, storage admins
[edit] iSCSI and DTrace
An iSCSI system involves an iSCSI target server which hosts the disks, and iSCSI clients which connect to the target server. Tracing iSCSI on the target server and the client is performed in different ways.
DTrace providers for iSCSI tracing:
| Target | DTrace Provider | Description |
| iSCSI target server | iscsi | trace iSCSI events including data I/O |
| iSCSI target server | pid | trace raw activity from iscsitgtd |
| iSCSI client | io | trace data I/O events |
| iSCSI client | fbt | trace raw iSCSI kernel module activity |
[edit] Target Server
An iSCSI target provider has been developed by Adam Leventhal at Sun. Adam posted a summary of the provider to dtrace-discuss, which this summary section is based on.
This provider as not yet available as of 27-Mar-2007. When it is publically available, this sentence will be replaced by instructions on how it can be fetched and installed.
The iscsi provider is a USDT provider built into the iSCSI target daemon (iscsitgtd). As a USDT provider the name is concatenated with the process ID of the daemon (e.g. iscis123).
Since this provider instruments the iSCSI target daemon, DTrace commands and scripts must be run on the iSCSI target server to observe these probes.
[edit] Probes
| iSCSI Event | Probes |
| SCSI command/response | iscsi*:::scsi-command iscsi*:::scsi-response |
| Data out/in/request (rtt) | iscsi*:::data-send iscsi*:::data-receive |
| Login and logout command/response | iscsi*:::login-command iscsi*:::login-response |
| NOP out/in (pings) | iscsi*:::nop-receive iscsi*:::nop-send |
| Text and task command/response | iscsi*:::task-command iscsi*:::task-response |
| Asynchronous message from target | iscsi*:::async-send |
[edit] Arguments
| Probes | Variable | Type | DescriptionTarget Server |
| * | args[0] | conninfo_t * | connection info |
| * | args[1] | iscsiinfo_t * | common iSCSI properties |
| scsi-command | args[2] | iscsicmd_t * | SCSI opcode |
[edit] Types
conninfo_t
typedef struct conninfo {
string ci_local; /* local host IP address */
string ci_remote; /* remote host IP address */
string ci_protocol; /* protocol ("ipv4", "ipv6") */
} conninfo_t;
iscsiinfo_t
typedef struct iscsiinfo {
string ii_target; /* target iqn */
string ii_initiator; /* initiator iqn */
uint64_t ii_lun; /* target logical unit number */
uint32_t ii_itt; /* initiator task tag */
uint32_t ii_ttt; /* target transfer tag */
uint32_t ii_cmdsn; /* command sequence number */
uint32_t ii_statsn; /* status sequence number */
uint32_t ii_datasn; /* data sequence number */
uint32_t ii_datalen; /* length of data payload */
uint32_t ii_flags; /* probe-specific flags */
} iscsiinfo_t;
iscsicmd_t
typedef struct iscsicmd {
uint64_t ic_len; /* CDB length */
uint8_t *ic_cdb; /* CDB data */
} iscsicmd_t;
If the initiator or target IQNs are invalid (e.g. in login-command), they will be the empy string. If tag fields (ii_itt and ii_ttt) are invalid, they are set to 0xffffffff (the tag reserved by the iSCSI protocol). If the sequence numbers are invalid, they are set to 0.
[edit] Strategies
The following are suggested strategies for analysing iSCSI traffic:
- Counting the frequency of iSCSI command types
- check for ratios that don't match expected workload (eg, iscsiwho.d).
- Counting the frequency of iSCSI client IP addresses
- check who is accessing the target (eg, iscsiwho.d).
- Watching iSCSI events live
- for event by event analysis (eg, iscsisnoop.d).
- Measuring latencies for iSCSI data transfers
- latencies are crucial for performance analysis (eg, iscsirwlat.d).
[edit] One-Liners
The following one-liners are easy to use and may prove quite useful. For further analysis, see the example scripts.
- Frequency of iSCSI command types,
dtrace -n 'iscsi*::: { @[probename] = count(); }'
- Frequency of iSCSI client IP addresses,
dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'
- Payload bytes by iSCSI command type,
dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'
- Payload byte distribution by iSCSI command type,
dtrace -n 'iscsi*::: { @[probename] = quantize(args[1]->ii_datalen); }'
[edit] Examples
[edit] iscsiwho.d
This is a simple script to produce a report of the remote IP addresses and a count of iSCSI events. This is intended to provide a quick summary of iSCSI activity when run on the iSCSI target server.
#!/usr/sbin/dtrace -s
/*
* iscsiwho.d - Report iSCSI client events. Solaris Nevada, DTrace.
*
* This traces requested iSCSI events when run on an iSCSI server.
*
* USAGE: iscsiwho.d # Hit Ctrl-C to end
*
* FIELDS:
* REMOTE IP IP address of the client
* iSCSI EVENT iSCSI event type.
* COUNT Number of events traced
*/
#pragma ident "@(#)iscsiwho.d 1.3 07/03/27 SMI"
#pragma D option quiet
dtrace:::BEGIN
{
printf("Tracing... Hit Ctrl-C to end.\n");
}
iscsi*:::
{
@events[args[0]->ci_remote, probename] = count();
}
dtrace:::END
{
printf(" %-26s %14s %8s\n", "REMOTE IP", "iSCSI EVENT", "COUNT");
printa(" %-26s %14s %@8d\n", @events);
}
Example output,
# ./iscsiwho.d Tracing... Hit Ctrl-C to end. ^C REMOTE IP iSCSI EVENT COUNT 192.168.1.109 nop-receive 1 192.168.1.109 nop-send 1 192.168.1.109 scsi-response 126 192.168.1.109 scsi-command 208 192.168.1.109 data-receive 1207 192.168.1.109 data-request 1207 192.168.1.109 data-send 1290
The above example shows normal traffic. For this simple script, these event names are not translated beyond their iSCSI provider probe names, and require some thought to comprehend (they are from the perspective of the iSCSI target server).
An interpretation for some of these events are:
| iSCSI event | Interpretation |
| scsi-command | A SCSI command was issued, such as a read or a write. Use other scripts for a breakdown on the SCSI command type. |
| data-send | Data was sent from the iSCSI target server to the client; the client is performing a read. |
| data-receive | Data was received on the iSCSI target server from the client. The client is performing a write. |
The iscsisnoop.d script does do some interepretation of event names.
[edit] iscsisnoop.d
This is a simple script to snoop iSCSI events from the target server, live. Some of the iSCSI event names are translated.
#!/usr/sbin/dtrace -s
/*
* iscsisnoop.d - Snoop iSCSI events. Solaris Nevada, DTrace.
*
* This snoops iSCSI events when run on an iSCSI server.
*
* USAGE: iscsisnoop.d # Hit Ctrl-C to end
*
* FIELDS:
* CPU CPU event occured on
* REMOTE IP IP address of the client
* EVENT Data I/O event (data-send/data-receive)
* BYTES Data bytes
* ITT Initiator task tag
* SCSIOP SCSI opcode as a description, as hex, or '-'
*
* NOTE: On multi-CPU servers output may not be in correct time order
* (shuffled). A change in the CPU column is a hint that this happened.
* If this is a problem, print an extra timestamp field and post sort.
*/
#pragma ident "@(#)iscsisnoop.d 1.2 07/03/27 SMI"
#pragma D option quiet
#pragma D option switchrate=10
dtrace:::BEGIN
{
printf("%3s %-26s %-14s %6s %10s %6s\n", "CPU", "REMOTE IP",
"EVENT", "BYTES", "ITT", "SCSIOP");
/*
* SCSI opcode to string translation hash. This is from
* /usr/include/sys/scsi/generic/commands.h. If you would
* rather all hex, comment this out.
*/
scsiop[0x08] = "read";
scsiop[0x0a] = "write";
scsiop[0x0b] = "seek";
scsiop[0x28] = "read(10)";
scsiop[0x2a] = "write(10)";
scsiop[0x2b] = "seek(10)";
}
iscsi*:::data-*,
iscsi*:::login-*,
iscsi*:::logout-*,
iscsi*:::nop-*,
iscsi*:::task-*,
iscsi*:::async-*,
iscsi*:::scsi-response
{
printf("%3d %-26s %-14s %6d %10d -\n", cpu, args[0]->ci_remote,
probename, args[1]->ii_datalen, args[1]->ii_itt);
}
iscsi*:::scsi-command
/scsiop[args[2]->ic_cdb[0]] != NULL/
{
printf("%3d %-26s %-14s %6d %10d %s\n", cpu, args[0]->ci_remote,
probename, args[1]->ii_datalen, args[1]->ii_itt,
scsiop[args[2]->ic_cdb[0]]);
}
iscsi*:::scsi-command
/scsiop[args[2]->ic_cdb[0]] == NULL/
{
printf("%3d %-26s %-14s %6d %10d 0x%x\n", cpu, args[0]->ci_remote,
probename, args[1]->ii_datalen, args[1]->ii_itt,
args[2]->ic_cdb[0]);
}
Example output,
# ./iscsisnoop.d CPU REMOTE IP EVENT BYTES ITT SCSIOP 0 192.168.1.109 scsi-command 0 229452 read(10) 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 data-send 8192 229452 - 0 192.168.1.109 scsi-command 512 229453 write(10) 0 192.168.1.109 scsi-response 0 229453 - 0 192.168.1.109 scsi-command 512 229454 write(10) 0 192.168.1.109 scsi-command 512 229455 write(10) 0 192.168.1.109 scsi-response 0 229454 - 0 192.168.1.109 scsi-response 0 229455 - 0 192.168.1.109 scsi-command 8192 229456 write(10) 0 192.168.1.109 scsi-command 8192 229457 write(10) 0 192.168.1.109 scsi-command 3072 229458 write(10) 0 192.168.1.109 scsi-command 3072 229459 write(10) 0 192.168.1.109 data-request 0 229456 - 0 192.168.1.109 data-request 0 229457 - 0 192.168.1.109 data-receive 8192 229456 - 0 192.168.1.109 data-receive 8192 229457 - [...]
The above output begins with a client read of a 128 Kbyte file, which can be seen as,
- An iSCSI event of type scsi-command fires, which is a read(10) (a read variant).
- Numerous data-send iSCSI events fire, as the iSCSI target server transfers data to the client.
- there are 16 of these with a size of 8 Kbytes each, summing to 128 Kbytes - the file size that was being transferred.
A CPU column is printed as a reminder that this output may be shuffled on a multi-CPU server (due to the nature of DTrace consumer CPU buffer reading). If this becomes a problem, edit the script to print a timestamp field and post sort.
[edit] iscsiio.d
This script generates a report iSCSI data I/O statistics by client.
#!/usr/sbin/dtrace -s
/*
* iscsiio.d - Report iSCSI I/O. Solaris Nevada, DTrace.
*
* This traces requested iSCSI data I/O events when run on an iSCSI server.
* The output reports iSCSI read and write I/O while this script was running.
*
* USAGE: iscsiio.d # Hit Ctrl-C to end
*
* FIELDS:
* REMOTE IP IP address of the client
* EVENT Data I/O event (read/write)
* COUNT Number of I/O events
* Kbytes Total data Kbytes transferred
* KB/sec Average data Kbytes/sec transferred
*/
#pragma ident "@(#)iscsiio.d 1.2 07/03/27 SMI"
#pragma D option quiet
dtrace:::BEGIN
{
printf("Tracing... Hit Ctrl-C to end.\n");
start = timestamp;
}
iscsi*:::data-send
{
@num[args[0]->ci_remote, "read"] = count();
@bytes[args[0]->ci_remote, "read"] = sum(args[1]->ii_datalen);
@rate[args[0]->ci_remote, "read"] = sum(args[1]->ii_datalen);
}
iscsi*:::data-receive
{
@num[args[0]->ci_remote, "write"] = count();
@bytes[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
@rate[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
}
iscsi*:::scsi-command
/args[2]->ic_cdb[0] == 0x0a || args[2]->ic_cdb[0] == 0x2a/
{
/*
* scsi-command writes also move data. Their codes are in
* /usr/include/sys/scsi/generic/commands.h .
*/
@num[args[0]->ci_remote, "write"] = count();
@bytes[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
@rate[args[0]->ci_remote, "write"] = sum(args[1]->ii_datalen);
}
dtrace:::END
{
normalize(@rate, (timestamp - start) * 1024 / 1000000000);
printf(" %-26s %8s %8s %10s %10s\n", "REMOTE IP", "EVENT", "COUNT",
"Kbytes", "KB/sec");
normalize(@bytes, 1024);
printa(" %-26s %8s %@8d %@10d %@10d\n", @num, @bytes, @rate);
}
Example output,
# ./iscsiio.d Tracing... Hit Ctrl-C to end. ^C REMOTE IP EVENT COUNT Kbytes KB/sec 192.168.1.109 write 1323 10386 464 192.168.1.109 read 3848 30784 1376
The example outpt shows a single IP address performing reads and writes to our iSCSI target server.
[edit] iscsirwlat.d
This script traces the latency of data reads and writes performed by iSCSI clients.
#!/usr/sbin/dtrace -s
/*
* iscsirwlat.d - Report iSCSI Read/Write Latency. Solaris Nevada, DTrace.
*
* This traces iSCSI data I/O events when run on an iSCSI server, and
* produces a report of read/write latency in microseconds.
*
* USAGE: iscsirwlat.d # Hit Ctrl-C to end
*
* FIELDS:
* EVENT Data I/O event (data-send/data-receive)
* REMOTE IP IP address of the client
* COUNT Number of I/O events
* Kbytes Total data Kbytes transferred
*
* NOTE: If you try to sum the read times, they will sometimes add to
* a time longer than the sample time - the reason is that reads can
* occur in parallel, and so suming them together will overcount.
*/
#pragma ident "@(#)iscsirwlat.d 1.3 07/03/28 SMI"
#pragma D option quiet
dtrace:::BEGIN
{
printf("Tracing... Hit Ctrl-C to end.\n");
}
iscsi*:::scsi-command
/args[2]->ic_cdb[0] == 0x08 || args[2]->ic_cdb[0] == 0x28/
{
/*
* self-> variables can't be used, as one thread receives the
* scsi command while another receives the reads.
*/
start_read[args[1]->ii_itt, args[1]->ii_initiator] = timestamp;
}
iscsi*:::scsi-command
/args[2]->ic_cdb[0] == 0x0a || args[2]->ic_cdb[0] == 0x2a/
{
start_write[args[1]->ii_itt, args[1]->ii_initiator] = timestamp;
}
iscsi*:::data-send
/start_read[args[1]->ii_itt, args[1]->ii_initiator] &&
(args[1]->ii_flags & ISCSI_FLAG_FINAL)/
{
@read[args[0]->ci_remote] = quantize(timestamp -
start_read[args[1]->ii_itt, args[1]->ii_initiator]);
start_read[args[1]->ii_ttt, args[1]->ii_initiator] = 0;
}
iscsi*:::scsi-response
/start_write[args[1]->ii_itt, args[1]->ii_initiator]/
{
@write[args[0]->ci_remote] = quantize(timestamp -
start_write[args[1]->ii_itt, args[1]->ii_initiator]);
start_write[args[1]->ii_itt, args[1]->ii_initiator] = 0;
}
dtrace:::END
{
printf("Read Latency (ns),\n");
printa(@read);
printf("Write Latency (ns),\n");
printa(@write);
}
Example output,
# ./iscsirwlat.d
Tracing... Hit Ctrl-C to end.
^C
Read Latency (ns),
192.168.1.109
value ------------- Distribution ------------- count
32768 | 0
65536 |@@@ 27
131072 |@@ 17
262144 | 4
524288 |@@@@ 39
1048576 |@@@ 32
2097152 |@@@@@@@@@@@@@@@@@ 164
4194304 |@ 11
8388608 |@ 5
16777216 |@ 10
33554432 |@ 7
67108864 |@ 5
134217728 |@@@@@ 46
268435456 |@ 14
536870912 | 0
Write Latency (ns),
192.168.1.109
value ------------- Distribution ------------- count
32768 | 0
65536 |@@ 14
131072 |@@ 16
262144 |@@@ 20
524288 |@@@ 20
1048576 |@@@@@ 33
2097152 |@ 5
4194304 | 0
8388608 | 2
16777216 |@ 10
33554432 |@@@ 18
67108864 |@@@@@ 33
134217728 |@@@@@@@@@@@@@ 93
268435456 |@@ 12
536870912 | 0
The output includes a breakdown by client IP address. What is interesting is that for both reads and writes, there appears to be two peaks in the distribution plot - perhaps caused by two different types of data I/O. More DTracing will be required to explain this, possibly on the iSCSI client as well as the target server.
Also worth noting is that I/O often performs transactions in parallel, so that if you sum all the latency times you can end up with much more than 1000 ms of latency per second.
[edit] Client
Tracing iSCSI events on the iSCSI client can be performed using both the io and the fbt providers.
