Networks

From Siwiki

Tuning Network Performance

Tuning TCP for 10GbE throughput

ndd -set /dev/tcp tcp_recv_hiwat 400000                                
ndd -set /dev/tcp tcp_xmit_hiwat 400000
ndd -set /dev/tcp tcp_max_buf 2097152
ndd -set /dev/tcp tcp_cwnd_max 2097152

Tuning XAUI, Sun Multi-threaded 10GbE on T5140/T5240

Tunable for general workloads on T5140/T5240

/etc/system

set ip:ip_soft_rings_cnt=16

Additional tunable for transmit throughput

Software LSO helps to maximize TCP transmit throughput and/or reduce CPU utilization. It is available in Nevada build 82 and as an S10U5 patch (Patch-ID#: T138048-01). To enable software LSO, edit /platform/sun4v/kernel/drv/nxge.conf and uncomment the line:

soft-lso-enable = 1;

Tuning NIU on T5120/T5220

Tunable for general workloads on T5120/T5220

/etc/system

set ip:ip_soft_rings_cnt=16

Tuning Sun Multi-threaded 10GbE on T5120/T5220

Tunable for general workloads on T5120/T5220

/etc/system

set ip:ip_soft_rings_cnt=16

Tuning Sun Multi-threaded 10GbE on OPL

Tunable for general workloads on OPL

/etc/system

set ddi_msix_alloc_limit=8
set ip:ip_soft_rings_cnt=16
set ip_squeue_soft_ring=1
set ip_threads_per_cpu=2

Tuning Sun Multi-threaded 10GbE on T1000/T2000

Tunable for general workloads on T1000/T2000

/etc/system

set ddi_msix_alloc_limit=8
set ip:ip_soft_rings_cnt=16

Additional tunable for systems with 1.0GHz CPU

psradm -i 1-3 5-7 9-11 13-15 17-19 21-23 25-27 29-31

This spreads the 8 NIC interrupts across the 8 cores by leaving only one strand per core enabled to take interrupts.
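The CPU ID list above follows a regular pattern, so it can be generated instead of typed by hand. The sketch below is a hypothetical helper, not a Solaris utility. It assumes a POSIX shell with arithmetic expansion (for example ksh or bash) and that CPU IDs are numbered consecutively within each core, as in the 8-core, 4-strand layout used above. It only prints the command so it can be reviewed before being run.

```shell
# Hypothetical helper: print the CPU ID ranges to pass to `psradm -i` so that
# only the first strand of each core keeps taking interrupts.
# Assumption: core c owns CPU IDs c*strands .. c*strands+strands-1,
# and strands is at least 2.
psradm_fence_args() {
    cores=$1      # number of cores, e.g. 8
    strands=$2    # hardware strands per core, e.g. 4
    args=""
    core=0
    while [ "$core" -lt "$cores" ]; do
        first=$((core * strands + 1))            # second strand of this core
        last=$((core * strands + strands - 1))   # last strand of this core
        if [ "$first" -eq "$last" ]; then
            range=$first
        else
            range="$first-$last"
        fi
        args="${args:+$args }$range"
        core=$((core + 1))
    done
    echo "$args"
}

# Print the command for an 8-core system with 4 strands per core.
echo "psradm -i $(psradm_fence_args 8 4)"
```

On a system with a different core or strand count, check the actual CPU ID layout (for example with psrinfo) before relying on the generated list.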


Tuning Sun Multi-threaded 10GbE on x86

Tunable for general workloads on >=4 core x86

S10 Update 5 supports multiple MSI-X on x86.

/etc/system

set ddi_msix_alloc_limit=8
set pcplusmp:apic_multi_msi_max=8
set pcplusmp:apic_msix_max=8
set pcplusmp:apic_intr_policy=1
set nxge:nxge_msi_enable=2

Tunable for general workloads on x86

/etc/system

set ip_squeue_soft_ring=1
set ip:ip_soft_rings_cnt=n

where n = min(8, number of cores).

These settings apply to S10 Update 4 and Solaris Nevada build 60 or later. For S10 Update 5 or later, multiple MSI-X tuning (previous section) is recommended instead.
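As a worked instance of the n = min(8, number of cores) rule, the sketch below prints the two /etc/system lines for a given core count. The function name is hypothetical and the core count is passed in by hand; nothing here queries the system or edits /etc/system.

```shell
# Hypothetical helper: print the /etc/system lines from this section,
# with the soft ring count capped at 8.
soft_ring_settings() {
    cores=$1
    if [ "$cores" -lt 8 ]; then
        n=$cores
    else
        n=8
    fi
    echo "set ip_squeue_soft_ring=1"
    echo "set ip:ip_soft_rings_cnt=$n"
}

# Example: a 4-core x86 system.
soft_ring_settings 4
```

The output can be reviewed and then appended to /etc/system manually.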

For S10 Update 3, you need either the patch for CR 6474602 [1] (Patch-ID#: T123776-03) or the additional tunables below. Without one of them, missed interrupts can cause the NIC to hang when using MSI:

set pcplusmp:apic_enable_dynamic_migration=0
set pcplusmp:apic_intr_policy=1

Additional tunable for transmit throughput

Software LSO helps to maximize TCP transmit throughput and/or reduce CPU utilization. It is available in Nevada build 82 and as an S10U5 patch (Patch-ID#: T138049). To enable software LSO, edit nxge.conf (under /platform/i86pc/kernel/drv on S10, /kernel/drv on Nevada) and uncomment the line:

soft-lso-enable = 1;

Additional tunables for Sun Fire X4150, X4450

In the BIOS, disable 'Hardware Prefetcher' and 'Adjacent Cache Line Prefetch', and enable 'Crystal Beach/DMA'. The following setting may also help, depending on CPU frequency:

/etc/system

set nxge:nxge_bcopy_thresh=1024

Additional tunable for Sun Fire X2200, X4100/X4200, X4140, X4240, X4440

/etc/system

set nxge:nxge_bcopy_thresh=1024

Whether this reduces CPU utilization or improves throughput also depends on the CPU clock frequency.

Tuning Sun Quad GbE x8 PCI Express on T5120/T5220/T2000/T1000

Tunable for general workloads

None.

Tunable for small packet workloads

/etc/system

set ddi_msix_alloc_limit=4
set ip:ip_soft_rings_cnt=16

Tuning Sun Quad GbE x8 PCI Express on OPL

Tunable for general workloads

None.

Tunable for small packet workloads

/etc/system

set ddi_msix_alloc_limit=4
set ip_squeue_soft_ring=1
set ip:ip_soft_rings_cnt=16

For interrupt fencing:

psradm -i 1 3 5 7 9 ... (#cpu - 1)
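The elided list above is every odd CPU ID up to (#cpu - 1). A minimal sketch for generating it follows. The helper name is hypothetical, and it assumes CPU IDs run contiguously from 0 to #cpu - 1 with an even CPU count, which may not hold on every OPL configuration. It prints the command rather than running it.

```shell
# Hypothetical helper: print the odd CPU IDs 1, 3, 5, ... below ncpu.
# Assumption: CPU IDs are contiguous, starting at 0.
odd_cpu_ids() {
    ncpu=$1
    ids=""
    id=1
    while [ "$id" -lt "$ncpu" ]; do
        ids="${ids:+$ids }$id"
        id=$((id + 2))
    done
    echo "$ids"
}

# Example: a system with 16 CPUs.
echo "psradm -i $(odd_cpu_ids 16)"
```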

Tuning Sun Quad GbE x8 PCI Express on x86

For S10 Update 3 on x86, you need either the patch for CR 6474602 [2] (Patch-ID#: T123776-03) or the additional tunables below. Without one of them, missed interrupts can cause the NIC to hang when using MSI:

set pcplusmp:apic_enable_dynamic_migration=0
set pcplusmp:apic_intr_policy=1

Tunable for small packet workloads

/etc/system

set ip_squeue_soft_ring=1
set ip:ip_soft_rings_cnt=8

Increasing number of transmit descriptors

If the NIC runs out of transmit descriptors, throughput and packet rate are not optimal. You can increase the number of transmit descriptors per DMA channel up to 8192.

set nxge:nxge_tx_ring_size=8192

To check whether nxge1 is running out of transmit descriptors, run:

kstat nxge:1 |grep tdc_tx_no_desc

Transmit descriptors consume kernel memory, so increase them only as far as performance requires.
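Because the counter is reported once per transmit DMA channel, it can be convenient to sum the values into a single number. The sketch below is a hypothetical filter that assumes kstat's default layout of one "name value" pair per line. The sample input is made up for illustration and is not real nxge output.

```shell
# Hypothetical filter: sum every tdc_tx_no_desc counter read from standard input.
# Assumption: each statistic appears on its own line as "<name> <value>".
sum_tx_no_desc() {
    awk '$1 ~ /tdc_tx_no_desc/ { total += $2 } END { print total + 0 }'
}

# Intended use on a live system (instance 1 is nxge1, as in the text):
#   kstat nxge:1 | sum_tx_no_desc
#
# Self-contained demonstration with made-up sample values:
printf '%s\n' \
    '        tdc_tx_no_desc                  0' \
    '        tdc_tx_no_desc                  17' | sum_tx_no_desc
```

Sampling the total twice under load and comparing the values shows whether the NIC is still running out of descriptors, as opposed to having done so at some earlier point.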

Additional tunable for single connection TCP throughput

To optimize throughput for a few (<= # CPU / 2) TCP connections with heavy traffic:

/etc/system

set ip:tcp_squeue_wput=1

This has been found to help throughput on x86, and jumbo frame throughput on CMT for this type of workload.

Additional tunable for bursty TCP connection establishment

Bursty TCP connection establishment leads to an unbalanced connection -> CPU mapping (CR 6364567 [3]). For example, TCP throughput with multiple connections, as measured by iperf, may be limited. The workaround is:

/etc/system

set hires_tick=1

Using this tunable may increase CPU utilization.

Tuning for link aggregation

Because the soft ring count applies to the aggregated link, not to each individual interface, more soft rings are recommended. As a starting point, use (# of recommended soft rings for 1 interface) * (# of aggregated interfaces) soft rings. For example:

set ip:ip_soft_rings_cnt=8

This is for 4 aggregated e1000g interfaces: 2 soft rings are recommended per e1000g interface, so 2 * 4 = 8.
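The same starting-point arithmetic can be sketched as a small helper. The function name is hypothetical; both the per-interface recommendation and the number of aggregated interfaces are supplied by hand.

```shell
# Hypothetical helper: starting-point soft ring count for an aggregated link.
#   $1 = recommended soft rings for one interface
#   $2 = number of aggregated interfaces
aggr_soft_rings() {
    per_if=$1
    n_if=$2
    echo $((per_if * n_if))
}

# Example from the text: 4 aggregated e1000g interfaces, 2 soft rings each.
echo "set ip:ip_soft_rings_cnt=$(aggr_soft_rings 2 4)"
```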

If mpstat shows that the interrupt CPU is almost 100% utilized, distribute NIC interrupts across all cores. For example, on an 8-core T2000:

psradm -i 1-3 5-7 9-11 13-15 17-19 21-23 25-27 29-31

Tuning in /etc/system vs. ndd

Some tunables, such as the number of soft rings, can be changed either in /etc/system or with ndd. Changes made in /etc/system take effect after the system reboots and persist across reboots. Changes made with ndd take effect immediately but do not persist across reboots.

For example, changing the number of soft rings using ndd affects NICs plumbed afterwards, but NICs that are already plumbed are not affected.

Explanation for tunables

  • ddi_msix_alloc_limit: This is a system-wide setting of the maximum number of MSI (Message Signaled Interrupt) and MSI-X interrupts that can be allocated per PCI device. The default is to allocate a maximum of 2 MSIs per device. Each receive DMA channel of a NIC can generate one interrupt, and each interrupt targets one CPU. Sun Multi-threaded 10GbE has 8 receive DMA channels per port, and Quad GbE has 4, so their interrupts can target at most 8 and 4 different CPUs, respectively. To keep the interrupt CPU from becoming the performance bottleneck, start with the number of receive DMA channels per port or the number of CPUs, whichever is lower, so that the interrupt load is distributed across enough CPUs.
  • ip_soft_rings_cnt: This is a system-wide setting of how many software rings (aka soft rings) to use to process received packets. The default is 2 on Niagara systems. For optimal receive throughput, start with 8 to 16 software rings on CMT, and 16 or 32 on OPL. The optimal number of software rings depends on the network device and workload. You can specify a different number of software rings per network device.
  • tcp_squeue_wput: When this is set to 1 (the default is 2), the application tries to process its own packets but doesn't try to drain the squeue. As a result, more TCP packets are processed by the soft ring thread, and utilization is more balanced across 2 CPUs for one connection. CPU efficiency may be slightly lower.
  • For systems with 1.0GHz CPUs under heavy network traffic, the interrupt CPU may become the bottleneck when NIC interrupts fall on only 2 or 3 cores. The psradm command above leaves only 1 strand per core enabled to take interrupts, so NIC interrupts are distributed across all cores.
  • apic_intr_policy: 1 is round robin interrupt distribution.
  • apic_enable_dynamic_migration: 0 disables interrupt migration between CPUs.
  • nxge_msi_enable: 2 is MSI-X. There are more MSI-X vectors available than MSI, so MSI-X is preferred.
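
The ddi_msix_alloc_limit starting point described in the list above (the lower of the receive DMA channel count per port and the CPU count) can be sketched as follows. The helper name is hypothetical, and both inputs are passed in by hand rather than detected.

```shell
# Hypothetical helper: starting value for ddi_msix_alloc_limit.
#   $1 = receive DMA channels per port (8 for Sun Multi-threaded 10GbE, 4 for Quad GbE)
#   $2 = number of CPUs
msix_alloc_limit() {
    rx_channels=$1
    ncpu=$2
    if [ "$ncpu" -lt "$rx_channels" ]; then
        echo "$ncpu"
    else
        echo "$rx_channels"
    fi
}

# Examples: 10GbE on a 32-CPU system, and Quad GbE on a 2-CPU system.
echo "set ddi_msix_alloc_limit=$(msix_alloc_limit 8 32)"
echo "set ddi_msix_alloc_limit=$(msix_alloc_limit 4 2)"
```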