Chelsio N210/N110 10Gb Ethernet Network Controller

Driver Installation Notes for Linux

Version 2.1.4

May 03, 2006

Copyright (c) 2003-2005 Chelsio Communications
All rights reserved.


CONTENTS
========
 INTRODUCTION
 PACKAGING
 PREREQUISITES
 DEFINITIONS
 INSTALLATION
   Hardware Installation
   Software Installation
   Installing the source RPM package
   Installing the driver as a kernel module
   Patching the driver into the Linux kernel
   Network configuration
 FEATURES
 DRIVER MESSAGES
 LIMITATIONS
 KNOWN ISSUES
 PERFORMANCE
 SUPPORT


INTRODUCTION
============

This document describes the Linux driver for the Chelsio 10Gb Ethernet
Network Controller. This driver supports the Chelsio N210 NIC and is
backward compatible with the Chelsio N110 model NICs.

This driver supports Linux kernels 2.4.21 and later, including version
2.6.x and RHEL kernels (2.4.21-x).

This driver supports AMD64, EM64T, and x86 systems.


PACKAGING
=========

This driver is released as source in compressed tar format. The filename
of the package is cxgb-2.1.4.tar.gz.


PREREQUISITES
=============

In order to compile the driver as a module on your machine, or to use the
source RPM installation, you must have a properly compiled kernel source
tree which matches the kernel you plan to run with the driver.

It is suggested that you run the kernel you wish to compile the driver
against. This helps confirm that you have a working kernel and simplifies
driver installation.

In order to patch this driver into a Linux kernel, you must have a kernel
source tree. Chelsio currently provides patches for Linux 2.4.25, 2.4.26,
2.6.5-7.139, 2.6.6, 2.6.9-11.EL, and 2.6.12 kernels.

You must have a working C/C++ compiler, libraries, and any other
dependencies installed prior to compiling the driver.

The following tools may be required for tuning or for controlling the
features provided with this driver:

  ethtool v1.8+
      Used to display or change ethernet card settings.
      Version 1.8 or greater is required for TSO support in Linux.
      If your system does not have this tool installed, you may get the
      source from http://sourceforge.net/projects/gkernel/

  sysctl
      Used to configure kernel parameters at runtime. Additional usage
      information can be found in the man page, and IP specific
      information can be found in the Linux source,
      Documentation/networking/ip-sysctl.txt.


DEFINITIONS
===========

The following terms and keys are used throughout this document:

  is the top-level path name of the Linux source tree.
  is the FULL version name of the kernel, ex. 2.6.9-11.ELsmp
  is the architecture of your CPU, ex. i386 or x86_64
  is a network interface name, ex. eth0


INSTALLATION
============

Hardware Installation:
----------------------

1. Insert the Chelsio 10Gb NIC in an available PCI-X slot running at
   133MHz. The NIC will operate in a 100MHz or 66MHz slot, but
   performance will be reduced due to the system PCI bandwidth. Note that
   the NIC is designed to work only in a 3.3V PCI or PCI-X slot.

   If you have multiple PCI-X 133MHz slots, it is important to note that
   the PCI-X bus speed will be reduced to 100MHz if more than one PCI-X
   device is installed on the same 133MHz bus. For best performance,
   install only one device on the PCI-X 133MHz bus.

2. After installing the hardware, you may want to check that Linux has
   recognized the hardware. Power on the system and boot into your Linux
   OS. Check the PCI bus for the new hardware by typing:

     lspci
       OR
     cat /proc/pci

   You should see your new Chelsio 10Gb NIC as an Ethernet Controller
   with a vendor name of "Chelsio Communications Inc" or "ASIC Designers
   Inc" with a vendor ID 1425.

   If the NIC is not recognized, try a different PCI or PCI-X slot. Be
   sure the card is fully seated and the mounting screw is tight. If the
   card is still not recognized, please contact customer support for
   help.
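
As a rough sketch of the check in step 2, the numeric PCI listing can be
filtered on the vendor ID 1425 mentioned above. The helper name
filter_chelsio is hypothetical and not part of the driver package, and the
sketch assumes the "vendor:device" pair format printed by "lspci -n".

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the driver: reads "lspci -n" style
# lines on stdin and prints only those whose PCI vendor ID is 1425.
# Assumes each line carries a " vendor:device" pair of 4-digit hex IDs.
filter_chelsio() {
    grep -i ' 1425:[0-9a-f]\{4\}'
}

# Usage on a live system (assumes pciutils is installed):
#   lspci -n | filter_chelsio
```

If your pciutils version supports it, "lspci -d 1425:" selects devices by
vendor ID directly and should give an equivalent result.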

Software Installation:
----------------------

You will need a compiled Linux source tree for the kernel you wish to run
with the driver. If you do not have the source tree, you can download one
from ftp://ftp.kernel.org or obtain the source RPM from your distribution
vendor. The driver should work on all versions of Linux 2.4.22 and later,
including version 2.6.x kernels and RHEL kernels (2.4.21-x).

In order to install the driver, you will need to have root privileges.

1. Uncompress the tar file:

     gzip -d < cxgb-2.1.4.tar.gz | tar xf -

2. Build the driver against a compiled Linux kernel:

     cd src
     make
     make install

   The driver will be compiled as cxgb.o for 2.4 kernels, or cxgb.ko for
   kernels later than 2.5.

3. Be sure you are running the Linux kernel you have built the module
   against. If you try to load the module while running a different
   kernel, you may experience problems.

   Before loading the driver, make sure your kernel has loadable module
   support enabled. Your kernel config should have CONFIG_MODULES=y at a
   minimum. See the man pages for modprobe, depmod, insmod, rmmod, and
   lsmod for more info.

   Load the driver:

     insmod cxgb

   Unload the driver:

     rmmod cxgb

   NOTE: It is important that the match exactly the kernel
         major.minor.micro-extraversion of the kernel that you built the
         module against. To check the version name, use:

           uname -r

   If you did not run make install, you will need to manually copy the
   driver to the proper location:

     cp cxgb.{o,ko} /lib/modules//kernel/drivers/net

   Then, run depmod so that your system can find the driver location:

     depmod -a

   You may want to set up the driver with an alias in /etc/modules.conf.
   Edit the /etc/modules.conf file and include the alias and driver name:

     alias eth0 cxgb

   Please refer to your Linux distribution documentation for details.

   If you have problems loading or compiling the driver, please be sure
   to try the driver on a clean kernel first. Some problems could be
   caused by kernel patches that interfere with the kernel code.
   You can download a clean Linux kernel from ftp://ftp.kernel.org.


Patching the driver into the Linux kernel:
------------------------------------------

Patch the driver into a supported Linux kernel version. The driver may
work against other versions, but has not been tested.

In order to install the kernel, you will need to have root privileges.

1. Uncompress the tar file:

     gzip -d < cxgb-2.1.4.tar.gz | tar xf -

2. Patch the driver into the Linux kernel. Be sure to use the appropriate
   patch for your kernel version:

     cd patches
     patch -p1 -d < cxgb-2.1.4-linux-.patch

3. Configure the patched kernel. If you have problems with configuration,
   or you are unsure of which options to include for your specific
   machine, you can try one of the config files included in this driver
   release. If none of the configs match your machine, you will need to
   identify the hardware for your system and select the appropriate
   options.

   NOTE: It is generally not good practice to copy the config file from
         your Linux distribution kernel, unless the kernel version
         matches the version of your new kernel. If you need a default
         config file, just launch menuconfig without any command-line
         arguments and a default config will be created for you, but it
         is recommended that you properly configure the kernel for your
         hardware.

     cd
     make menuconfig
       OR
     make xconfig

   Linux 2.4.x configuration options:
   -----------------------------------

   The following options are required to enable the N110/N210 10Gb NIC:

     Network device support ->
       Ethernet (1000 Mbit) -->
         <*> Chelsio 10Gb Ethernet support

   Linux 2.6.x configuration options:
   ----------------------------------

   The following options are required to enable the N110/N210 10Gb NIC:

     Device Drivers ->
       Networking Support -->
         Ethernet (10000 Mbit) -->
           <*> Chelsio 10Gb Ethernet support

   General Linux configuration notes:
   ----------------------------------

   Some configuration options may enhance the performance of your system
   while other options may decrease your system performance.
   Chelsio Communications is aware that the following kernel options
   cause problems or reduced performance during the operation of
   high-speed Ethernet controllers:

     Processor type and features ->
       [ ] Enable kernel irq balancing (2.6.x kernels)

         This kernel feature should be disabled. Although the feature may
         improve your system performance by moving the system interrupts
         (IRQ) to less loaded CPUs (on an SMP system), it will interfere
         with the performance of any network controller if the assigned
         interrupts are relocated during data transfer, causing a large
         number of TCP retransmits.

         NOTE: Some Linux distributions include an irqbalance daemon
               which starts during sysinit. This daemon performs the same
               action as the kernel built-in, and it should be disabled
               as well.

     Kernel hacking ->
       [ ] Kernel debugging

         This kernel feature should always be disabled. Enabling this
         feature will slow down the kernel and decrease the 10Gb NIC
         performance.

     Device drivers ->
       Networking support ->
         < > Network console logging support (EXPERIMENTAL)

         This kernel feature should be disabled. Enabling this feature
         may cause delays in traffic if the volume of kernel messages is
         high.

4. Compile the Linux kernel:

     make dep        (deprecated for kernels 2.5 and later)
     make clean
     make bzImage
     make modules
     make modules_install

   NOTE: These are general commands for compiling the Linux kernel. Your
         distribution may have a different way of doing things, or you
         may prefer a different method. These commands are shown only to
         provide an example of what to do and are by no means definitive.

5. Prepare the system for the new kernel:

   The following instructions are very generic and serve primarily as a
   "reminder" of what to do next. Your Linux distribution may have a
   different way of doing things, or you may prefer a different method.
   These commands are shown only to provide an example of what to do and
   are by no means definitive.

   Copy the compressed kernel image onto the boot partition, typically
   /boot:

     cp arch//boot/bzImage /boot/vmlinuz-2.x.x-extraversion

   You may need the System.map file; copy it onto the boot partition as
   well:

     cp System.map /boot/System.map-2.x.x-extraversion

   Edit your boot loader config file.

   Reboot the system into the new Linux kernel.


Network configuration
---------------------

Please refer to your Linux distribution documentation for details on
configuring your system to work with network devices.

NOTES: Please be aware that when switching from a 2.4.x kernel to a 2.6.x
       kernel your Ethernet interface names may be re-ordered. This is
       not caused by Chelsio software or the Chelsio N110/N210 hardware.
       This is a known issue within Linux when your driver has been
       compiled into the Linux kernel. You can find more information
       about this in the various Linux-Kernel newsgroups under the topic
       "NICs trading places".

       Networking-tools provides the "nameif" utility to change your
       interface name. Also, Linux 2.6 kernels provide a mechanism to
       change your interface name using the "ip" command, as in the
       following example:

         ifdown eth0
         ifdown eth1
         ip link set eth0 name eth1temp
         ip link set eth1 name eth0
         ip link set eth1temp name eth1
         ifup eth0
         ifup eth1

You may need to identify which interface the device driver has attached
to in order to create an ifcfg- configuration file:

     dmesg | grep Chelsio

After configuring the network, bring up the Chelsio 10Gb NIC using the
"ifup" command. You are now ready to use the NIC.


FEATURES
========

Adaptive Interrupts (adaptive-rx)
---------------------------------

This feature provides an adaptive algorithm that adjusts the interrupt
coalescing parameters, allowing the driver to dynamically adapt the
latency settings to achieve the highest performance during various types
of network load.

The interface used to control this feature is ethtool. Please see the
ethtool manpage for additional usage information.

By default, adaptive-rx is disabled.
To enable adaptive-rx:

     ethtool -C adaptive-rx on

To disable adaptive-rx, use ethtool:

     ethtool -C adaptive-rx off

After disabling adaptive-rx, the timer latency value will be set to 50us.
You may set the timer latency after disabling adaptive-rx:

     ethtool -C rx-usecs

An example to set the timer latency value to 100us on eth0:

     ethtool -C eth0 rx-usecs 100

You may also provide a timer latency value while disabling adaptive-rx:

     ethtool -C adaptive-rx off rx-usecs

If adaptive-rx is disabled and a timer latency value is specified, the
timer will be set to the specified value until changed by the user or
until adaptive-rx is enabled.

To view the status of the adaptive-rx and timer latency values:

     ethtool -c

TCP Segmentation Offloading (TSO) Support
-----------------------------------------

This feature, also known as "large send", enables a system's protocol
stack to offload portions of outbound TCP processing to a network
interface card, thereby reducing system CPU utilization and enhancing
performance.

This feature is available with Linux 2.6.x and RHEL 2.4.21-x kernels. TSO
may also be supported by other kernels; please check your kernel
documentation for details.

The interface used to control this feature is ethtool version 1.8 or
higher. Please see the ethtool manpage for additional usage information.

By default, TSO is enabled.

To disable TSO:

     ethtool -K tso off

To enable TSO:

     ethtool -K tso on

To view the status of TSO:

     ethtool -k


DRIVER MESSAGES
===============

The following messages are the most common messages logged by syslog.
These may be found in /var/log/messages.

  Driver up:

     Chelsio Network Driver - version 2.1.4

  NIC detected:

     eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

  Link up:

     eth#: link is up at 10 Gbps, full duplex

  Link down:

     eth#: link is down


LIMITATIONS
===========

The current version of this driver has been tested on Red Hat and Red Hat
Enterprise distributions for x86 (i386) and x86_64 (AMD64 & EM64T)
architectures.
The driver should work on other CPU architectures and kernels, but only
limited testing has been performed. The Makefile may need to be modified
to include architecture specific compiler options, and some changes in the
source code may be required to work on other architectures and kernels.


KNOWN ISSUES
============

These issues have been identified during testing. The following
information is provided as a workaround for each problem. In some cases,
the problem is inherent to Linux or to a particular Linux distribution
and/or hardware platform.

1. Large number of TCP retransmits on a multiprocessor (SMP) system.

   On a system with multiple CPUs, the interrupt (IRQ) for the network
   controller may be bound to more than one CPU. This causes TCP
   retransmits if the packet data is split across different CPUs and
   reassembled in a different order than expected.

   To eliminate the TCP retransmits, set smp_affinity on the particular
   interrupt to a single CPU. You can locate the interrupt (IRQ) used on
   the N110/N210 by using ifconfig:

     ifconfig | grep Interrupt

   Set the smp_affinity to a single CPU:

     echo 1 > /proc/irq//smp_affinity

   It is highly suggested that you do not run the irqbalance daemon on
   your system, as this will change any smp_affinity setting you have
   applied. The irqbalance daemon runs on a 10 second interval and binds
   interrupts to the least loaded CPU determined by the daemon. To
   disable this daemon:

     chkconfig --level 2345 irqbalance off

   By default, some Linux distributions enable the kernel feature,
   irqbalance, which performs the same function as the daemon. To disable
   this feature, add the following option to the kernel line in your
   bootloader configuration:

     noirqbalance

   Example using the Grub bootloader:

     title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
             root (hd0,0)
             kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
             initrd /initrd-2.4.21-27.ELsmp.img

2. After running insmod, the driver is loaded and the incorrect network
   interface is brought up without running ifup.

   When using 2.4.x kernels, including RHEL kernels, the Linux kernel
   invokes a script named "hotplug". This script is primarily used to
   automatically bring up USB devices when they are plugged in. However,
   the script also attempts to automatically bring up a network interface
   after loading the kernel module. The hotplug script does this by
   scanning the ifcfg-eth# config files in /etc/sysconfig/network-scripts,
   looking for HWADDR=.

   If the hotplug script does not find the HWADDR within any of the
   ifcfg-eth# files, it will bring up the device with the next available
   interface name. If this interface is already configured for a
   different network card, your new interface will have an incorrect IP
   address and network settings.

   To solve this issue, you can add the HWADDR= key to the interface
   config file of your network controller.

   To disable this "hotplug" feature, you may add the driver (module
   name) to the "blacklist" file located in /etc/hotplug. It has been
   noted that this does not work for network devices because the
   net.agent script does not use the blacklist file. Simply remove, or
   rename, the net.agent script located in /etc/hotplug to disable this
   feature.

3. Transport Protocol (TP) hangs when running heavy multi-connection
   traffic on an AMD Opteron system with HyperTransport PCI-X Tunnel
   chipset.

   If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X
   Tunnel chipset, you may experience the "133-MHz Mode Split Completion
   Data Corruption" bug identified by AMD while using a 133MHz PCI-X card
   on the PCI-X bus.

   AMD states, "Under highly specific conditions, the AMD-8131 PCI-X
   Tunnel can provide stale data via split completion cycles to a PCI-X
   card that is operating at 133 Mhz", causing data corruption.

   AMD provides three workarounds for this problem; Chelsio recommends
   the first option for best performance with this bug:

     For 133MHz secondary bus operation, limit the transaction length and
     the number of outstanding transactions, via BIOS configuration
     programming of the PCI-X card, to the following:

        Data Length (bytes): 1k
        Total allowed outstanding transactions: 2

     use:

        setpci -d 1425:* 0x60.l=0x00160007

   Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
   section 56, "133-MHz Mode Split Completion Data Corruption" for more
   details on this bug and the workarounds suggested by AMD.


PERFORMANCE
===========

The following information is provided as an example of how to change
system parameters for "performance tuning" and what values to use. You may
or may not want to change these system parameters, depending on your
server/workstation application. Doing so is not warranted in any way by
Chelsio Communications, and is done at "YOUR OWN RISK". Chelsio will not
be held responsible for loss of data or damage to equipment.

Your distribution may have a different way of doing things, or you may
prefer a different method. These commands are shown only to provide an
example of what to do and are by no means definitive.

Any of the following system changes will last only until you reboot your
system. You may want to write a script that runs at boot-up which includes
the optimal settings for your system.

  Setting PCI Latency Timer:

     setpci -d 1425:* 0x0c.l=0x0000F800

  Disabling TCP timestamp:

     sysctl -w net.ipv4.tcp_timestamps=0

  Disabling SACK:

     sysctl -w net.ipv4.tcp_sack=0

  Setting TCP read buffers (min/default/max):

     sysctl -w net.ipv4.tcp_rmem="4096 131072 393216"

  Setting TCP write buffers (min/default/max):

     sysctl -w net.ipv4.tcp_wmem="4096 262144 393216"

  Setting TCP buffer space (min/pressure/max):

     sysctl -w net.ipv4.tcp_mem=""    #default values are calculated at boot.

  Setting large number of incoming connection requests (2.6.x only):

     sysctl -w net.ipv4.tcp_max_syn_backlog=3000

  Setting maximum receive socket buffer size:

     sysctl -w net.core.rmem_max=524287

  Setting maximum send socket buffer size:

     sysctl -w net.core.wmem_max=524287

  Setting default receive socket buffer size:

     sysctl -w net.core.rmem_default=524287

  Setting default send socket buffer size:

     sysctl -w net.core.wmem_default=524287

  Setting maximum option memory buffers:

     sysctl -w net.core.optmem_max=524287

  Setting maximum backlog (# of unprocessed packets before kernel drops):

     sysctl -w net.core.netdev_max_backlog=300000

  Set smp_affinity (on a multiprocessor system) to a single CPU:

     echo 1 > /proc/irq//smp_affinity

  TCP window size for single connections:

     The receive buffer (RX_WINDOW) size must be at least as large as the
     Bandwidth-Delay Product of the communication link between the sender
     and receiver. Due to the variations of RTT, you may want to increase
     the buffer size up to 2 times the Bandwidth-Delay Product. Reference
     page 289 of "TCP/IP Illustrated, Volume 1, The Protocols" by W.
     Richard Stevens.

     At 10Gb speeds, use the following formula:

        RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)

     Example for an RTT of 100us:

        RX_WINDOW = (1,250,000 * 0.1) = 125,000

     RX_WINDOW sizes of 256KB - 512KB should be sufficient.

     Setting the min, default, and max receive buffer (RX_WINDOW) size:

        sysctl -w net.ipv4.tcp_rmem=" "

  TCP window size for multiple connections:

     The receive buffer (RX_WINDOW) size may be calculated the same as
     for single connections, but should be divided by the number of
     connections. The smaller window prevents congestion and facilitates
     better pacing, especially if/when MAC level flow control does not
     work well or when it is not supported on the machine.
     Experimentation may be necessary to attain the correct value. This
     method is provided as a starting point for the correct receive
     buffer size.
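
     The window formula above can be sketched in shell arithmetic. This
     is only an illustration under stated assumptions: the function names
     rx_window_bytes and rx_window_per_conn are hypothetical and not part
     of the driver package, and the RTT is given in microseconds so that
     the arithmetic stays in integers (1.25MBytes per millisecond equals
     1250 bytes per microsecond).

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with the driver.
# A 10Gb/s link carries 1,250,000 bytes per millisecond,
# which is 1250 bytes per microsecond.

# rx_window_bytes RTT_US
# Minimum RX_WINDOW in bytes for a single connection.
rx_window_bytes() {
    echo $(( 1250 * $1 ))
}

# rx_window_per_conn RTT_US NCONN
# Starting-point RX_WINDOW per connection when NCONN connections share the link.
rx_window_per_conn() {
    echo $(( 1250 * $1 / $2 ))
}

rx_window_bytes 100         # RTT of 100us: prints 125000
rx_window_per_conn 100 4    # same RTT, 4 connections: prints 31250
```

     The results are lower bounds; as noted above, up to 2 times the
     Bandwidth-Delay Product may be appropriate when the RTT varies.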

     Setting the min, default, and max receive buffer (RX_WINDOW) size is
     performed in the same manner as for a single connection.


SUPPORT
=======

If you have problems with the software or hardware, please contact our
customer support team via email at support@chelsio.com or check our
website at http://www.chelsio.com

===============================================================================

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

Copyright (c) 2004,2005 Chelsio Communications. All rights reserved.

===============================================================================