################################################################################
#									       #
#				NFS/RDMA README				       #
#									       #
################################################################################

 Author: NetApp and Open Grid Computing
 Date: April 15, 2008

Table of Contents
~~~~~~~~~~~~~~~~~
 - Overview
 - Getting Help
 - Installation
 - Check RDMA and NFS Setup
 - NFS/RDMA Setup

Overview
~~~~~~~~

  This document describes how to install and set up the Linux NFS/RDMA client
  and server software.

  The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
  was first included in the following release, Linux 2.6.25.

  In our testing, we have obtained excellent performance results (full 10Gbit
  wire bandwidth at minimal client CPU) under many workloads. The code passes
  the full Connectathon test suite and operates over both InfiniBand and iWARP
  RDMA adapters.

Getting Help
~~~~~~~~~~~~

  If you get stuck, you can ask questions on the

                nfs-rdma-devel@lists.sourceforge.net

  mailing list.

Installation
~~~~~~~~~~~~

  These instructions are a step-by-step guide to building a machine for
  use with NFS/RDMA.

  - Install an RDMA device

    Any device supported by the drivers in drivers/infiniband/hw is acceptable.

    Testing has been performed using several Mellanox-based IB cards, the
    Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

  - Install a Linux distribution and tools

    The first kernel release to contain both the NFS/RDMA client and server was
    Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
    Linux kernel releases should be installed.

    The procedures described in this document have been tested with
    distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

  - Install nfs-utils-1.1.1 or greater on the client

    An NFS/RDMA mount point can only be obtained by using the mount.nfs
    command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs
    you are using, type:

    > /sbin/mount.nfs -V

    If the version is less than 1.1.1 or the command does not exist,
    then you will need to install the latest version of nfs-utils.

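    The version comparison can be scripted. The following is a minimal
    sketch, not part of nfs-utils: the helper names version_ge and
    extract_version are hypothetical, the exact text printed by
    "mount.nfs -V" is assumed to contain a dotted version number, and the
    comparison relies on the -V (version sort) option of GNU sort.

```shell
# Succeeds if dotted version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the first dotted version number found in the text passed as $1.
# The exact format of the "mount.nfs -V" output is an assumption here.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Usage sketch (commented out so that sourcing this file has no side effects):
#   ver=$(extract_version "$(/sbin/mount.nfs -V 2>&1)")
#   if version_ge "$ver" 1.1.1; then
#       echo "mount.nfs $ver is new enough for NFS/RDMA"
#   else
#       echo "mount.nfs is missing or older than 1.1.1"
#   fi
```

    A plain string comparison would rank 1.1.10 below 1.1.2, which is why
    the sketch uses a version-aware sort.
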
    Download the latest package from:

    http://www.kernel.org/pub/linux/utils/nfs

    Uncompress the package and follow the installation instructions.

    If you will not be using GSS and NFSv4, the installation process
    can be simplified by disabling these features when running configure:

    > ./configure --disable-gss --disable-nfsv4

    For more information on this see the package's README and INSTALL files.

    After building the nfs-utils package, there will be a mount.nfs binary in
    the utils/mount directory. This binary can be used to initiate NFS v2, v3,
    or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4.
    The standard technique is to create a symlink called mount.nfs4 to mount.nfs.

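    The symlink step can be sketched as a small helper. The function name
    link_mount_nfs4 is hypothetical, and the directory argument is whatever
    location holds your mount.nfs binary (the build tree's utils/mount
    directory or the install location).

```shell
# Create a mount.nfs4 symlink next to an existing mount.nfs binary.
# $1 is the directory that holds mount.nfs.
link_mount_nfs4() {
    dir=$1
    if [ ! -e "$dir/mount.nfs" ]; then
        echo "no mount.nfs found in $dir" >&2
        return 1
    fi
    # -f replaces a stale link; the relative target keeps the pair movable.
    ln -sf mount.nfs "$dir/mount.nfs4"
}

# Usage sketch:
#   link_mount_nfs4 utils/mount
```

    A relative link target is used so that the two names keep working if
    the directory is later copied or installed elsewhere.
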
    NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.1 is needed on the client.

  - Install a Linux kernel with NFS/RDMA

    The NFS/RDMA client and server are both included in the mainline Linux
    kernel version 2.6.25 and later. This and other versions of the 2.6 Linux
    kernel can be found at:

    ftp://ftp.kernel.org/pub/linux/kernel/v2.6/

    Download the sources and place them in an appropriate location.

  - Configure the RDMA stack

    Make sure your kernel configuration has RDMA support enabled. Under
    Device Drivers -> InfiniBand support, update the kernel configuration
    to enable InfiniBand support [NOTE: the option name is misleading. Enabling
    InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

    Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
    iWARP adapter support (amso, cxgb3, etc.).

    If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

  - Configure the NFS client and server

    Your kernel configuration must also have NFS file system support and/or
    NFS server support enabled. These and other NFS related configuration
    options can be found under File Systems -> Network File Systems.

  - Build, install, reboot

    The NFS/RDMA code will be enabled automatically if NFS and RDMA
    are turned on. The NFS/RDMA client and server are configured via the hidden
    SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
    value of SUNRPC_XPRT_RDMA will be:

     - N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
       and server will not be built
     - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
       in this case the NFS/RDMA client and server will be built as modules
     - Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
       and server will be built into the kernel

    Therefore, if you have followed the steps above and turned on NFS and RDMA,
    the NFS/RDMA client and server will be built.

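    The three rules for SUNRPC_XPRT_RDMA can be written out as a small
    lookup. This is only an illustration of the rules as stated here, not
    how the kernel build system evaluates them; the function name
    xprt_rdma_value is hypothetical and the inputs are the lowercase
    y, m, or n values as they appear in a kernel .config file.

```shell
# Print the value SUNRPC_XPRT_RDMA would take (y, m or n), given the
# value of SUNRPC in $1 and the value of INFINIBAND in $2.
xprt_rdma_value() {
    case "$1$2" in
        yy)       echo y ;;   # both built in
        ym|my|mm) echo m ;;   # both on, at least one is a module
        *)        echo n ;;   # at least one is off (or unset)
    esac
}

# Usage sketch: show the two inputs in a configured kernel tree.
#   grep -E '^CONFIG_(SUNRPC|INFINIBAND)=' .config
#   xprt_rdma_value y m     # prints m
```
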
    Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
~~~~~~~~~~~~~~~~~~~~~~~~

    Before configuring the NFS/RDMA software, it is a good idea to test
    your new kernel to ensure that the kernel is working correctly.
    In particular, it is a good idea to verify that the RDMA stack
    is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
    is working properly.

  - Check RDMA Setup

    If you built the RDMA components as modules, load them at
    this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
    card:

    > modprobe ib_mthca
    > modprobe ib_ipoib

    If you are using InfiniBand, make sure there is a Subnet Manager (SM)
    running on the network. If your IB switch has an embedded SM, you can
    use it. Otherwise, you will need to run an SM, such as OpenSM, on one
    of your end nodes.

    If an SM is running on your network, you should see the following:

    > cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

    where driverX is mthca0, ipath5, ehca3, etc.

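    On a host with several adapters or ports, the same check can be looped
    over every state file. The sketch below assumes the sysfs layout shown
    in the command above; the function name inactive_ib_ports is
    hypothetical, and the directory is a parameter only so the function can
    be tried against a copy of the tree.

```shell
# List every InfiniBand port whose state is not "4: ACTIVE".
# $1 is the directory to scan and defaults to /sys/class/infiniband.
inactive_ib_ports() {
    root=${1:-/sys/class/infiniband}
    for f in "$root"/*/ports/*/state; do
        # Skip the unexpanded pattern when no device is present.
        [ -r "$f" ] || continue
        state=$(cat "$f")
        case "$state" in
            "4: ACTIVE") ;;                # port is up, nothing to report
            *) echo "$f: $state" ;;
        esac
    done
}

# Usage sketch: no output means every port found is ACTIVE.
#   inactive_ib_ports
```
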
    To further test the InfiniBand software stack, use IPoIB (this
    assumes you have two IB hosts named host1 and host2):

    host1> ifconfig ib0 a.b.c.x
    host2> ifconfig ib0 a.b.c.y
    host1> ping a.b.c.y
    host2> ping a.b.c.x

    For other device types, follow the appropriate procedures.

  - Check NFS Setup

    For the NFS components enabled above (client and/or server),
    test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
~~~~~~~~~~~~~~

  We recommend that you use two machines, one to act as the client and
  one to act as the server.

  One-time configuration:

  - On the server system, configure the /etc/exports file and
    start the NFS/RDMA server.

    Exports entries with the following formats have been tested:

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

    The IP addresses are the client's IPoIB addresses for an InfiniBand HCA
    or the client's iWARP addresses for an RNIC.

    NOTE: The "insecure" option must be used because the NFS/RDMA client does not
    use a reserved port.

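    As a sketch, entries in the tested format can be generated and checked
    with two small helpers. The names exports_entry and has_insecure are
    hypothetical, and the option list is simply the one from the tested
    examples above.

```shell
# Print an /etc/exports entry in the tested format.
# $1 = exported directory, $2 = client address or network/netmask.
exports_entry() {
    printf '%s   %s(fsid=0,rw,async,insecure,no_root_squash)\n' "$1" "$2"
}

# Succeed if the exports entry in $1 carries the "insecure" option.
has_insecure() {
    case "$1" in
        *"(insecure,"*|*"(insecure)"*|*",insecure,"*|*",insecure)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch (run as root on the server; review before appending):
#   exports_entry /vol0 192.168.0.47 >> /etc/exports
```

    The check matters because an entry without "insecure" would leave the
    server refusing requests from the NFS/RDMA client, which does not use a
    reserved port.
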
  Each time a machine boots:

  - Load and configure the RDMA drivers

    For InfiniBand using a Mellanox adapter:

    > modprobe ib_mthca
    > modprobe ib_ipoib
    > ifconfig ib0 a.b.c.d

    NOTE: use unique addresses for the client and server

  - Start the NFS server

    If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m
    in kernel config), load the RDMA transport module:

    > modprobe svcrdma

    Regardless of how the server was built (module or built-in), start the
    server:

    > /etc/init.d/nfs start

    or

    > service nfs start

    Instruct the server to listen on the RDMA transport:

    > echo rdma 2050 > /proc/fs/nfsd/portlist

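    To confirm that the listener was registered, the portlist file can be
    read back. The sketch below assumes that reading the file yields one
    "<transport> <port>" pair per line, which this document does not state;
    the function name rdma_listen_port is hypothetical.

```shell
# Print the port of the RDMA listener found in a portlist file, if any.
# $1 defaults to /proc/fs/nfsd/portlist. The "<transport> <port>" line
# format is an assumption about what the file returns when read.
rdma_listen_port() {
    awk '$1 == "rdma" { print $2; exit }' "${1:-/proc/fs/nfsd/portlist}"
}

# Usage sketch (on the server, after the echo command above):
#   rdma_listen_port        # expected to print 2050 if the listener exists
```
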
  - On the client system

    If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m
    in kernel config), load the RDMA client module:

    > modprobe xprtrdma

    Regardless of how the client was built (module or built-in), issue the
    mount.nfs command:

    > /path/to/your/mount.nfs <IPoIB-server-name-or-address>:/<export> /mnt -i -o rdma,port=2050

    To verify that the mount is using RDMA, run "cat /proc/mounts" and check the
    "proto" field for the given mount.

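    The check of the "proto" field can be sketched as a helper that reads
    the mounts table and prints the protocol for one mount point. The
    function name mount_proto is hypothetical; the table is a parameter
    only so the function can be tried on a saved copy of /proc/mounts.

```shell
# Print the value of the "proto" mount option for mount point $1.
# $2 is the mounts table to read and defaults to /proc/mounts.
mount_proto() {
    awk -v mp="$1" '$2 == mp {
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^proto=/) {
                sub(/^proto=/, "", opts[i])
                print opts[i]
            }
    }' "${2:-/proc/mounts}"
}

# Usage sketch, for the mount command above:
#   [ "$(mount_proto /mnt)" = "rdma" ] && echo "/mnt is using RDMA"
```
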
  Congratulations! You're using NFS/RDMA!