################################################################################
#                                                                              #
#                               NFS/RDMA README                                #
#                                                                              #
################################################################################

Author: NetApp and Open Grid Computing
Date: May 29, 2008

Table of Contents
~~~~~~~~~~~~~~~~~
- Overview
- Getting Help
- Installation
- Check RDMA and NFS Setup
- NFS/RDMA Setup

Overview
~~~~~~~~

  This document describes how to install and set up the Linux NFS/RDMA client
  and server software.

  The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
  was first included in the following release, Linux 2.6.25.

  In our testing, we have obtained excellent performance results (full 10Gbit
  wire bandwidth at minimal client CPU) under many workloads. The code passes
  the full Connectathon test suite and operates over both InfiniBand and iWARP
  RDMA adapters.

Getting Help
~~~~~~~~~~~~

  If you get stuck, you can ask questions on the

                nfs-rdma-devel@lists.sourceforge.net

  mailing list.

Installation
~~~~~~~~~~~~

  These instructions are a step-by-step guide to building a machine for
  use with NFS/RDMA.

  - Install an RDMA device

    Any device supported by the drivers in drivers/infiniband/hw is acceptable.

    Testing has been performed using several Mellanox-based IB cards, the
    Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

  - Install a Linux distribution and tools

    The first kernel release to contain both the NFS/RDMA client and server was
    Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
    Linux kernel releases should be installed.

    The procedures described in this document have been tested with
    distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

  - Install nfs-utils-1.1.2 or greater on the client

    An NFS/RDMA mount point can be obtained by using the mount.nfs command in
    nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
    version with support for NFS/RDMA mounts, but for various reasons we
    recommend using nfs-utils-1.1.2 or greater). To see which version of
    mount.nfs you are using, type:

    $ /sbin/mount.nfs -V

    If the version is less than 1.1.2 or the command does not exist,
    you should install the latest version of nfs-utils.
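
The version comparison can be scripted. The following is a minimal sketch,
assuming GNU sort (for its -V version-sort option) and assuming that the
output of "mount.nfs -V" contains the version as a dotted number. The
function names version_ge and extract_version are made up for this example.

```shell
# Return success if dotted version $1 is greater than or equal to $2.
# Relies on "sort -V" (GNU coreutils version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the first dotted version number found in the text passed as $1.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Example use (not run when this file is sourced):
#   out=$(/sbin/mount.nfs -V 2>&1)
#   version_ge "$(extract_version "$out")" 1.1.2 || echo "upgrade nfs-utils"
```

If mount.nfs is missing, extract_version prints nothing and version_ge
fails, which matches the "install the latest version" case.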

    Download the latest package from:

    http://www.kernel.org/pub/linux/utils/nfs

    Uncompress the package and follow the installation instructions.

    If you will not need the idmapper and gssd executables (you do not need
    these to create an NFS/RDMA enabled mount command), the installation
    process can be simplified by disabling these features when running
    configure:

    $ ./configure --disable-gss --disable-nfsv4

    To build nfs-utils you will need the tcp_wrappers package installed. For
    more information on this see the package's README and INSTALL files.

    After building the nfs-utils package, there will be a mount.nfs binary in
    the utils/mount directory. This binary can be used to initiate NFS v2, v3,
    or v4 mounts. To initiate a v4 mount, the binary must be called
    mount.nfs4. The standard technique is to create a symlink called
    mount.nfs4 to mount.nfs.

    This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

    In this location, mount.nfs will be invoked automatically for NFS mounts
    by the system mount command.
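
The copy and the mount.nfs4 symlink can be done in one step. This is a
sketch only; install_mount_nfs is a hypothetical helper name, and the
destination directory is passed in by the caller rather than hard-coded.

```shell
# Copy a freshly built mount.nfs into a destination directory and create
# the mount.nfs4 symlink next to it. Both arguments are caller-supplied.
install_mount_nfs() {
    src=$1      # for example utils/mount/mount.nfs
    destdir=$2  # for example /sbin
    cp "$src" "$destdir/mount.nfs" || return 1
    # Relative link target, so the link resolves within the same directory.
    ln -sf mount.nfs "$destdir/mount.nfs4"
}

# Example use (needs root to write to /sbin):
#   install_mount_nfs utils/mount/mount.nfs /sbin
```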

    NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.

  - Install a Linux kernel with NFS/RDMA

    The NFS/RDMA client and server are both included in the mainline Linux
    kernel version 2.6.25 and later. This and other versions of the 2.6 Linux
    kernel can be found at:

    ftp://ftp.kernel.org/pub/linux/kernel/v2.6/

    Download the sources and place them in an appropriate location.

  - Configure the RDMA stack

    Make sure your kernel configuration has RDMA support enabled. Under
    Device Drivers -> InfiniBand support, update the kernel configuration
    to enable InfiniBand support [NOTE: the option name is misleading. Enabling
    InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

    Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
    iWARP adapter support (amso, cxgb3, etc.).

    If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

  - Configure the NFS client and server

    Your kernel configuration must also have NFS file system support and/or
    NFS server support enabled. These and other NFS related configuration
    options can be found under File Systems -> Network File Systems.

  - Build, install, reboot

    The NFS/RDMA code will be enabled automatically if NFS and RDMA
    are turned on. The NFS/RDMA client and server are configured via the hidden
    SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
    value of SUNRPC_XPRT_RDMA will be:

     - N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
       and server will not be built
     - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
       in this case the NFS/RDMA client and server will be built as modules
     - Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
       and server will be built into the kernel

    Therefore, if you have followed the steps above and turned on NFS and RDMA,
    the NFS/RDMA client and server will be built.
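
The N/M/Y rules for SUNRPC_XPRT_RDMA can be written as a small function.
This is an illustrative sketch, not part of the kernel build system; the
name xprt_rdma_value is made up, and the inputs are assumed to be the
lowercase letters y, m, or n (treat an option that is "not set" as n).

```shell
# Print the value SUNRPC_XPRT_RDMA would take, given the values of
# SUNRPC ($1) and INFINIBAND ($2), each one of: y, m, n.
xprt_rdma_value() {
    sunrpc=$1
    infiniband=$2
    if [ "$sunrpc" = n ] || [ "$infiniband" = n ]; then
        echo n   # either dependency off: not built
    elif [ "$sunrpc" = m ] || [ "$infiniband" = m ]; then
        echo m   # both on, at least one modular: built as modules
    else
        echo y   # both built in: built into the kernel
    fi
}

# Example use:
#   xprt_rdma_value y m
```

After configuring, the resulting value can also be read directly from the
generated .config file by searching for CONFIG_SUNRPC_XPRT_RDMA.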

    Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
~~~~~~~~~~~~~~~~~~~~~~~~

  Before configuring the NFS/RDMA software, test your new kernel to ensure
  that it is working correctly. In particular, verify that the RDMA stack
  is functioning as expected and that standard NFS over TCP/IP and/or UDP/IP
  is working properly.

  - Check RDMA Setup

    If you built the RDMA components as modules, load them at
    this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
    card:

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

    If you are using InfiniBand, make sure there is a Subnet Manager (SM)
    running on the network. If your IB switch has an embedded SM, you can
    use it. Otherwise, you will need to run an SM, such as OpenSM, on one
    of your end nodes.

    If an SM is running on your network, you should see the following:

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

    where driverX is mthca0, ipath5, ehca3, etc.
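
To check every port of every device at once instead of naming driverX by
hand, a loop over the sysfs tree can be used. This is a sketch;
check_ib_ports is a hypothetical helper, and it takes the sysfs directory
as an optional argument so it can be tried against a copy.

```shell
# Print every InfiniBand port state file under the given directory whose
# content is not "4: ACTIVE". Returns non-zero if any such port is found.
# The directory argument defaults to /sys/class/infiniband.
check_ib_ports() {
    base=${1:-/sys/class/infiniband}
    rc=0
    for f in "$base"/*/ports/*/state; do
        [ -e "$f" ] || continue   # the glob matched nothing
        state=$(cat "$f")
        if [ "$state" != "4: ACTIVE" ]; then
            echo "$f: $state"
            rc=1
        fi
    done
    return $rc
}

# Example use:
#   check_ib_ports || echo "some ports are not ACTIVE; is an SM running?"
```

Note that the function also returns success when no RDMA devices are
present at all, so it does not replace checking that the driver loaded.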

    To further test the InfiniBand software stack, use IPoIB (this
    assumes you have two IB hosts named host1 and host2):

    host1$ ifconfig ib0 a.b.c.x
    host2$ ifconfig ib0 a.b.c.y
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

    For other device types, follow the appropriate procedures.

  - Check NFS Setup

    For the NFS components enabled above (client and/or server),
    test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
~~~~~~~~~~~~~~

  We recommend that you use two machines, one to act as the client and
  one to act as the server.

  One-time configuration:

  - On the server system, configure the /etc/exports file and
    start the NFS/RDMA server.

    Exports entries with the following formats have been tested:

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

    The IP address or addresses are the client's IPoIB addresses for an
    InfiniBand HCA, or the client's iWARP addresses for an RNIC.

    NOTE: The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.
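
Since a missing "insecure" option is easy to overlook, the exports file can
be scanned for entries that lack it. This is a rough sketch under
simplifying assumptions: one export per line, no backslash continuation
lines, and every listed entry intended for NFS/RDMA clients. The function
name exports_missing_insecure is made up for this example.

```shell
# Print each export entry in the given file whose option list lacks
# "insecure". Comment and blank lines are skipped. Returns non-zero if
# any entry is printed. The file argument defaults to /etc/exports.
exports_missing_insecure() {
    file=${1:-/etc/exports}
    if grep -vE '^[[:space:]]*(#|$)' "$file" | grep -vE '[(,]insecure[,)]'; then
        return 1
    fi
    return 0
}

# Example use:
#   exports_missing_insecure || echo "add 'insecure' to the entries above"
```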

  Each time a machine boots:

  - Load and configure the RDMA drivers

    For InfiniBand using a Mellanox adapter:

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ifconfig ib0 a.b.c.d

    NOTE: use unique addresses for the client and server

  - Start the NFS server

    If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
    kernel config), load the RDMA transport module:

    $ modprobe svcrdma

    Regardless of how the server was built (module or built-in), start the
    server:

    $ /etc/init.d/nfs start

    or

    $ service nfs start

    Instruct the server to listen on the RDMA transport:

    $ echo rdma 20049 > /proc/fs/nfsd/portlist
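
To confirm that the listener was added, the same file can be read back.
The sketch below assumes that reading the portlist file yields one
"transport port" pair per line (for example "rdma 20049"); treat that
format as an assumption and check it on your kernel. The function name
nfsd_listens_on_rdma is hypothetical.

```shell
# Return success if the given portlist file has an "rdma <port>" line.
# The file defaults to /proc/fs/nfsd/portlist and the port to 20049.
nfsd_listens_on_rdma() {
    file=${1:-/proc/fs/nfsd/portlist}
    port=${2:-20049}
    grep -qx "rdma $port" "$file"
}

# Example use:
#   nfsd_listens_on_rdma || echo "server is not listening on rdma 20049"
```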

  - On the client system

    If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
    kernel config), load the RDMA client module:

    $ modprobe xprtrdma

    Regardless of how the client was built (module or built-in), use this
    command to mount the NFS/RDMA server:

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

    To verify that the mount is using RDMA, run "cat /proc/mounts" and check
    the "proto" field for the given mount.
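
The "proto" check can be automated by parsing the mount table. This is a
minimal sketch that assumes the usual /proc/mounts column layout (device,
mount point, file system type, comma-separated options); mount_proto is a
hypothetical helper name.

```shell
# Print the proto= value for the given mount point, read from a file in
# /proc/mounts format. The file argument defaults to /proc/mounts.
mount_proto() {
    mountpoint=$1
    file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^proto=/) {
                    sub(/^proto=/, "", opts[i])
                    print opts[i]
                }
        }' "$file"
}

# Example use:
#   [ "$(mount_proto /mnt)" = rdma ] && echo "/mnt is using NFS/RDMA"
```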

    Congratulations! You're using NFS/RDMA!