USERSPACE VERBS ACCESS

  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
  enables direct userspace access to IB hardware via "verbs," as
  described in chapter 11 of the InfiniBand Architecture Specification.

  To use the verbs, the libibverbs library, available from
  http://www.openfabrics.org/, is required.  libibverbs contains a
  device-independent API for using the ib_uverbs interface.
  libibverbs also requires the appropriate device-dependent kernel and
  userspace drivers for your InfiniBand hardware.  For example, to use
  a Mellanox HCA, you will need the ib_mthca kernel module and the
  libmthca userspace driver to be installed.

User-kernel communication

  Userspace communicates with the kernel for slow-path resource
  management operations via the /dev/infiniband/uverbsN character
  devices.  Fast-path operations are typically performed by writing
  directly to hardware registers mmap()ed into userspace, with no
  system call or context switch into the kernel.
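
  To illustrate why the fast path needs no system call: once a page is
  mapped, ringing a doorbell is just a store through a pointer.  The
  sketch below is hypothetical throughout.  An anonymous mapping stands
  in for the register page a real userspace driver would typically
  obtain by calling mmap() on its /dev/infiniband/uverbsN file
  descriptor, and the doorbell encoding is invented for the example.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one page to stand in for a device doorbell page (hypothetical). */
static volatile uint32_t *demo_map_doorbell(size_t *len)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    void *p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)pg;
    return (volatile uint32_t *)p;
}

/* Fast path: a single store, no system call.  Invented encoding:
   queue number in the high bits, low 8 bits of the producer index. */
static void demo_ring_doorbell(volatile uint32_t *db, uint32_t qpn,
                               uint32_t index)
{
    db[0] = (qpn << 8) | (index & 0xffu);
}

/* Undo demo_map_doorbell(). */
static void demo_unmap_doorbell(volatile uint32_t *db, size_t len)
{
    munmap((void *)db, len);
}
```

  Real drivers also place a write memory barrier before the doorbell
  store so the device observes the work request first; this sketch
  omits that step.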

  Commands are sent to the kernel via write()s on these device files.
  The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
  The structs for commands that require a response from the kernel
  contain a 64-bit field used to pass a pointer to an output buffer.
  Status is returned to userspace as the return value of the write()
  system call.
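
  The sketch below shows the general shape of such a command: a
  header, a 64-bit response field holding a userspace pointer, and a
  write()-style return value.  The struct and function names are
  hypothetical and deliberately simplified; they are not the layouts in
  ib_user_verbs.h.  demo_handle_write() merely imitates the kernel side
  in-process, where the real kernel would copy the response out with
  copy_to_user().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command header: which verb, and sizes in 32-bit words. */
struct demo_cmd_hdr {
    uint32_t command;
    uint16_t in_words;   /* size of the whole command */
    uint16_t out_words;  /* size of the response buffer */
};

/* Hypothetical command that needs a reply from the kernel. */
struct demo_query_cmd {
    struct demo_cmd_hdr hdr;
    uint64_t response;   /* userspace pointer to the output buffer */
    uint32_t handle;     /* opaque object handle */
    uint32_t reserved;   /* explicit padding keeps the size at 24 bytes */
};

struct demo_query_resp {
    uint32_t value;
    uint32_t reserved;
};

/* A fixed 64-bit field gives 32- and 64-bit processes the same layout. */
static uint64_t demo_ptr_to_u64(const void *p) { return (uint64_t)(uintptr_t)p; }
static void *demo_u64_to_ptr(uint64_t v) { return (void *)(uintptr_t)v; }

/* Stand-in for the kernel's write() handler: returns len on success,
   -1 on a malformed command. */
static long demo_handle_write(const void *buf, size_t len)
{
    struct demo_query_cmd cmd;
    struct demo_query_resp resp = {0, 0};

    assert(buf != NULL);
    if (len != sizeof cmd)
        return -1;
    memcpy(&cmd, buf, sizeof cmd);
    if (cmd.hdr.in_words * 4u != len || cmd.hdr.out_words * 4u < sizeof resp)
        return -1;
    resp.value = cmd.handle * 2u;   /* invented "result" */
    memcpy(demo_u64_to_ptr(cmd.response), &resp, sizeof resp);
    return (long)len;
}
```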

Resource management

  Since creation and destruction of all IB resources are done by
  commands passed through a file descriptor, the kernel can keep track
  of which resources are attached to a given userspace context.  The
  ib_uverbs module maintains idr tables that are used to translate
  between kernel pointers and opaque userspace handles, so that kernel
  pointers are never exposed to userspace and userspace cannot trick
  the kernel into following a bogus pointer.
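
  The handle translation can be pictured with a much simpler structure
  than the kernel's idr allocator.  In this hypothetical sketch a
  fixed-size array plays the role of the idr table: userspace only ever
  sees the integer index, and looking up an out-of-range or unused
  handle returns NULL instead of dereferencing anything userspace
  supplied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_HANDLES 64   /* hypothetical fixed capacity */

struct demo_handle_table {
    void *slot[DEMO_MAX_HANDLES];  /* kernel-side pointers, never copied out */
};

/* Store obj and return its handle, or -1 when the table is full. */
static int demo_handle_alloc(struct demo_handle_table *t, void *obj)
{
    assert(obj != NULL);
    for (int h = 0; h < DEMO_MAX_HANDLES; h++) {
        if (t->slot[h] == NULL) {
            t->slot[h] = obj;
            return h;
        }
    }
    return -1;
}

/* Translate a handle from userspace; a bogus handle yields NULL. */
static void *demo_handle_lookup(const struct demo_handle_table *t,
                                uint32_t handle)
{
    if (handle >= DEMO_MAX_HANDLES)
        return NULL;
    return t->slot[handle];
}

/* Forget a handle and return the object it referred to, if any. */
static void *demo_handle_remove(struct demo_handle_table *t, uint32_t handle)
{
    void *obj = demo_handle_lookup(t, handle);
    if (obj != NULL)
        t->slot[handle] = NULL;
    return obj;
}
```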

  This also allows the kernel to clean up when a process exits and
  prevent one process from touching another process's resources.
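
  A hypothetical sketch of the two properties above: each object
  records the context that created it, lookups from any other context
  fail, and releasing a context frees whatever it still holds.  Because
  the kernel closes a process's open files when it exits, a release
  step like demo_release() would run even if the process never
  destroyed its resources.  The names are invented, and in practice the
  object would first be found through a handle lookup.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_context;

/* Hypothetical resource object owned by one context. */
struct demo_uobject {
    struct demo_context *owner;   /* context that created this resource */
    struct demo_uobject *next;    /* link in the owner's resource list */
};

/* Hypothetical per-open-file context. */
struct demo_context {
    struct demo_uobject *head;    /* every resource this context still holds */
};

/* Create a resource and record it on ctx's list. */
static struct demo_uobject *demo_create(struct demo_context *ctx)
{
    assert(ctx != NULL);
    struct demo_uobject *u = malloc(sizeof *u);
    if (u == NULL)
        return NULL;
    u->owner = ctx;
    u->next = ctx->head;
    ctx->head = u;
    return u;
}

/* Refuse access unless the caller's context created the object. */
static struct demo_uobject *demo_get(struct demo_context *ctx,
                                     struct demo_uobject *u)
{
    return (u != NULL && u->owner == ctx) ? u : NULL;
}

/* Run when the device file is closed: free everything left, return count. */
static int demo_release(struct demo_context *ctx)
{
    int n = 0;
    while (ctx->head != NULL) {
        struct demo_uobject *u = ctx->head;
        ctx->head = u->next;
        free(u);
        n++;
    }
    return n;
}
```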

Memory pinning

  Direct userspace I/O requires that memory regions that are potential
  I/O targets be kept resident at the same physical address.  The
  ib_uverbs module manages pinning and unpinning memory regions via
  get_user_pages() and put_page() calls.  It also accounts for the
  amount of memory pinned in the process's locked_vm, and checks that
  unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.
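
  The limit check amounts to comparing the pages already locked plus
  the new request against RLIMIT_MEMLOCK converted to pages.  The helper
  names below are hypothetical.  In the kernel the privilege that
  bypasses the limit is CAP_IPC_LOCK, modeled here as a plain boolean,
  and demo_memlock_limit() shows how a process can read its own soft
  limit with getrlimit().

```c
#define _DEFAULT_SOURCE /* make RLIMIT_MEMLOCK visible under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/* Would pinning new_pages more stay within a limit of limit_bytes? */
static bool demo_may_pin(uint64_t locked_pages, uint64_t new_pages,
                         uint64_t limit_bytes, uint64_t page_size,
                         bool privileged)
{
    assert(page_size != 0);
    if (privileged)            /* stands in for holding CAP_IPC_LOCK */
        return true;
    return locked_pages + new_pages <= limit_bytes / page_size;
}

/* Read the soft RLIMIT_MEMLOCK in bytes; UINT64_MAX means unlimited.
   Returns 0 on success, -1 if getrlimit() fails. */
static int demo_memlock_limit(uint64_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *out = (rl.rlim_cur == RLIM_INFINITY) ? UINT64_MAX : (uint64_t)rl.rlim_cur;
    return 0;
}
```

  A process that plans to register large buffers can compare the
  planned size against this value up front, or have its limit raised
  (for example with ulimit -l in the launching shell) before it starts.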

  Pages that are pinned multiple times are counted each time they are
  pinned, so the value of locked_vm may be an overestimate of the
  number of pages pinned by a process.
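
  As a worked example, registering pages [0, 4) and then pages [2, 6)
  charges 4 + 4 = 8 pages to locked_vm even though only 6 distinct
  pages are pinned.  The sketch below (hypothetical names, ranges
  measured in pages) computes both numbers for two registrations.

```c
#include <assert.h>
#include <stdint.h>

/* A registration covering pages [first, last). */
struct demo_range {
    uint64_t first, last;
};

/* What locked_vm grows by: each registration is charged in full. */
static uint64_t demo_charged_pages(struct demo_range a, struct demo_range b)
{
    assert(a.first <= a.last && b.first <= b.last);
    return (a.last - a.first) + (b.last - b.first);
}

/* Pages actually pinned: the size of the union of the two ranges. */
static uint64_t demo_distinct_pages(struct demo_range a, struct demo_range b)
{
    uint64_t lo = a.first > b.first ? a.first : b.first;  /* overlap start */
    uint64_t hi = a.last < b.last ? a.last : b.last;      /* overlap end */
    uint64_t overlap = hi > lo ? hi - lo : 0;
    return demo_charged_pages(a, b) - overlap;
}
```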

/dev files

  To create the appropriate character device files automatically with
  udev, a rule like

    KERNEL=="uverbs*", NAME="infiniband/%k"

  can be used.  This will create device nodes named

    /dev/infiniband/uverbs0

  and so on.  Since the InfiniBand userspace verbs should be safe for
  use by non-privileged processes, it may be useful to add an
  appropriate MODE or GROUP to the udev rule.
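
  A program can check for these nodes before trying to use the verbs.
  This hypothetical sketch counts uverbs* entries in a directory such as
  /dev/infiniband and tests whether the calling user may open a node
  read/write, which is how libibverbs opens it and therefore what an
  unprivileged process needs once MODE or GROUP has been set.
  libibverbs does its own device discovery, so this is only a
  diagnostic aid.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <unistd.h>

/* Count entries named "uverbs*" in dir; returns -1 if dir cannot be opened. */
static int demo_count_uverbs(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, "uverbs", 6) == 0)
            n++;
    }
    closedir(d);
    return n;
}

/* Nonzero if the caller may open path for reading and writing.
   access() checks the real (not effective) user and group IDs. */
static int demo_can_open_rw(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}
```

  For example, demo_count_uverbs("/dev/infiniband") reports how many
  nodes udev created, and demo_can_open_rw("/dev/infiniband/uverbs0")
  shows whether the MODE or GROUP setting grants the current user
  access.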