Roland Dreier | 6f50142 | 2005-07-07 17:57:21 -0700

USERSPACE VERBS ACCESS

The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
enables direct userspace access to IB hardware via "verbs," as
described in chapter 11 of the InfiniBand Architecture Specification.

To use the verbs, the libibverbs library, available from
<http://openib.org/>, is required.  libibverbs contains a
device-independent API for using the ib_uverbs interface.
libibverbs also requires appropriate device-dependent kernel and
userspace drivers for your InfiniBand hardware.  For example, to use
a Mellanox HCA, you will need both the ib_mthca kernel module and the
libmthca userspace driver installed.

User-kernel communication

Userspace communicates with the kernel for slow path, resource
management operations via the /dev/infiniband/uverbsN character
devices.  Fast path operations are typically performed by writing
directly to hardware registers mmap()ed into userspace, with no
system call or context switch into the kernel.

Commands are sent to the kernel via write()s on these device files.
The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
The structs for commands that require a response from the kernel
contain a 64-bit field used to pass a pointer to an output buffer.
Status is returned to userspace as the return value of the write()
system call.
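
The write()-based command path can be sketched in C as follows.  This
is only an illustration: the example_* struct layouts and names are
assumptions made for the sketch, not the real ABI, which is defined
in ib_user_verbs.h.  It shows a command carrying the address of an
output buffer in a 64-bit field, and the write() return value being
handed back as the status.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Illustrative command header (assumed layout): a command code plus
 * the input and output sizes in 32-bit words. */
struct example_cmd_hdr {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
};

/* Hypothetical command that expects a response: the 64-bit field
 * carries the userspace address of the output buffer. */
struct example_cmd {
	struct example_cmd_hdr hdr;
	uint64_t response;
};

/* Hypothetical response layout, used only to size the output buffer. */
struct example_resp {
	uint32_t handle;
	uint32_t reserved;
};

/* Build one command and send it with a single write().  The return
 * value of write() is passed straight back to the caller, since that
 * is how status reaches userspace (-1 with errno set on failure). */
static ssize_t example_send_cmd(int fd, uint32_t command,
				struct example_resp *resp)
{
	struct example_cmd cmd;

	assert(resp != NULL);

	memset(&cmd, 0, sizeof cmd);
	cmd.hdr.command   = command;
	cmd.hdr.in_words  = sizeof cmd / 4;
	cmd.hdr.out_words = sizeof *resp / 4;
	/* Pointers travel as 64-bit integers so that the layout is the
	 * same for 32-bit and 64-bit userspace. */
	cmd.response      = (uint64_t)(uintptr_t)resp;

	return write(fd, &cmd, sizeof cmd);
}
```

In real use fd would come from open("/dev/infiniband/uverbs0", O_RDWR),
and the command code and structs would be taken from the kernel header
rather than defined locally.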

Resource management

Since creation and destruction of all IB resources is done by
commands passed through a file descriptor, the kernel can keep track
of which resources are attached to a given userspace context.  The
ib_uverbs module maintains idr tables that are used to translate
between kernel pointers and opaque userspace handles, so that kernel
pointers are never exposed to userspace and userspace cannot trick
the kernel into following a bogus pointer.

This also allows the kernel to clean up when a process exits and
prevent one process from touching another process's resources.
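
The handle translation idea can be illustrated with a toy table in
plain C.  This is not the kernel's idr API and not how ib_uverbs is
actually written; the fixed-size array, the example_* names and the
integer context IDs are all assumptions chosen to keep the sketch
short.  It shows the three properties described above: userspace
only ever sees a small integer, a bogus or foreign handle is
rejected, and everything owned by a context can be released at once.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_HANDLES 16

/* One slot per handle: the kernel-side object and the context (open
 * file) that created it. */
struct example_slot {
	void *object;
	int   owner_ctx;
	int   in_use;
};

struct example_table {
	struct example_slot slots[EXAMPLE_MAX_HANDLES];
};

/* Store an object and return a small integer handle, or -1 if the
 * table is full.  Only the handle is ever given to userspace. */
static int example_handle_alloc(struct example_table *t, void *object,
				int owner_ctx)
{
	assert(object != NULL);
	for (int h = 0; h < EXAMPLE_MAX_HANDLES; h++) {
		if (!t->slots[h].in_use) {
			t->slots[h] = (struct example_slot){ object, owner_ctx, 1 };
			return h;
		}
	}
	return -1;
}

/* Translate a handle back to an object.  Out-of-range, unused, and
 * foreign handles all yield NULL instead of a pointer. */
static void *example_handle_lookup(const struct example_table *t,
				   int handle, int ctx)
{
	if (handle < 0 || handle >= EXAMPLE_MAX_HANDLES)
		return NULL;
	if (!t->slots[handle].in_use || t->slots[handle].owner_ctx != ctx)
		return NULL;
	return t->slots[handle].object;
}

/* Release every handle owned by one context, as cleanup on process
 * exit would.  Returns the number of handles released. */
static int example_release_ctx(struct example_table *t, int ctx)
{
	int released = 0;

	for (int h = 0; h < EXAMPLE_MAX_HANDLES; h++) {
		if (t->slots[h].in_use && t->slots[h].owner_ctx == ctx) {
			t->slots[h] = (struct example_slot){ NULL, 0, 0 };
			released++;
		}
	}
	return released;
}
```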

Memory pinning

Direct userspace I/O requires that memory regions that are potential
I/O targets be kept resident at the same physical address.  The
ib_uverbs module manages pinning and unpinning memory regions via
get_user_pages() and put_page() calls.  It also accounts for the
amount of memory pinned in the process's locked_vm, and checks that
unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

Pages that are pinned multiple times are counted each time they are
pinned, so the value of locked_vm may be an overestimate of the
number of pages pinned by a process.
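
The accounting described above can be approximated from userspace
with the sketch below.  It is not the kernel code: the example_*
helpers are hypothetical, and the comparison is an assumption about
how such a check could be written.  Only getrlimit() and
RLIMIT_MEMLOCK are real interfaces.  The sketch counts the pages a
region touches and tests whether charging them would exceed the
limit.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* Number of pages touched by [addr, addr + len), counting partial
 * pages at both ends. */
static uint64_t example_pages_spanned(uint64_t addr, uint64_t len,
				      uint64_t page_size)
{
	assert(page_size != 0);
	if (len == 0)
		return 0;

	uint64_t first = addr / page_size;
	uint64_t last  = (addr + len - 1) / page_size;

	return last - first + 1;
}

/* Returns 1 if charging new_pages on top of locked_pages would
 * exceed limit_bytes, 0 otherwise.  An unlimited RLIMIT_MEMLOCK is
 * never exceeded. */
static int example_exceeds_memlock(uint64_t locked_pages,
				   uint64_t new_pages,
				   rlim_t limit_bytes, uint64_t page_size)
{
	if (limit_bytes == RLIM_INFINITY)
		return 0;
	return locked_pages + new_pages > (uint64_t)limit_bytes / page_size;
}

/* Fetch the calling process's soft RLIMIT_MEMLOCK in bytes.
 * Returns 0 on success, -1 on failure. */
static int example_memlock_limit(rlim_t *limit_bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*limit_bytes = rl.rlim_cur;
	return 0;
}
```

Because the helper charges pages per region, charging the same
region twice adds its page count twice, which mirrors the
overestimate noted above.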

/dev files

To create the appropriate character device files automatically with
udev, a rule like

    KERNEL=="uverbs*", NAME="infiniband/%k"

can be used.  This will create device nodes named

    /dev/infiniband/uverbs0

and so on.  Since the InfiniBand userspace verbs should be safe for
use by non-privileged processes, it may be useful to add an
appropriate MODE or GROUP to the udev rule.