Devres - Managed Device Resource
================================

Tejun Heo	<teheo@suse.de>

First draft	10 January 2007


1. Intro			: Huh? Devres?
2. Devres			: Devres in a nutshell
3. Devres Group			: Group devres'es and release them together
4. Details			: Lifetime rules, calling context, ...
5. Overhead			: How much do we have to pay for this?
6. List of managed interfaces	: Currently implemented managed interfaces


  1. Intro
  --------

devres came up while trying to convert libata to use iomap.  Each
iomapped address should be kept and unmapped on driver detach.  For
example, a plain SFF ATA controller (that is, good old PCI IDE) in
native mode makes use of 5 PCI BARs and all of them should be
maintained.
 | 25 |  | 
 | 26 | As with many other device drivers, libata low level drivers have | 
 | 27 | sufficient bugs in ->remove and ->probe failure path.  Well, yes, | 
 | 28 | that's probably because libata low level driver developers are lazy | 
 | 29 | bunch, but aren't all low level driver developers?  After spending a | 
 | 30 | day fiddling with braindamaged hardware with no document or | 
 | 31 | braindamaged document, if it's finally working, well, it's working. | 
 | 32 |  | 
For one reason or another, low level drivers don't receive as much
attention or testing as core code, and bugs on driver detach or
initialization failure don't happen often enough to be noticeable.
The init failure path is worse because it's much less travelled while
it needs to handle multiple entry points.

So, many low level drivers end up leaking resources on driver detach
and having half-broken failure path implementations in ->probe() which
would leak resources or even cause an oops when failure occurs.  iomap
adds more to this mix.  So do MSI and MSI-X.


  2. Devres
  ---------

devres is basically a linked list of arbitrarily sized memory areas
associated with a struct device.  Each devres entry is associated with
a release function.  A devres can be released in several ways.  No
matter what, all devres entries are released on driver detach.  On
release, the associated release function is invoked and then the
devres entry is freed.
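
To make the structure above concrete, here is a minimal userspace sketch in C.  It is a model, not the kernel code.  struct model_device, model_devres_alloc(), model_devres_add(), model_devres_free() and model_devres_release_all() are hypothetical names.  Each data area is allocated together with its bookkeeping header.  "Driver detach" runs the release functions newest first, which is the usual teardown order; treat that ordering as an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_device;
typedef void (*model_release_t)(struct model_device *dev, void *res);

/* Bookkeeping header; the caller's data area follows it in one chunk. */
struct model_devres {
	struct model_devres *next;	/* newest entry is at the head */
	model_release_t release;
	unsigned long long data[];	/* keeps the data area ull-aligned */
};

struct model_device {
	struct model_devres *head;
};

static struct model_devres *to_devres(void *res)
{
	return (struct model_devres *)((char *)res -
				       offsetof(struct model_devres, data));
}

/* Allocate header + zeroed data area; returns the data area. */
void *model_devres_alloc(model_release_t release, size_t size)
{
	struct model_devres *dr;

	assert(release != NULL);
	dr = calloc(1, sizeof(*dr) + size);
	if (!dr)
		return NULL;
	dr->release = release;
	return dr->data;
}

/* Free an entry that was never added, e.g. on an error path. */
void model_devres_free(void *res)
{
	if (res)
		free(to_devres(res));
}

/* Associate the entry with the device. */
void model_devres_add(struct model_device *dev, void *res)
{
	struct model_devres *dr = to_devres(res);

	dr->next = dev->head;
	dev->head = dr;
}

/* "Driver detach": invoke each release function, newest first, then free. */
void model_devres_release_all(struct model_device *dev)
{
	while (dev->head) {
		struct model_devres *dr = dev->head;

		dev->head = dr->next;
		dr->release(dev, dr->data);
		free(dr);
	}
}

/* Example release function: logs the int stored in the data area. */
int demo_log[8];
int demo_count;

void demo_release(struct model_device *dev, void *res)
{
	(void)dev;
	if (demo_count < 8)
		demo_log[demo_count++] = *(int *)res;
}
```

For example, add three int-sized entries holding 1, 2 and 3, then call model_devres_release_all().  demo_release() logs them as 3, 2, 1, and the device's list ends up empty.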

Managed interfaces are created for resources commonly used by device
drivers using devres.  For example, coherent DMA memory is acquired
using dma_alloc_coherent().  The managed version is called
dmam_alloc_coherent().  It is identical to dma_alloc_coherent() except
that the DMA memory allocated using it is managed and will be
automatically released on driver detach.  The implementation looks
like the following.

  struct dma_devres {
	size_t		size;
	void		*vaddr;
	dma_addr_t	dma_handle;
  };

  static void dmam_coherent_release(struct device *dev, void *res)
  {
	struct dma_devres *this = res;

	dma_free_coherent(dev, this->size, this->vaddr, this->dma_handle);
  }

  void *dmam_alloc_coherent(struct device *dev, size_t size,
			    dma_addr_t *dma_handle, gfp_t gfp)
  {
	struct dma_devres *dr;
	void *vaddr;

	dr = devres_alloc(dmam_coherent_release, sizeof(*dr), gfp);
	if (!dr)
		return NULL;

	/* alloc DMA memory as usual */
	vaddr = dma_alloc_coherent(dev, size, dma_handle, gfp);
	if (!vaddr) {
		devres_free(dr);	/* dr was never added, just free it */
		return NULL;
	}

	/* record size, vaddr, dma_handle in dr */
	dr->vaddr = vaddr;
	dr->dma_handle = *dma_handle;
	dr->size = size;

	/* from here on, devres frees the area on driver detach */
	devres_add(dev, dr);

	return vaddr;
  }

If a driver uses dmam_alloc_coherent(), the area is guaranteed to be
freed whether initialization fails half-way or the device gets
detached.  If most resources are acquired using managed interfaces, a
driver can have much simpler init and exit code.  The init path
basically looks like the following.

  int my_init_one(struct device *dev)
  {
	struct mydev *d;

	d = devm_kzalloc(dev, sizeof(*d), GFP_KERNEL);
	if (!d)
		return -ENOMEM;

	d->ring = dmam_alloc_coherent(dev, ...);
	if (!d->ring)
		return -ENOMEM;

	/* error returns need no cleanup: devres releases d and d->ring */
	if (check something)
		return -EINVAL;
	...

	dev_set_drvdata(dev, d);
	return register_to_upper_layer(d);
  }

And the exit path:

  void my_remove_one(struct device *dev)
  {
	struct mydev *d = dev_get_drvdata(dev);

	unregister_from_upper_layer(d);
	shutdown_my_hardware();
  }
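
The same idea can be tried outside the kernel.  The sketch below is a userspace model with these assumptions:

- malloc()/free() stand in for a real resource such as coherent DMA memory.
- model_malloc_managed(), model_probe() and model_detach() are hypothetical names.

model_malloc_managed() follows the shape of dmam_alloc_coherent() above.  model_probe() fails half-way with a bare return.  A single release pass still frees everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_device;
typedef void (*model_release_t)(struct model_device *dev, void *res);

struct model_devres {
	struct model_devres *next;	/* newest entry first */
	model_release_t release;
	unsigned long long data[];	/* record follows the header */
};

struct model_device {
	struct model_devres *head;
	int live_buffers;		/* buffers not yet freed, to spot leaks */
};

/* Plays the role of struct dma_devres: what release needs to know. */
struct buf_devres {
	void *vaddr;
};

static void buf_release(struct model_device *dev, void *res)
{
	struct buf_devres *this = res;

	assert(dev->live_buffers > 0);
	free(this->vaddr);
	dev->live_buffers--;
}

/* Same shape as dmam_alloc_coherent(): bookkeeping first, then resource. */
void *model_malloc_managed(struct model_device *dev, size_t size)
{
	struct model_devres *dr = malloc(sizeof(*dr) + sizeof(struct buf_devres));
	struct buf_devres *rec;
	void *vaddr;

	if (!dr)
		return NULL;
	vaddr = malloc(size);		/* acquire the resource as usual */
	if (!vaddr) {
		free(dr);		/* undo the bookkeeping allocation */
		return NULL;
	}
	dev->live_buffers++;
	rec = (struct buf_devres *)dr->data;
	rec->vaddr = vaddr;		/* record what release needs */
	dr->release = buf_release;
	dr->next = dev->head;		/* the equivalent of devres_add() */
	dev->head = dr;
	return vaddr;
}

/* Acquires two buffers, then optionally fails; no cleanup code at all. */
int model_probe(struct model_device *dev, int fail)
{
	if (!model_malloc_managed(dev, 64))
		return -1;
	if (!model_malloc_managed(dev, 128))
		return -1;
	if (fail)
		return -2;		/* e.g. "check something" failed */
	return 0;
}

/* Detach, or probe failure: release every entry, newest first. */
void model_detach(struct model_device *dev)
{
	while (dev->head) {
		struct model_devres *dr = dev->head;

		dev->head = dr->next;
		dr->release(dev, dr->data);
		free(dr);
	}
}
```

Suppose a probe acquires two buffers and then fails.  live_buffers stays at 2 until model_detach() runs, and then drops back to 0.  The probe routine itself never had to undo anything.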

As shown above, low level drivers can be simplified a lot by using
devres.  Complexity is shifted from less maintained low level drivers
to the better maintained higher layer.  Also, as the init failure path
is shared with the exit path, both can get more testing.


  3. Devres group
  ---------------

Devres entries can be grouped using devres groups.  When a group is
released, all contained normal devres entries and properly nested
groups are released.  One use is to roll back a series of acquired
resources on failure.  For example,

  if (!devres_open_group(dev, NULL, GFP_KERNEL))
	return -ENOMEM;

  acquire A;
  if (failed)
	goto err;

  acquire B;
  if (failed)
	goto err;
  ...

  /* success: keep the resources, just drop the group */
  devres_remove_group(dev, NULL);
  return 0;

 err:
  /* failure: release everything acquired since the group was opened */
  devres_release_group(dev, NULL);
  return err_code;

As resource acquisition failure usually means probe failure, constructs
like the above are mostly useful in midlayer drivers (e.g. the libata
core layer) where interface functions shouldn't have side effects on
failure.  For LLDs, just returning an error code suffices in most cases.

Each group is identified by a void *id.  It can either be explicitly
specified by the @id argument to devres_open_group() or automatically
created by passing NULL as @id as in the above example.  In both
cases, devres_open_group() returns the group's id.  The returned id
can be passed to other devres functions to select the target group.
If NULL is given to those functions, the most recently opened group
that is still open is selected.

For example, you can do something like the following.

  int my_midlayer_create_something(struct device *dev)
  {
	if (!devres_open_group(dev, my_midlayer_create_something, GFP_KERNEL))
		return -ENOMEM;

	...

	devres_close_group(dev, my_midlayer_create_something);
	return 0;
  }

  void my_midlayer_destroy_something(struct device *dev)
  {
	devres_release_group(dev, my_midlayer_create_something);
  }
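
Group semantics can be modelled in userspace as well.  In this sketch, a group is a marker entry in the same list as the resources; all names are hypothetical, not the kernel API.  The model behaves as follows:

- Releasing a group releases everything added between its open and close points, newest first, including properly nested groups.
- Removing a group drops only the marker and keeps the resources.
- Passing NULL selects the most recently opened group that is still open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one device's devres list as an array in order of
 * addition; a group is a marker entry in that same list. */
enum kind { RES, GROUP };

struct entry {
	enum kind kind;
	const void *id;		/* group id (GROUP only) */
	int end;		/* GROUP: index where it was closed, -1 if open */
	void (*release)(int);	/* release callback (RES only) */
	int val;		/* payload handed to release */
	int dead;		/* already released or removed */
};

#define MAX_ENTRIES 32

struct model_device {
	struct entry e[MAX_ENTRIES];
	int n;
};

static int push(struct model_device *dev, struct entry ent)
{
	assert(dev->n < MAX_ENTRIES);	/* fixed capacity keeps the model small */
	dev->e[dev->n] = ent;
	return dev->n++;
}

void model_add(struct model_device *dev, void (*release)(int), int val)
{
	push(dev, (struct entry){ RES, NULL, -1, release, val, 0 });
}

/* Latest live group with this id or, for NULL, the latest open group. */
static int find_group(struct model_device *dev, const void *id)
{
	for (int i = dev->n - 1; i >= 0; i--) {
		struct entry *g = &dev->e[i];

		if (!g->dead && g->kind == GROUP && (id ? g->id == id : g->end < 0))
			return i;
	}
	return -1;
}

/* Returns the group id; one is generated from the marker when @id is NULL. */
const void *model_open_group(struct model_device *dev, const void *id)
{
	int i = push(dev, (struct entry){ GROUP, id, -1, NULL, 0, 0 });

	if (!id)
		dev->e[i].id = &dev->e[i];
	return dev->e[i].id;
}

/* Entries added from now on are outside the group. */
void model_close_group(struct model_device *dev, const void *id)
{
	int g = find_group(dev, id);

	if (g >= 0 && dev->e[g].end < 0)
		dev->e[g].end = dev->n;
}

/* Drop the group marker but keep the resources it covered. */
void model_remove_group(struct model_device *dev, const void *id)
{
	int g = find_group(dev, id);

	if (g >= 0)
		dev->e[g].dead = 1;
}

/* Release the group's resources (and nested groups) newest first;
 * returns how many resources were released. */
int model_release_group(struct model_device *dev, const void *id)
{
	int g = find_group(dev, id), cnt = 0;

	if (g < 0)
		return 0;
	for (int i = (dev->e[g].end < 0 ? dev->n : dev->e[g].end) - 1; i > g; i--) {
		if (!dev->e[i].dead && dev->e[i].kind == RES) {
			dev->e[i].release(dev->e[i].val);
			cnt++;
		}
		dev->e[i].dead = 1;	/* nested group markers go away too */
	}
	dev->e[g].dead = 1;
	return cnt;
}

/* Example release callback for trying the model out. */
int released_sum;
void demo_release(int val) { released_sum += val; }
```

Opening an anonymous group, adding two resources and calling model_release_group(dev, NULL) rolls back both.  A group can also be opened with an explicit id and closed.  A resource added after the close is left alone when that group is later released by its id.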


  4. Details
  ----------

The lifetime of a devres entry begins on devres allocation and ends
when it is released or destroyed (removed and freed).  There is no
reference counting.

The devres core guarantees atomicity of all basic devres operations
and has support for single-instance devres types (atomic
lookup-and-add-if-not-found).  Other than that, synchronizing
concurrent accesses to allocated devres data is the caller's
responsibility.  This is usually a non-issue because bus ops and
resource allocations already do the job.
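
Single-instance types only work if lookup and insertion happen under the same lock.  Otherwise two callers could both miss and both add an entry.  The sketch below shows that shape in userspace, with a C11 atomic_flag spinlock standing in for the devres lock.  model_devres_get(), get_table() and the other names are hypothetical.  In the kernel, to our understanding, devres_find() and devres_get() provide this operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical model; the release function doubles as the type tag. */
typedef void (*model_release_t)(void *data);

struct model_devres {
	struct model_devres *next;
	model_release_t release;
	void *data;			/* kept separate here for simplicity */
};

struct model_device {
	atomic_flag lock;		/* stands in for the devres spinlock */
	struct model_devres *head;
};

void model_device_init(struct model_device *dev)
{
	atomic_flag_clear(&dev->lock);
	dev->head = NULL;
}

static void model_lock(struct model_device *dev)
{
	while (atomic_flag_test_and_set_explicit(&dev->lock, memory_order_acquire))
		;	/* spin */
}

static void model_unlock(struct model_device *dev)
{
	atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}

/*
 * Atomic lookup-and-add-if-not-found: return the data of an existing entry
 * of @new_dr's type, else add @new_dr.  Lookup and insertion happen under
 * one lock; a losing @new_dr is freed, so callers can allocate it up front.
 */
void *model_devres_get(struct model_device *dev, struct model_devres *new_dr)
{
	struct model_devres *dr;

	model_lock(dev);
	for (dr = dev->head; dr; dr = dr->next)
		if (dr->release == new_dr->release)
			break;
	if (!dr) {
		new_dr->next = dev->head;
		dev->head = dr = new_dr;
		new_dr = NULL;
	}
	model_unlock(dev);

	if (new_dr) {			/* an existing entry won */
		free(new_dr->data);
		free(new_dr);
	}
	return dr->data;
}

void model_release_all(struct model_device *dev)
{
	while (dev->head) {
		struct model_devres *dr = dev->head;

		dev->head = dr->next;
		dr->release(dr->data);
		free(dr);
	}
}

/* Example single-instance type: a lazily created table of 6 slots
 * (e.g. one per PCI BAR, loosely after pcim_iomap_table()). */
static void table_release(void *data)
{
	free(data);
}

void **get_table(struct model_device *dev)
{
	struct model_devres *dr = malloc(sizeof(*dr));

	if (!dr)
		return NULL;
	dr->release = table_release;
	dr->data = calloc(6, sizeof(void *));
	if (!dr->data) {
		free(dr);
		return NULL;
	}
	return model_devres_get(dev, dr);
}
```

Calling get_table() twice on the same device returns the same table, and only one entry ends up on the list.  The second, optimistically allocated entry is simply freed.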

For an example of a single-instance devres type, see pcim_iomap_table()
in lib/devres.c.

All devres interface functions can be called from any context if the
right gfp mask is given.


  5. Overhead
  -----------

The bookkeeping info of each devres is allocated together with the
requested data area.  With the debug option turned off, the
bookkeeping info occupies 16 bytes on 32bit machines and 24 bytes on
64bit (three pointers rounded up to unsigned long long alignment).  If
a singly linked list is used, it can be reduced to two pointers (8
bytes on 32bit, 16 bytes on 64bit).

Each devres group occupies 8 pointers.  It can be reduced to 6 if a
singly linked list is used.
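
The figures above are simple arithmetic.  This small C helper reproduces them; devres_overhead() is a hypothetical name.  It assumes an 8-byte unsigned long long alignment on both 32bit and 64bit machines, as the text implies.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
	assert((align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/*
 * Bookkeeping bytes for @nptrs pointers of @ptr_size bytes each, rounded
 * up to unsigned long long alignment (assumed to be 8 bytes here).
 */
size_t devres_overhead(size_t nptrs, size_t ptr_size)
{
	return round_up(nptrs * ptr_size, 8);
}
```

With it, devres_overhead(3, 4) gives 16 and devres_overhead(3, 8) gives 24.  The singly linked variants (2, 4) and (2, 8) give 8 and 16.  The group figures need no rounding: 8 pointers are 32 or 64 bytes, and 6 pointers are 24 or 48.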

Memory space overhead on an ahci controller with two ports is between
300 and 400 bytes on a 32bit machine after a naive conversion (we can
certainly invest a bit more effort into the libata core layer).


  6. List of managed interfaces
  -----------------------------

IO region
  devm_request_region()
  devm_request_mem_region()
  devm_release_region()
  devm_release_mem_region()

IRQ
  devm_request_irq()
  devm_free_irq()

DMA
  dmam_alloc_coherent()
  dmam_free_coherent()
  dmam_alloc_noncoherent()
  dmam_free_noncoherent()
  dmam_declare_coherent_memory()
  dmam_pool_create()
  dmam_pool_destroy()

PCI
  pcim_enable_device()	: after success, all PCI ops become managed
  pcim_pin_device()	: keep PCI device enabled after release

IOMAP
  devm_ioport_map()
  devm_ioport_unmap()
  devm_ioremap()
  devm_ioremap_nocache()
  devm_iounmap()
  pcim_iomap()
  pcim_iounmap()
  pcim_iomap_table()	: array of mapped addresses indexed by BAR
  pcim_iomap_regions()	: do request_region() and iomap() on multiple BARs