			===============================
			FS-CACHE NETWORK FILESYSTEM API
			===============================
 | 4 |  | 
 | 5 | There's an API by which a network filesystem can make use of the FS-Cache | 
 | 6 | facilities.  This is based around a number of principles: | 
 | 7 |  | 
 | 8 |  (1) Caches can store a number of different object types.  There are two main | 
 | 9 |      object types: indices and files.  The first is a special type used by | 
 | 10 |      FS-Cache to make finding objects faster and to make retiring of groups of | 
 | 11 |      objects easier. | 
 | 12 |  | 
 | 13 |  (2) Every index, file or other object is represented by a cookie.  This cookie | 
 | 14 |      may or may not have anything associated with it, but the netfs doesn't | 
 | 15 |      need to care. | 
 | 16 |  | 
 (3) Barring the top-level index (one entry per cached netfs), the index
     hierarchy for each netfs is structured according to the whim of the netfs.
 | 19 |  | 
 | 20 | This API is declared in <linux/fscache.h>. | 
 | 21 |  | 
 | 22 | This document contains the following sections: | 
 | 23 |  | 
 | 24 | 	 (1) Network filesystem definition | 
 | 25 | 	 (2) Index definition | 
 | 26 | 	 (3) Object definition | 
 | 27 | 	 (4) Network filesystem (un)registration | 
 | 28 | 	 (5) Cache tag lookup | 
 | 29 | 	 (6) Index registration | 
 | 30 | 	 (7) Data file registration | 
 | 31 | 	 (8) Miscellaneous object registration | 
 | 32 | 	 (9) Setting the data file size | 
 | 33 | 	(10) Page alloc/read/write | 
 | 34 | 	(11) Page uncaching | 
 | 35 | 	(12) Index and data file update | 
 | 36 | 	(13) Miscellaneous cookie operations | 
 | 37 | 	(14) Cookie unregistration | 
 | 38 | 	(15) Index and data file invalidation | 
 | 39 | 	(16) FS-Cache specific page flags. | 
 | 40 |  | 
 | 41 |  | 
 | 42 | ============================= | 
 | 43 | NETWORK FILESYSTEM DEFINITION | 
 | 44 | ============================= | 
 | 45 |  | 
 | 46 | FS-Cache needs a description of the network filesystem.  This is specified | 
 | 47 | using a record of the following structure: | 
 | 48 |  | 
 | 49 | 	struct fscache_netfs { | 
 | 50 | 		uint32_t			version; | 
 | 51 | 		const char			*name; | 
 | 52 | 		struct fscache_cookie		*primary_index; | 
 | 53 | 		... | 
 | 54 | 	}; | 
 | 55 |  | 
The first two fields should be filled in before registration, and the third
will be filled in by the registration function; any other fields should just be
ignored and are for internal use only.
 | 59 |  | 
 | 60 | The fields are: | 
 | 61 |  | 
 | 62 |  (1) The name of the netfs (used as the key in the toplevel index). | 
 | 63 |  | 
 | 64 |  (2) The version of the netfs (if the name matches but the version doesn't, the | 
 | 65 |      entire in-cache hierarchy for this netfs will be scrapped and begun | 
 | 66 |      afresh). | 
 | 67 |  | 
 | 68 |  (3) The cookie representing the primary index will be allocated according to | 
 | 69 |      another parameter passed into the registration function. | 
 | 70 |  | 
 | 71 | For example, kAFS (linux/fs/afs/) uses the following definitions to describe | 
 | 72 | itself: | 
 | 73 |  | 
 | 74 | 	struct fscache_netfs afs_cache_netfs = { | 
 | 75 | 		.version	= 0, | 
 | 76 | 		.name		= "afs", | 
 | 77 | 	}; | 
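The name/version rule described above can be sketched in userspace as follows.
This is only an illustration of the matching rule, not FS-Cache's actual
lookup code: struct netfs_ident, enum toplevel_match and
compare_toplevel_entry() are hypothetical names invented for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two fields a netfs fills in; this is not the
 * kernel's struct fscache_netfs. */
struct netfs_ident {
	uint32_t	version;
	const char	*name;
};

/* Hypothetical outcome of comparing a top-level index entry with a netfs. */
enum toplevel_match {
	TOPLEVEL_NO_ENTRY,	/* name differs: entry belongs to another netfs */
	TOPLEVEL_REUSE,		/* name and version match: keep the hierarchy */
	TOPLEVEL_SCRAP,		/* name matches, version differs: begin afresh */
};

static enum toplevel_match
compare_toplevel_entry(const struct netfs_ident *cached,
		       const struct netfs_ident *registering)
{
	/* The name is the key in the top-level index. */
	if (strcmp(cached->name, registering->name) != 0)
		return TOPLEVEL_NO_ENTRY;

	/* Same netfs, but an incompatible on-disk layout. */
	if (cached->version != registering->version)
		return TOPLEVEL_SCRAP;

	return TOPLEVEL_REUSE;
}
```

A netfs would therefore bump its version number whenever it changes the layout
of its keys or auxiliary data in an incompatible way.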
 | 78 |  | 
 | 79 |  | 
 | 80 | ================ | 
 | 81 | INDEX DEFINITION | 
 | 82 | ================ | 
 | 83 |  | 
 | 84 | Indices are used for two purposes: | 
 | 85 |  | 
 | 86 |  (1) To aid the finding of a file based on a series of keys (such as AFS's | 
 | 87 |      "cell", "volume ID", "vnode ID"). | 
 | 88 |  | 
 | 89 |  (2) To make it easier to discard a subset of all the files cached based around | 
 | 90 |      a particular key - for instance to mirror the removal of an AFS volume. | 
 | 91 |  | 
 | 92 | However, since it's unlikely that any two netfs's are going to want to define | 
 | 93 | their index hierarchies in quite the same way, FS-Cache tries to impose as few | 
 | 94 | restraints as possible on how an index is structured and where it is placed in | 
 | 95 | the tree.  The netfs can even mix indices and data files at the same level, but | 
 | 96 | it's not recommended. | 
 | 97 |  | 
Each index entry consists of a key of indeterminate length plus some auxiliary
data, also of indeterminate length.
 | 100 |  | 
 | 101 | There are some limits on indices: | 
 | 102 |  | 
 | 103 |  (1) Any index containing non-index objects should be restricted to a single | 
 | 104 |      cache.  Any such objects created within an index will be created in the | 
 | 105 |      first cache only.  The cache in which an index is created can be | 
 | 106 |      controlled by cache tags (see below). | 
 | 107 |  | 
 | 108 |  (2) The entry data must be atomically journallable, so it is limited to about | 
 | 109 |      400 bytes at present.  At least 400 bytes will be available. | 
 | 110 |  | 
 | 111 |  (3) The depth of the index tree should be judged with care as the search | 
 | 112 |      function is recursive.  Too many layers will run the kernel out of stack. | 
 | 113 |  | 
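Limit (2) can be checked by a netfs before it commits to a key and auxiliary
data layout.  The following is a minimal sketch: ENTRY_DATA_BUDGET and
index_entry_fits() are hypothetical names, and the figure of 400 is simply the
one quoted above rather than a constant exported by the API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed budget, taken from the "about 400 bytes" figure in the text; the
 * real limit is a property of the cache backend. */
#define ENTRY_DATA_BUDGET 400u

/* Return true if a key and its auxiliary data fit in one index entry. */
static bool index_entry_fits(uint16_t keylen, uint16_t auxlen)
{
	/* Do the addition in 32 bits so that two large 16-bit lengths
	 * cannot be mistaken for a small sum. */
	return (uint32_t)keylen + (uint32_t)auxlen <= ENTRY_DATA_BUDGET;
}
```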
 | 114 |  | 
 | 115 | ================= | 
 | 116 | OBJECT DEFINITION | 
 | 117 | ================= | 
 | 118 |  | 
 | 119 | To define an object, a structure of the following type should be filled out: | 
 | 120 |  | 
 | 121 | 	struct fscache_cookie_def | 
 | 122 | 	{ | 
 | 123 | 		uint8_t name[16]; | 
 | 124 | 		uint8_t type; | 
 | 125 |  | 
 | 126 | 		struct fscache_cache_tag *(*select_cache)( | 
 | 127 | 			const void *parent_netfs_data, | 
 | 128 | 			const void *cookie_netfs_data); | 
 | 129 |  | 
 | 130 | 		uint16_t (*get_key)(const void *cookie_netfs_data, | 
 | 131 | 				    void *buffer, | 
 | 132 | 				    uint16_t bufmax); | 
 | 133 |  | 
 | 134 | 		void (*get_attr)(const void *cookie_netfs_data, | 
 | 135 | 				 uint64_t *size); | 
 | 136 |  | 
 | 137 | 		uint16_t (*get_aux)(const void *cookie_netfs_data, | 
 | 138 | 				    void *buffer, | 
 | 139 | 				    uint16_t bufmax); | 
 | 140 |  | 
 | 141 | 		enum fscache_checkaux (*check_aux)(void *cookie_netfs_data, | 
 | 142 | 						   const void *data, | 
 | 143 | 						   uint16_t datalen); | 
 | 144 |  | 
 | 145 | 		void (*get_context)(void *cookie_netfs_data, void *context); | 
 | 146 |  | 
 | 147 | 		void (*put_context)(void *cookie_netfs_data, void *context); | 
 | 148 |  | 
 | 149 | 		void (*mark_pages_cached)(void *cookie_netfs_data, | 
 | 150 | 					  struct address_space *mapping, | 
 | 151 | 					  struct pagevec *cached_pvec); | 
 | 152 |  | 
 | 153 | 		void (*now_uncached)(void *cookie_netfs_data); | 
 | 154 | 	}; | 
 | 155 |  | 
 | 156 | This has the following fields: | 
 | 157 |  | 
 | 158 |  (1) The type of the object [mandatory]. | 
 | 159 |  | 
 | 160 |      This is one of the following values: | 
 | 161 |  | 
 | 162 | 	(*) FSCACHE_COOKIE_TYPE_INDEX | 
 | 163 |  | 
 | 164 | 	    This defines an index, which is a special FS-Cache type. | 
 | 165 |  | 
 | 166 | 	(*) FSCACHE_COOKIE_TYPE_DATAFILE | 
 | 167 |  | 
 | 168 | 	    This defines an ordinary data file. | 
 | 169 |  | 
 | 170 | 	(*) Any other value between 2 and 255 | 
 | 171 |  | 
 | 172 | 	    This defines an extraordinary object such as an XATTR. | 
 | 173 |  | 
 | 174 |  (2) The name of the object type (NUL terminated unless all 16 chars are used) | 
 | 175 |      [optional]. | 
 | 176 |  | 
 | 177 |  (3) A function to select the cache in which to store an index [optional]. | 
 | 178 |  | 
 | 179 |      This function is invoked when an index needs to be instantiated in a cache | 
 | 180 |      during the instantiation of a non-index object.  Only the immediate index | 
 | 181 |      parent for the non-index object will be queried.  Any indices above that | 
 | 182 |      in the hierarchy may be stored in multiple caches.  This function does not | 
 | 183 |      need to be supplied for any non-index object or any index that will only | 
 | 184 |      have index children. | 
 | 185 |  | 
     If this function is not supplied or if it returns NULL then the first
     cache in the parent's list will be chosen, or failing that, the first
     cache in the master list.
 | 189 |  | 
 | 190 |  (4) A function to retrieve an object's key from the netfs [mandatory]. | 
 | 191 |  | 
 | 192 |      This function will be called with the netfs data that was passed to the | 
 | 193 |      cookie acquisition function and the maximum length of key data that it may | 
 | 194 |      provide.  It should write the required key data into the given buffer and | 
 | 195 |      return the quantity it wrote. | 
 | 196 |  | 
 | 197 |  (5) A function to retrieve attribute data from the netfs [optional]. | 
 | 198 |  | 
     This function will be called with the netfs data that was passed to the
     cookie acquisition function.  It should return the size of the file if
     this is a data file.  The size may be used to govern how much space must
     be reserved for this file in the cache.
 | 203 |  | 
 | 204 |      If the function is absent, a file size of 0 is assumed. | 
 | 205 |  | 
 (6) A function to retrieve auxiliary data from the netfs [optional].

     This function will be called with the netfs data that was passed to the
     cookie acquisition function and the maximum length of auxiliary data that
     it may provide.  It should write the auxiliary data into the given buffer
     and return the quantity it wrote.

     If this function is absent, the auxiliary data length will be set to 0.

     The length of the auxiliary data buffer may be dependent on the key
     length.  A netfs mustn't rely on being able to provide more than 400 bytes
     for both.
 | 218 |  | 
 (7) A function to check the auxiliary data [optional].

     This function will be called to check that a match found in the cache for
     this object is valid.  For instance with AFS it could check the auxiliary
     data against the data version number returned by the server to determine
     whether the index entry in a cache is still valid.
 | 225 |  | 
 | 226 |      If this function is absent, it will be assumed that matching objects in a | 
 | 227 |      cache are always valid. | 
 | 228 |  | 
 | 229 |      If present, the function should return one of the following values: | 
 | 230 |  | 
 | 231 | 	(*) FSCACHE_CHECKAUX_OKAY		- the entry is okay as is | 
 | 232 | 	(*) FSCACHE_CHECKAUX_NEEDS_UPDATE	- the entry requires update | 
 | 233 | 	(*) FSCACHE_CHECKAUX_OBSOLETE		- the entry should be deleted | 
 | 234 |  | 
     This function can also be used to extract data from the auxiliary data in
     the cache and copy it into the netfs's structures.
 | 237 |  | 
 | 238 |  (8) A pair of functions to manage contexts for the completion callback | 
 | 239 |      [optional]. | 
 | 240 |  | 
 | 241 |      The cache read/write functions are passed a context which is then passed | 
 | 242 |      to the I/O completion callback function.  To ensure this context remains | 
 | 243 |      valid until after the I/O completion is called, two functions may be | 
 | 244 |      provided: one to get an extra reference on the context, and one to drop a | 
 | 245 |      reference to it. | 
 | 246 |  | 
 | 247 |      If the context is not used or is a type of object that won't go out of | 
 | 248 |      scope, then these functions are not required.  These functions are not | 
 | 249 |      required for indices as indices may not contain data.  These functions may | 
 | 250 |      be called in interrupt context and so may not sleep. | 
 | 251 |  | 
 | 252 |  (9) A function to mark a page as retaining cache metadata [optional]. | 
 | 253 |  | 
 | 254 |      This is called by the cache to indicate that it is retaining in-memory | 
 | 255 |      information for this page and that the netfs should uncache the page when | 
 | 256 |      it has finished.  This does not indicate whether there's data on the disk | 
 | 257 |      or not.  Note that several pages at once may be presented for marking. | 
 | 258 |  | 
 | 259 |      The PG_fscache bit is set on the pages before this function would be | 
 | 260 |      called, so the function need not be provided if this is sufficient. | 
 | 261 |  | 
 | 262 |      This function is not required for indices as they're not permitted data. | 
 | 263 |  | 
 | 264 | (10) A function to unmark all the pages retaining cache metadata [mandatory]. | 
 | 265 |  | 
 | 266 |      This is called by FS-Cache to indicate that a backing store is being | 
 | 267 |      unbound from a cookie and that all the marks on the pages should be | 
 | 268 |      cleared to prevent confusion.  Note that the cache will have torn down all | 
 | 269 |      its tracking information so that the pages don't need to be explicitly | 
 | 270 |      uncached. | 
 | 271 |  | 
 | 272 |      This function is not required for indices as they're not permitted data. | 
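As an illustration of fields (4) to (7), the following userspace sketch shows
what the key, attribute and auxiliary data operations might look like for an
imaginary netfs that keys files by a 32-bit vnode ID and validates them with a
64-bit data version.  The function signatures follow the structure above, but
struct demo_vnode and the demo_*() functions are hypothetical, the enum is
redeclared locally only so that the sketch builds outside the kernel, and
returning 0 when the key does not fit is a choice made by this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the three return values listed above; in-kernel code would
 * get these from <linux/fscache.h>. */
enum fscache_checkaux {
	FSCACHE_CHECKAUX_OKAY,
	FSCACHE_CHECKAUX_NEEDS_UPDATE,
	FSCACHE_CHECKAUX_OBSOLETE,
};

/* Hypothetical per-file record kept by the netfs. */
struct demo_vnode {
	uint32_t	vnode_id;
	uint64_t	data_version;
	uint64_t	size;
};

/* get_key: the key is the vnode ID. */
static uint16_t demo_get_key(const void *cookie_netfs_data,
			     void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->vnode_id))
		return 0;	/* sketch's choice: no key if it can't fit */
	memcpy(buffer, &vnode->vnode_id, sizeof(vnode->vnode_id));
	return sizeof(vnode->vnode_id);
}

/* get_attr: report the file size. */
static void demo_get_attr(const void *cookie_netfs_data, uint64_t *size)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	*size = vnode->size;
}

/* get_aux: the auxiliary data is the data version. */
static uint16_t demo_get_aux(const void *cookie_netfs_data,
			     void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->data_version))
		return 0;
	memcpy(buffer, &vnode->data_version, sizeof(vnode->data_version));
	return sizeof(vnode->data_version);
}

/* check_aux: a cached entry is only valid for the same data version. */
static enum fscache_checkaux demo_check_aux(void *cookie_netfs_data,
					    const void *data,
					    uint16_t datalen)
{
	const struct demo_vnode *vnode = cookie_netfs_data;
	uint64_t cached_version;

	if (datalen != sizeof(cached_version))
		return FSCACHE_CHECKAUX_OBSOLETE;
	memcpy(&cached_version, data, sizeof(cached_version));
	if (cached_version != vnode->data_version)
		return FSCACHE_CHECKAUX_OBSOLETE;
	return FSCACHE_CHECKAUX_OKAY;
}
```

Key plus auxiliary data come to 12 bytes here, well within the 400 byte limit
noted for field (6).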
 | 273 |  | 
 | 274 |  | 
 | 275 | =================================== | 
 | 276 | NETWORK FILESYSTEM (UN)REGISTRATION | 
 | 277 | =================================== | 
 | 278 |  | 
 | 279 | The first step is to declare the network filesystem to the cache.  This also | 
 | 280 | involves specifying the layout of the primary index (for AFS, this would be the | 
 | 281 | "cell" level). | 
 | 282 |  | 
 | 283 | The registration function is: | 
 | 284 |  | 
 | 285 | 	int fscache_register_netfs(struct fscache_netfs *netfs); | 
 | 286 |  | 
 | 287 | It just takes a pointer to the netfs definition.  It returns 0 or an error as | 
 | 288 | appropriate. | 
 | 289 |  | 
 | 290 | For kAFS, registration is done as follows: | 
 | 291 |  | 
 | 292 | 	ret = fscache_register_netfs(&afs_cache_netfs); | 
 | 293 |  | 
 | 294 | The last step is, of course, unregistration: | 
 | 295 |  | 
 | 296 | 	void fscache_unregister_netfs(struct fscache_netfs *netfs); | 
 | 297 |  | 
 | 298 |  | 
 | 299 | ================ | 
 | 300 | CACHE TAG LOOKUP | 
 | 301 | ================ | 
 | 302 |  | 
 | 303 | FS-Cache permits the use of more than one cache.  To permit particular index | 
 | 304 | subtrees to be bound to particular caches, the second step is to look up cache | 
 | 305 | representation tags.  This step is optional; it can be left entirely up to | 
 | 306 | FS-Cache as to which cache should be used.  The problem with doing that is that | 
 | 307 | FS-Cache will always pick the first cache that was registered. | 
 | 308 |  | 
 | 309 | To get the representation for a named tag: | 
 | 310 |  | 
 | 311 | 	struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name); | 
 | 312 |  | 
 | 313 | This takes a text string as the name and returns a representation of a tag.  It | 
 | 314 | will never return an error.  It may return a dummy tag, however, if it runs out | 
 | 315 | of memory; this will inhibit caching with this tag. | 
 | 316 |  | 
 | 317 | Any representation so obtained must be released by passing it to this function: | 
 | 318 |  | 
 | 319 | 	void fscache_release_cache_tag(struct fscache_cache_tag *tag); | 
 | 320 |  | 
 | 321 | The tag will be retrieved by FS-Cache when it calls the object definition | 
 | 322 | operation select_cache(). | 
 | 323 |  | 
 | 324 |  | 
 | 325 | ================== | 
 | 326 | INDEX REGISTRATION | 
 | 327 | ================== | 
 | 328 |  | 
 | 329 | The third step is to inform FS-Cache about part of an index hierarchy that can | 
 | 330 | be used to locate files.  This is done by requesting a cookie for each index in | 
 | 331 | the path to the file: | 
 | 332 |  | 
	struct fscache_cookie *
	fscache_acquire_cookie(struct fscache_cookie *parent,
			       const struct fscache_cookie_def *def,
			       void *netfs_data);
 | 337 |  | 
 | 338 | This function creates an index entry in the index represented by parent, | 
 | 339 | filling in the index entry by calling the operations pointed to by def. | 
 | 340 |  | 
 | 341 | Note that this function never returns an error - all errors are handled | 
 | 342 | internally.  It may, however, return NULL to indicate no cookie.  It is quite | 
 | 343 | acceptable to pass this token back to this function as the parent to another | 
 | 344 | acquisition (or even to the relinquish cookie, read page and write page | 
 | 345 | functions - see below). | 
 | 346 |  | 
 | 347 | Note also that no indices are actually created in a cache until a non-index | 
 | 348 | object needs to be created somewhere down the hierarchy.  Furthermore, an index | 
 | 349 | may be created in several different caches independently at different times. | 
 | 350 | This is all handled transparently, and the netfs doesn't see any of it. | 
 | 351 |  | 
 | 352 | For example, with AFS, a cell would be added to the primary index.  This index | 
 | 353 | entry would have a dependent inode containing a volume location index for the | 
 | 354 | volume mappings within this cell: | 
 | 355 |  | 
 | 356 | 	cell->cache = | 
 | 357 | 		fscache_acquire_cookie(afs_cache_netfs.primary_index, | 
 | 358 | 				       &afs_cell_cache_index_def, | 
 | 359 | 				       cell); | 
 | 360 |  | 
 | 361 | Then when a volume location was accessed, it would be entered into the cell's | 
 | 362 | index and an inode would be allocated that acts as a volume type and hash chain | 
 | 363 | combination: | 
 | 364 |  | 
 | 365 | 	vlocation->cache = | 
 | 366 | 		fscache_acquire_cookie(cell->cache, | 
 | 367 | 				       &afs_vlocation_cache_index_def, | 
 | 368 | 				       vlocation); | 
 | 369 |  | 
 | 370 | And then a particular flavour of volume (R/O for example) could be added to | 
 | 371 | that index, creating another index for vnodes (AFS inode equivalents): | 
 | 372 |  | 
 | 373 | 	volume->cache = | 
 | 374 | 		fscache_acquire_cookie(vlocation->cache, | 
 | 375 | 				       &afs_volume_cache_index_def, | 
 | 376 | 				       volume); | 
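The reason a NULL cookie can be passed on without checking can be seen in the
following userspace mock.  It is a sketch of the calling convention only:
struct mock_cookie, mock_acquire_cookie() and mock_cookie_depth() are
hypothetical stand-ins, and the mock omits the definition argument and all
actual cache interaction.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fscache_cookie. */
struct mock_cookie {
	struct mock_cookie	*parent;
	void			*netfs_data;
};

/* Mock acquisition: never reports an error, but may return NULL. */
static struct mock_cookie *mock_acquire_cookie(struct mock_cookie *parent,
					       void *netfs_data)
{
	struct mock_cookie *cookie;

	/* No parent cookie means nothing is cached above this point, so
	 * there is nothing to cache below it either. */
	if (!parent)
		return NULL;

	/* Allocation failure is absorbed; the caller just sees "no cookie". */
	cookie = malloc(sizeof(*cookie));
	if (!cookie)
		return NULL;

	cookie->parent = parent;
	cookie->netfs_data = netfs_data;
	return cookie;
}

/* Count the levels from a cookie up to the primary index, iteratively. */
static unsigned int mock_cookie_depth(const struct mock_cookie *cookie)
{
	unsigned int depth = 0;

	for (; cookie; cookie = cookie->parent)
		depth++;
	return depth;
}
```

With this convention the cell, volume location and volume acquisitions shown
above can be chained unconditionally; if any level yields NULL, every level
below it yields NULL too.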
 | 377 |  | 
 | 378 |  | 
 | 379 | ====================== | 
 | 380 | DATA FILE REGISTRATION | 
 | 381 | ====================== | 
 | 382 |  | 
The fourth step is to request a data file be created in the cache.  This is
almost identical to index cookie acquisition.  The only difference is that the
type in the object definition should be something other than index type.
 | 386 |  | 
 | 387 | 	vnode->cache = | 
 | 388 | 		fscache_acquire_cookie(volume->cache, | 
 | 389 | 				       &afs_vnode_cache_object_def, | 
 | 390 | 				       vnode); | 
 | 391 |  | 
 | 392 |  | 
 | 393 | ================================= | 
 | 394 | MISCELLANEOUS OBJECT REGISTRATION | 
 | 395 | ================================= | 
 | 396 |  | 
 | 397 | An optional step is to request an object of miscellaneous type be created in | 
 | 398 | the cache.  This is almost identical to index cookie acquisition.  The only | 
 | 399 | difference is that the type in the object definition should be something other | 
 | 400 | than index type.  Whilst the parent object could be an index, it's more likely | 
 | 401 | it would be some other type of object such as a data file. | 
 | 402 |  | 
 | 403 | 	xattr->cache = | 
 | 404 | 		fscache_acquire_cookie(vnode->cache, | 
 | 405 | 				       &afs_xattr_cache_object_def, | 
 | 406 | 				       xattr); | 
 | 407 |  | 
 | 408 | Miscellaneous objects might be used to store extended attributes or directory | 
 | 409 | entries for example. | 
 | 410 |  | 
 | 411 |  | 
 | 412 | ========================== | 
 | 413 | SETTING THE DATA FILE SIZE | 
 | 414 | ========================== | 
 | 415 |  | 
 | 416 | The fifth step is to set the physical attributes of the file, such as its size. | 
 | 417 | This doesn't automatically reserve any space in the cache, but permits the | 
 | 418 | cache to adjust its metadata for data tracking appropriately: | 
 | 419 |  | 
 | 420 | 	int fscache_attr_changed(struct fscache_cookie *cookie); | 
 | 421 |  | 
 | 422 | The cache will return -ENOBUFS if there is no backing cache or if there is no | 
 | 423 | space to allocate any extra metadata required in the cache.  The attributes | 
 | 424 | will be accessed with the get_attr() cookie definition operation. | 
 | 425 |  | 
 | 426 | Note that attempts to read or write data pages in the cache over this size may | 
 | 427 | be rebuffed with -ENOBUFS. | 
 | 428 |  | 
 | 429 | This operation schedules an attribute adjustment to happen asynchronously at | 
 | 430 | some point in the future, and as such, it may happen after the function returns | 
 | 431 | to the caller.  The attribute adjustment excludes read and write operations. | 
 | 432 |  | 
 | 433 |  | 
 | 434 | ===================== | 
 | 435 | PAGE READ/ALLOC/WRITE | 
 | 436 | ===================== | 
 | 437 |  | 
 | 438 | And the sixth step is to store and retrieve pages in the cache.  There are | 
 | 439 | three functions that are used to do this. | 
 | 440 |  | 
 | 441 | Note: | 
 | 442 |  | 
 | 443 |  (1) A page should not be re-read or re-allocated without uncaching it first. | 
 | 444 |  | 
 | 445 |  (2) A read or allocated page must be uncached when the netfs page is released | 
 | 446 |      from the pagecache. | 
 | 447 |  | 
 (3) A page should only be written to the cache if previously read or allocated.
 | 449 |  | 
 | 450 | This permits the cache to maintain its page tracking in proper order. | 
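The three rules above amount to a small per-page state machine, which can be
sketched in userspace as follows.  This models only the ordering rules as seen
from the netfs; struct page_track and the track_*() helpers are hypothetical
and do not correspond to anything in FS-Cache itself.

```c
#include <stdbool.h>

/* Hypothetical per-page record of whether the cache is tracking the page. */
struct page_track {
	bool	tracked;
};

/* Rule (1): a tracked page must be uncached before being read or allocated
 * again.  Returns false if the call would break that rule. */
static bool track_read_or_alloc(struct page_track *p)
{
	if (p->tracked)
		return false;
	p->tracked = true;
	return true;
}

/* Rule (3): only a previously read or allocated page may be written. */
static bool track_may_write(const struct page_track *p)
{
	return p->tracked;
}

/* Rule (2): uncaching drops the tracking when the page leaves the pagecache. */
static void track_uncache(struct page_track *p)
{
	p->tracked = false;
}
```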
 | 451 |  | 
 | 452 |  | 
 | 453 | PAGE READ | 
 | 454 | --------- | 
 | 455 |  | 
 | 456 | Firstly, the netfs should ask FS-Cache to examine the caches and read the | 
 | 457 | contents cached for a particular page of a particular file if present, or else | 
 | 458 | allocate space to store the contents if not: | 
 | 459 |  | 
 | 460 | 	typedef | 
 | 461 | 	void (*fscache_rw_complete_t)(struct page *page, | 
 | 462 | 				      void *context, | 
 | 463 | 				      int error); | 
 | 464 |  | 
 | 465 | 	int fscache_read_or_alloc_page(struct fscache_cookie *cookie, | 
 | 466 | 				       struct page *page, | 
 | 467 | 				       fscache_rw_complete_t end_io_func, | 
 | 468 | 				       void *context, | 
 | 469 | 				       gfp_t gfp); | 
 | 470 |  | 
 | 471 | The cookie argument must specify a cookie for an object that isn't an index, | 
 | 472 | the page specified will have the data loaded into it (and is also used to | 
 | 473 | specify the page number), and the gfp argument is used to control how any | 
 | 474 | memory allocations made are satisfied. | 
 | 475 |  | 
 | 476 | If the cookie indicates the inode is not cached: | 
 | 477 |  | 
 | 478 |  (1) The function will return -ENOBUFS. | 
 | 479 |  | 
 | 480 | Else if there's a copy of the page resident in the cache: | 
 | 481 |  | 
 | 482 |  (1) The mark_pages_cached() cookie operation will be called on that page. | 
 | 483 |  | 
 | 484 |  (2) The function will submit a request to read the data from the cache's | 
 | 485 |      backing device directly into the page specified. | 
 | 486 |  | 
 | 487 |  (3) The function will return 0. | 
 | 488 |  | 
 | 489 |  (4) When the read is complete, end_io_func() will be invoked with: | 
 | 490 |  | 
 | 491 |      (*) The netfs data supplied when the cookie was created. | 
 | 492 |  | 
 | 493 |      (*) The page descriptor. | 
 | 494 |  | 
 | 495 |      (*) The context argument passed to the above function.  This will be | 
 | 496 |          maintained with the get_context/put_context functions mentioned above. | 
 | 497 |  | 
 | 498 |      (*) An argument that's 0 on success or negative for an error code. | 
 | 499 |  | 
 | 500 |      If an error occurs, it should be assumed that the page contains no usable | 
 | 501 |      data. | 
 | 502 |  | 
     end_io_func() will be called in process context if the read results in
     an error, but it might be called in interrupt context if the read is
     successful.
 | 506 |  | 
 | 507 | Otherwise, if there's not a copy available in cache, but the cache may be able | 
 | 508 | to store the page: | 
 | 509 |  | 
 | 510 |  (1) The mark_pages_cached() cookie operation will be called on that page. | 
 | 511 |  | 
 | 512 |  (2) A block may be reserved in the cache and attached to the object at the | 
 | 513 |      appropriate place. | 
 | 514 |  | 
 | 515 |  (3) The function will return -ENODATA. | 
 | 516 |  | 
 | 517 | This function may also return -ENOMEM or -EINTR, in which case it won't have | 
 | 518 | read any data from the cache. | 
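A netfs typically has to turn these return values into a decision about where
the page data comes from.  The following sketch shows one way to do that; the
return codes are those described above, but enum read_action and
classify_read_result() are hypothetical, and the action chosen for each code
is a netfs policy assumed for this example.

```c
#include <errno.h>

/* Hypothetical actions a netfs might take after asking the cache. */
enum read_action {
	READ_WAIT_FOR_CACHE,	/* read submitted; end_io_func() will run */
	READ_FETCH_AND_STORE,	/* block reserved; fetch, then write to cache */
	READ_FETCH_ONLY,	/* not cacheable; fetch from the server */
	READ_PROPAGATE_ERROR,	/* nothing was read; hand the error back */
};

static enum read_action classify_read_result(int ret)
{
	switch (ret) {
	case 0:
		return READ_WAIT_FOR_CACHE;
	case -ENODATA:
		/* The page has been marked and will need uncaching later. */
		return READ_FETCH_AND_STORE;
	case -ENOBUFS:
		return READ_FETCH_ONLY;
	default:
		/* -ENOMEM, -EINTR or anything else. */
		return READ_PROPAGATE_ERROR;
	}
}
```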
 | 519 |  | 
 | 520 |  | 
 | 521 | PAGE ALLOCATE | 
 | 522 | ------------- | 
 | 523 |  | 
 | 524 | Alternatively, if there's not expected to be any data in the cache for a page | 
 | 525 | because the file has been extended, a block can simply be allocated instead: | 
 | 526 |  | 
 | 527 | 	int fscache_alloc_page(struct fscache_cookie *cookie, | 
 | 528 | 			       struct page *page, | 
 | 529 | 			       gfp_t gfp); | 
 | 530 |  | 
 | 531 | This is similar to the fscache_read_or_alloc_page() function, except that it | 
 | 532 | never reads from the cache.  It will return 0 if a block has been allocated, | 
 | 533 | rather than -ENODATA as the other would.  One or the other must be performed | 
 | 534 | before writing to the cache. | 
 | 535 |  | 
 | 536 | The mark_pages_cached() cookie operation will be called on the page if | 
 | 537 | successful. | 
 | 538 |  | 
 | 539 |  | 
 | 540 | PAGE WRITE | 
 | 541 | ---------- | 
 | 542 |  | 
 | 543 | Secondly, if the netfs changes the contents of the page (either due to an | 
 | 544 | initial download or if a user performs a write), then the page should be | 
 | 545 | written back to the cache: | 
 | 546 |  | 
 | 547 | 	int fscache_write_page(struct fscache_cookie *cookie, | 
 | 548 | 			       struct page *page, | 
 | 549 | 			       gfp_t gfp); | 
 | 550 |  | 
 | 551 | The cookie argument must specify a data file cookie, the page specified should | 
 | 552 | contain the data to be written (and is also used to specify the page number), | 
 | 553 | and the gfp argument is used to control how any memory allocations made are | 
 | 554 | satisfied. | 
 | 555 |  | 
 | 556 | The page must have first been read or allocated successfully and must not have | 
 | 557 | been uncached before writing is performed. | 
 | 558 |  | 
 | 559 | If the cookie indicates the inode is not cached then: | 
 | 560 |  | 
 | 561 |  (1) The function will return -ENOBUFS. | 
 | 562 |  | 
 | 563 | Else if space can be allocated in the cache to hold this page: | 
 | 564 |  | 
 | 565 |  (1) PG_fscache_write will be set on the page. | 
 | 566 |  | 
 (2) The function will submit a request to write the data to the cache's
     backing device directly from the page specified.
 | 569 |  | 
 | 570 |  (3) The function will return 0. | 
 | 571 |  | 
 | 572 |  (4) When the write is complete PG_fscache_write is cleared on the page and | 
 | 573 |      anyone waiting for that bit will be woken up. | 
 | 574 |  | 
 | 575 | Else if there's no space available in the cache, -ENOBUFS will be returned.  It | 
 | 576 | is also possible for the PG_fscache_write bit to be cleared when no write took | 
 | 577 | place if unforeseen circumstances arose (such as a disk error). | 
 | 578 |  | 
 | 579 | Writing takes place asynchronously. | 
 | 580 |  | 
 | 581 |  | 
 | 582 | MULTIPLE PAGE READ | 
 | 583 | ------------------ | 
 | 584 |  | 
 | 585 | A facility is provided to read several pages at once, as requested by the | 
 | 586 | readpages() address space operation: | 
 | 587 |  | 
 | 588 | 	int fscache_read_or_alloc_pages(struct fscache_cookie *cookie, | 
 | 589 | 					struct address_space *mapping, | 
 | 590 | 					struct list_head *pages, | 
 | 591 | 					int *nr_pages, | 
 | 592 | 					fscache_rw_complete_t end_io_func, | 
 | 593 | 					void *context, | 
 | 594 | 					gfp_t gfp); | 
 | 595 |  | 
 | 596 | This works in a similar way to fscache_read_or_alloc_page(), except: | 
 | 597 |  | 
 | 598 |  (1) Any page it can retrieve data for is removed from pages and nr_pages and | 
 | 599 |      dispatched for reading to the disk.  Reads of adjacent pages on disk may | 
 | 600 |      be merged for greater efficiency. | 
 | 601 |  | 
 | 602 |  (2) The mark_pages_cached() cookie operation will be called on several pages | 
 | 603 |      at once if they're being read or allocated. | 
 | 604 |  | 
 (3) If there was a general error, then that error will be returned.
 | 606 |  | 
 | 607 |      Else if some pages couldn't be allocated or read, then -ENOBUFS will be | 
 | 608 |      returned. | 
 | 609 |  | 
 | 610 |      Else if some pages couldn't be read but were allocated, then -ENODATA will | 
 | 611 |      be returned. | 
 | 612 |  | 
 | 613 |      Otherwise, if all pages had reads dispatched, then 0 will be returned, the | 
 | 614 |      list will be empty and *nr_pages will be 0. | 
 | 615 |  | 
 | 616 |  (4) end_io_func will be called once for each page being read as the reads | 
 | 617 |      complete.  It will be called in process context if error != 0, but it may | 
 | 618 |      be called in interrupt context if there is no error. | 
 | 619 |  | 
 | 620 | Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude | 
 | 621 | some of the pages being read and some being allocated.  Those pages will have | 
 | 622 | been marked appropriately and will need uncaching. | 
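The precedence of return values given in point (3) can be written out as
follows.  This is a sketch of the documented ordering only, not the FS-Cache
implementation; struct batch_outcome and batch_return_code() are hypothetical
names used for illustration.

```c
#include <errno.h>

/* Hypothetical summary of what happened to a batch of pages. */
struct batch_outcome {
	int		general_error;	/* 0, or a negative error code */
	unsigned int	nr_refused;	/* neither read nor allocated */
	unsigned int	nr_allocated;	/* allocated, but no data to read */
	unsigned int	nr_dispatched;	/* reads dispatched to the disk */
};

static int batch_return_code(const struct batch_outcome *outcome)
{
	if (outcome->general_error)
		return outcome->general_error;
	if (outcome->nr_refused)
		return -ENOBUFS;
	if (outcome->nr_allocated)
		return -ENODATA;
	return 0;	/* every page had a read dispatched */
}
```

Note that nr_dispatched does not influence the result: a non-zero return can
still leave some pages with reads in flight, which is why those pages need
uncaching as described above.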
 | 623 |  | 
 | 624 |  | 
 | 625 | ============== | 
 | 626 | PAGE UNCACHING | 
 | 627 | ============== | 
 | 628 |  | 
 | 629 | To uncache a page, this function should be called: | 
 | 630 |  | 
 | 631 | 	void fscache_uncache_page(struct fscache_cookie *cookie, | 
 | 632 | 				  struct page *page); | 
 | 633 |  | 
 | 634 | This function permits the cache to release any in-memory representation it | 
 | 635 | might be holding for this netfs page.  This function must be called once for | 
 | 636 | each page on which the read or write page functions above have been called to | 
 | 637 | make sure the cache's in-memory tracking information gets torn down. | 
 | 638 |  | 
Note that pages can't be explicitly deleted from a data file.  The whole
data file must be retired (see the relinquish cookie function below).
 | 641 |  | 
 | 642 | Furthermore, note that this does not cancel the asynchronous read or write | 
 | 643 | operation started by the read/alloc and write functions, so the page | 
| David Howells | 201a154 | 2009-11-19 18:11:35 +0000 | [diff] [blame] | 644 | invalidation functions must use: | 
| David Howells | 2d6fff6 | 2009-04-03 16:42:36 +0100 | [diff] [blame] | 645 |  | 
 | 646 | 	bool fscache_check_page_write(struct fscache_cookie *cookie, | 
 | 647 | 				      struct page *page); | 
 | 648 |  | 
 | 649 | to see if a page is being written to the cache, and: | 
 | 650 |  | 
 | 651 | 	void fscache_wait_on_page_write(struct fscache_cookie *cookie, | 
 | 652 | 					struct page *page); | 
 | 653 |  | 
 | 654 | to wait for it to finish if it is. | 
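
A minimal userspace model of that sequence is sketched below.  The struct
definitions and the stub bodies of the three FS-Cache functions are stand-ins
invented so that the example compiles outside the kernel; only the function
names and signatures are taken from the text above.  netfs_invalidate_page()
is a hypothetical helper of the kind a netfs might call from its
invalidatepage() method:

```c
#include <stdbool.h>

/* Stand-in types; the real ones live in the kernel headers. */
struct fscache_cookie { int unused; };
struct page {
	bool fscache;		/* models PG_fscache: cache knows the page */
	bool being_written;	/* models a store to the cache in progress */
};

/* Stub: report whether a write to the cache is outstanding. */
static bool fscache_check_page_write(struct fscache_cookie *cookie,
				     struct page *page)
{
	(void)cookie;
	return page->being_written;
}

/* Stub: pretend to block until the write has completed. */
static void fscache_wait_on_page_write(struct fscache_cookie *cookie,
				       struct page *page)
{
	(void)cookie;
	page->being_written = false;
}

/* Stub: drop the cache's in-memory tracking for the page. */
static void fscache_uncache_page(struct fscache_cookie *cookie,
				 struct page *page)
{
	(void)cookie;
	page->fscache = false;
}

/* Hypothetical netfs helper: let any write finish, then uncache. */
static void netfs_invalidate_page(struct fscache_cookie *cookie,
				  struct page *page)
{
	if (!page->fscache)
		return;		/* the cache has no interest in this page */
	if (fscache_check_page_write(cookie, page))
		fscache_wait_on_page_write(cookie, page);
	fscache_uncache_page(cookie, page);
}
```

The point of the ordering is that the uncache call alone does not cancel a
write in flight, so the helper waits before tearing the tracking down.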


When releasepage() is being implemented, a special FS-Cache function exists to
manage the heuristics of coping with vmscan trying to eject pages, which may
conflict with the cache trying to write pages to the cache (which may itself
need to allocate memory):

	bool fscache_maybe_release_page(struct fscache_cookie *cookie,
					struct page *page,
					gfp_t gfp);

This takes the netfs cookie, and the page and gfp arguments as supplied to
releasepage().  It will return false if the page cannot be released yet for
some reason.  If it returns true, the page has been uncached and can now be
released.

To make a page available for release, this function may wait for an outstanding
storage request to complete, or it may attempt to cancel the storage request -
in which case the page will not be stored in the cache this time.
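
The following userspace sketch shows one way a releasepage() implementation
might wrap this call.  Everything except the name and signature of
fscache_maybe_release_page() is an assumption made for the example: the
types are stand-ins, MODEL_GFP_MAY_WAIT is an invented flag, and the stub's
rule ("wait for a store only if gfp allows it") is a simplification of
whatever heuristics the real function applies.  netfs_release_page() is a
hypothetical helper name:

```c
#include <stdbool.h>

typedef unsigned int gfp_t;		/* stand-in for the kernel type */
#define MODEL_GFP_MAY_WAIT 0x1u		/* invented flag, model only */

struct fscache_cookie { int unused; };
struct page {
	bool fscache;		/* models PG_fscache */
	bool being_written;	/* models an outstanding storage request */
};

/* Stub with invented behaviour: wait for a store only if gfp permits. */
static bool fscache_maybe_release_page(struct fscache_cookie *cookie,
				       struct page *page, gfp_t gfp)
{
	(void)cookie;
	if (page->being_written) {
		if (!(gfp & MODEL_GFP_MAY_WAIT))
			return false;	/* cannot be released yet */
		page->being_written = false;
	}
	page->fscache = false;		/* the page is now uncached */
	return true;
}

/* Hypothetical releasepage() body: nonzero means the page may be freed. */
static int netfs_release_page(struct fscache_cookie *cookie,
			      struct page *page, gfp_t gfp)
{
	if (page->fscache && !fscache_maybe_release_page(cookie, page, gfp))
		return 0;
	return 1;
}
```

Checking the page's cache flag first keeps pages that the cache never saw on
the fast path.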


BULK INODE PAGE UNCACHE
-----------------------

A convenience routine is provided to perform an uncache on all the pages
attached to an inode.  This assumes that the pages on the inode correspond on a
1:1 basis with the pages in the cache.

	void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
					     struct inode *inode);

This takes the netfs cookie that the pages were cached with and the inode that
the pages are attached to.  This function will wait for pages to finish being
written to the cache and for the cache to finish with the page generally.  No
error is returned.


==========================
INDEX AND DATA FILE UPDATE
==========================

To request an update of the index data for an index or other object, the
following function should be called:

	void fscache_update_cookie(struct fscache_cookie *cookie);

This function will refer back to the netfs_data pointer stored in the cookie by
the acquisition function to obtain the data to write into each revised index
entry.  The update method in the parent index definition will be called to
transfer the data.

Note that partial updates may happen automatically at other times, such as when
data blocks are added to a data file object.


===============================
MISCELLANEOUS COOKIE OPERATIONS
===============================

There are a number of operations that can be used to control cookies:

 (*) Cookie pinning:

	int fscache_pin_cookie(struct fscache_cookie *cookie);
	void fscache_unpin_cookie(struct fscache_cookie *cookie);

     These operations permit data cookies to be pinned into the cache and to
     have the pinning removed.  They are not permitted on index cookies.

     The pinning function will return 0 if successful, -ENOBUFS if the cookie
     isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
     -EIO if there's any other problem.

 (*) Data space reservation:

	int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);

     This permits a netfs to request cache space be reserved to store up to the
     given amount of a file.  It is permitted to ask for more than the current
     size of the file to allow for future file expansion.

     If size is given as zero then the reservation will be cancelled.

     The function will return 0 if successful, -ENOBUFS if the cookie isn't
     backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
     -EIO if there's any other problem.

     Note that this doesn't pin an object in a cache; it can still be culled to
     make space if it's not in use.
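
As a hedged sketch of how the pinning return codes listed above might be
consumed, the example below treats pinning as best-effort.  That policy is an
assumption of this example, not a rule of the API: a netfs may equally well
choose to report -ENOSPC to its caller.  The cookie struct and the stub body
of fscache_pin_cookie() are stand-ins so that the code builds in userspace,
and netfs_try_pin() is a hypothetical helper name:

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in cookie; pin_result scripts what the stub below returns. */
struct fscache_cookie {
	int pin_result;
	bool pinned;
};

/* Stub standing in for the real pinning function. */
static int fscache_pin_cookie(struct fscache_cookie *cookie)
{
	if (cookie->pin_result == 0)
		cookie->pinned = true;
	return cookie->pin_result;
}

/*
 * Hypothetical helper: carry on unpinned when there is no cache, no
 * pinning support or no space; pass -ENOMEM, -EIO and the like upwards.
 */
static int netfs_try_pin(struct fscache_cookie *cookie)
{
	int ret = fscache_pin_cookie(cookie);

	switch (ret) {
	case 0:
	case -ENOBUFS:
	case -EOPNOTSUPP:
	case -ENOSPC:
		return 0;
	default:
		return ret;
	}
}
```

The same switch would apply unchanged to fscache_reserve_space(), since the
text documents the same set of return codes for it.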


=====================
COOKIE UNREGISTRATION
=====================

To get rid of a cookie, this function should be called:

	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				       int retire);

If retire is non-zero, then the object will be marked for recycling, and all
copies of it will be removed from all active caches in which it is present.
Not only that but all child objects will also be retired.

If retire is zero, then the object may be available again when next the
acquisition function is called.  Retirement here will overrule the pinning on a
cookie.

One very important note - relinquish must NOT be called for a cookie unless all
the cookies for "child" indices, objects and pages have been relinquished
first.
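
The ordering rule amounts to a post-order walk of whatever hierarchy the
netfs has built.  The sketch below models that in userspace.  The cookie
struct, the stub body of fscache_relinquish_cookie() (which asserts that no
children remain) and the netfs_node tree with its netfs_relinquish_tree()
helper are all invented for illustration; only the function name and
signature come from the text above:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Stand-in cookie that counts children not yet relinquished. */
struct fscache_cookie {
	struct fscache_cookie *parent;
	int n_children;
	int relinquished;
};

/* Stub: enforces the "children first" rule with an assertion. */
static void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				      int retire)
{
	(void)retire;
	assert(cookie->n_children == 0);
	cookie->relinquished = 1;
	if (cookie->parent)
		cookie->parent->n_children--;
}

/* Hypothetical netfs-side tree node owning one cookie. */
struct netfs_node {
	struct fscache_cookie cookie;
	struct netfs_node *children[MAX_CHILDREN];
	int n_children;
};

/* Relinquish every descendant before the node itself (post-order). */
static void netfs_relinquish_tree(struct netfs_node *node, int retire)
{
	for (int i = 0; i < node->n_children; i++)
		netfs_relinquish_tree(node->children[i], retire);
	fscache_relinquish_cookie(&node->cookie, retire);
}
```

Relinquishing the root first in this model would trip the assertion, which
is the mistake the note above warns against.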


================================
INDEX AND DATA FILE INVALIDATION
================================

There is no direct way to invalidate an index subtree or a data file.  To do
this, the caller should relinquish and retire the cookie they have, and then
acquire a new one.
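
A sketch of that relinquish-then-reacquire sequence follows, as a userspace
model.  The three-argument form of fscache_acquire_cookie() used here (parent,
definition, netfs_data) is assumed from the registration sections of this
document and should be checked against <linux/fscache.h>.  The struct
layouts, the stub bodies, the model_* counters and the helper
netfs_invalidate_object() are all invented for the example:

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types, reduced to what the model needs. */
struct fscache_cookie_def { const char *name; };
struct fscache_cookie {
	struct fscache_cookie *parent;
	const struct fscache_cookie_def *def;
	void *netfs_data;
};

static int model_live;		/* cookies currently held */
static int model_retired;	/* cookies relinquished with retire != 0 */

/* Stub: allocate a cookie recording the arguments it was given. */
static struct fscache_cookie *
fscache_acquire_cookie(struct fscache_cookie *parent,
		       const struct fscache_cookie_def *def, void *netfs_data)
{
	struct fscache_cookie *cookie = malloc(sizeof(*cookie));

	if (!cookie)
		return NULL;
	cookie->parent = parent;
	cookie->def = def;
	cookie->netfs_data = netfs_data;
	model_live++;
	return cookie;
}

/* Stub: free the cookie, noting whether it was retired. */
static void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				      int retire)
{
	if (!cookie)
		return;
	if (retire)
		model_retired++;
	model_live--;
	free(cookie);
}

/* Hypothetical helper: retire the old cookie, then acquire a fresh one. */
static struct fscache_cookie *
netfs_invalidate_object(struct fscache_cookie *old)
{
	struct fscache_cookie *parent = old->parent;
	const struct fscache_cookie_def *def = old->def;
	void *netfs_data = old->netfs_data;

	fscache_relinquish_cookie(old, 1);
	return fscache_acquire_cookie(parent, def, netfs_data);
}
```

The helper copies what it needs out of the old cookie before relinquishing
it, because the old pointer must not be used afterwards.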


===========================
FS-CACHE SPECIFIC PAGE FLAG
===========================

FS-Cache makes use of a page flag, PG_private_2, for its own purpose.  This is
given the alternative name PG_fscache.

PG_fscache is used to indicate that the page is known by the cache, and that
the cache must be informed if the page is going to go away.  It's an indication
to the netfs that the cache has an interest in this page, where an interest may
be a pointer to it, resources allocated or reserved for it, or I/O in progress
upon it.

The netfs can use this information in methods such as releasepage() to
determine whether it needs to uncache a page or update it.

Furthermore, if this bit is set, releasepage() and invalidatepage() operations
will be called on a page to get rid of it, even if PG_private is not set.  This
allows caching to be attempted on a page, and read_cache_pages() to be called
after fscache_read_or_alloc_pages(), as the former will try to release the
pages it was given under certain circumstances.

This bit does not overlap with other flags such as PG_private.  This means that
FS-Cache can be used with a filesystem that uses the block buffering code.

There are a number of operations defined on this flag:

	int PageFsCache(struct page *page);
	void SetPageFsCache(struct page *page);
	void ClearPageFsCache(struct page *page);
	int TestSetPageFsCache(struct page *page);
	int TestClearPageFsCache(struct page *page);

These functions are bit test, bit set, bit clear, bit test-and-set, and bit
test-and-clear operations on PG_fscache.
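
To make the semantics of the five operations concrete, here is a small
userspace model.  It is only a sketch: the bit numbers are arbitrary
placeholders, struct page is reduced to a flags word, and the operations here
are plain (non-atomic) C, whereas the kernel's page flag operations are built
on its own bit-operation primitives:

```c
/* Stand-in page holding only a flags word. */
struct page { unsigned long flags; };

/* Placeholder bit numbers; the kernel's real values will differ. */
enum {
	PG_private   = 0,
	PG_private_2 = 1,
	PG_fscache   = PG_private_2	/* alternative name, same bit */
};

#define FSCACHE_MASK (1UL << PG_fscache)

/* Bit test. */
static int PageFsCache(struct page *page)
{
	return (page->flags & FSCACHE_MASK) != 0;
}

/* Bit set. */
static void SetPageFsCache(struct page *page)
{
	page->flags |= FSCACHE_MASK;
}

/* Bit clear. */
static void ClearPageFsCache(struct page *page)
{
	page->flags &= ~FSCACHE_MASK;
}

/* Bit test-and-set: returns the value the bit had beforehand. */
static int TestSetPageFsCache(struct page *page)
{
	int old = PageFsCache(page);

	SetPageFsCache(page);
	return old;
}

/* Bit test-and-clear: returns the value the bit had beforehand. */
static int TestClearPageFsCache(struct page *page)
{
	int old = PageFsCache(page);

	ClearPageFsCache(page);
	return old;
}
```

Because PG_fscache occupies its own bit, none of these operations disturbs
PG_private in the model, which mirrors the non-overlap noted above.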