| David Howells | 36c9559 | 2009-04-03 16:42:38 +0100 | [diff] [blame] | 1 | ==================================================== | 
|  | 2 | IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT | 
|  | 3 | ==================================================== | 
|  | 4 |  | 
|  | 5 | By: David Howells <dhowells@redhat.com> | 
|  | 6 |  | 
|  | 7 | Contents: | 
|  | 8 |  | 
|  | 9 | (*) Representation | 
|  | 10 |  | 
|  | 11 | (*) Object management state machine. | 
|  | 12 |  | 
|  | 13 | - Provision of cpu time. | 
|  | 14 | - Locking simplification. | 
|  | 15 |  | 
|  | 16 | (*) The set of states. | 
|  | 17 |  | 
|  | 18 | (*) The set of events. | 
|  | 19 |  | 
|  | 20 |  | 
|  | 21 | ============== | 
|  | 22 | REPRESENTATION | 
|  | 23 | ============== | 
|  | 24 |  | 
|  | 25 | FS-Cache maintains an in-kernel representation of each object that a netfs is | 
|  | 26 | currently interested in.  Such objects are represented by the fscache_cookie | 
|  | 27 | struct and are referred to as cookies. | 
|  | 28 |  | 
|  | 29 | FS-Cache also maintains a separate in-kernel representation of the objects that | 
|  | 30 | a cache backend is currently actively caching.  Such objects are represented by | 
|  | 31 | the fscache_object struct.  The cache backends allocate these upon request, and | 
|  | 32 | are expected to embed them in their own representations.  These are referred to | 
|  | 33 | as objects. | 
|  | 34 |  | 
|  | 35 | There is a 1:N relationship between cookies and objects.  A cookie may be | 
|  | 36 | represented by multiple objects - an index may exist in more than one cache - | 
|  | 37 | or even by no objects (it may not be cached). | 
|  | 38 |  | 
|  | 39 | Furthermore, both cookies and objects are hierarchical.  The two hierarchies | 
|  | 40 | correspond, but the cookies tree is a superset of the union of the object trees | 
|  | 41 | of multiple caches: | 
|  | 42 |  | 
|  | 43 | NETFS INDEX TREE               :      CACHE 1     :      CACHE 2 | 
|  | 44 | :                  : | 
|  | 45 | :   +-----------+  : | 
|  | 46 | +----------->|  IObject  |  : | 
|  | 47 | +-----------+       |        :   +-----------+  : | 
|  | 48 | |  ICookie  |-------+        :         |        : | 
|  | 49 | +-----------+       |        :         |        :   +-----------+ | 
|  | 50 | |             +------------------------------>|  IObject  | | 
|  | 51 | |                      :         |        :   +-----------+ | 
|  | 52 | |                      :         V        :         | | 
|  | 53 | |                      :   +-----------+  :         | | 
|  | 54 | V             +----------->|  IObject  |  :         | | 
|  | 55 | +-----------+       |        :   +-----------+  :         | | 
|  | 56 | |  ICookie  |-------+        :         |        :         V | 
|  | 57 | +-----------+       |        :         |        :   +-----------+ | 
|  | 58 | |             +------------------------------>|  IObject  | | 
|  | 59 | +-----+-----+                :         |        :   +-----------+ | 
|  | 60 | |           |                :         |        :         | | 
|  | 61 | V           |                :         V        :         | | 
|  | 62 | +-----------+     |                :   +-----------+  :         | | 
|  | 63 | |  ICookie  |------------------------->|  IObject  |  :         | | 
|  | 64 | +-----------+     |                :   +-----------+  :         | | 
|  | 65 | |           V                :         |        :         V | 
|  | 66 | |     +-----------+          :         |        :   +-----------+ | 
|  | 67 | |     |  ICookie  |-------------------------------->|  IObject  | | 
|  | 68 | |     +-----------+          :         |        :   +-----------+ | 
|  | 69 | V           |                :         V        :         | | 
|  | 70 | +-----------+     |                :   +-----------+  :         | | 
|  | 71 | |  DCookie  |------------------------->|  DObject  |  :         | | 
|  | 72 | +-----------+     |                :   +-----------+  :         | | 
|  | 73 | |                :                  :         | | 
|  | 74 | +-------+-------+        :                  :         | | 
|  | 75 | |               |        :                  :         | | 
|  | 76 | V               V        :                  :         V | 
|  | 77 | +-----------+   +-----------+  :                  :   +-----------+ | 
|  | 78 | |  DCookie  |   |  DCookie  |------------------------>|  DObject  | | 
|  | 79 | +-----------+   +-----------+  :                  :   +-----------+ | 
|  | 80 | :                  : | 
|  | 81 |  | 
|  | 82 | In the above illustration, ICookie and IObject represent indices and DCookie | 
|  | 83 | and DObject represent data storage objects.  Indices may have representation in | 
|  | 84 | multiple caches, but currently, non-index objects may not.  Objects of any type | 
|  | 85 | may also be entirely unrepresented. | 
|  | 86 |  | 
|  | 87 | As far as the netfs API goes, the netfs is only actually permitted to see | 
|  | 88 | pointers to the cookies.  The cookies themselves and any objects attached to | 
|  | 89 | those cookies are hidden from it. | 
|  | 90 |  | 
|  | 91 |  | 
|  | 92 | =============================== | 
|  | 93 | OBJECT MANAGEMENT STATE MACHINE | 
|  | 94 | =============================== | 
|  | 95 |  | 
|  | 96 | Within FS-Cache, each active object is managed by its own individual state | 
|  | 97 | machine.  The state for an object is kept in the fscache_object struct, in | 
|  | 98 | object->state.  A cookie may point to a set of objects that are in different | 
|  | 99 | states. | 
|  | 100 |  | 
|  | 101 | Each state has an action associated with it that is invoked when the machine | 
|  | 102 | wakes up in that state.  There are four logical sets of states: | 
|  | 103 |  | 
|  | 104 | (1) Preparation: states that wait for the parent objects to become ready.  The | 
|  | 105 | representations are hierarchical, and it is expected that an object must | 
|  | 106 | be created or accessed with respect to its parent object. | 
|  | 107 |  | 
|  | 108 | (2) Initialisation: states that perform lookups in the cache and validate | 
|  | 109 | what's found and that create on disk any missing metadata. | 
|  | 110 |  | 
|  | 111 | (3) Normal running: states that allow netfs operations on objects to proceed | 
|  | 112 | and that update the state of objects. | 
|  | 113 |  | 
|  | 114 | (4) Termination: states that detach objects from their netfs cookies, that | 
|  | 115 | delete objects from disk, that handle disk and system errors and that free | 
|  | 116 | up in-memory resources. | 
|  | 117 |  | 
|  | 118 |  | 
|  | 119 | In most cases, transitioning between states is in response to signalled events. | 
|  | 120 | When a state has finished processing, it will usually set the mask of events in | 
|  | 121 | which it is interested (object->event_mask) and relinquish the worker thread. | 
|  | 122 | Then when an event is raised (by calling fscache_raise_event()), if the event | 
|  | 123 | is not masked, the object will be queued for processing (by calling | 
|  | 124 | fscache_enqueue_object()). | 
|  | 125 |  | 
|  | 126 |  | 
|  | 127 | PROVISION OF CPU TIME | 
|  | 128 | --------------------- | 
|  | 129 |  | 
|  | 130 | The work to be done by the various states is given CPU time by the threads of | 
|  | 131 | the slow work facility (see Documentation/slow-work.txt).  This is used in | 
|  | 132 | preference to the workqueue facility because: | 
|  | 133 |  | 
|  | 134 | (1) Threads may be completely occupied for very long periods of time by a | 
|  | 135 | particular work item.  These state actions may be doing sequences of | 
|  | 136 | synchronous, journalled disk accesses (lookup, mkdir, create, setxattr, | 
|  | 137 | getxattr, truncate, unlink, rmdir, rename). | 
|  | 138 |  | 
|  | 139 | (2) Threads may do little actual work, but may rather spend a lot of time | 
|  | 140 | sleeping on I/O.  This means that single-threaded and 1-per-CPU-threaded | 
|  | 141 | workqueues don't necessarily have the right numbers of threads. | 
|  | 142 |  | 
|  | 143 |  | 
|  | 144 | LOCKING SIMPLIFICATION | 
|  | 145 | ---------------------- | 
|  | 146 |  | 
|  | 147 | Because only one worker thread may be operating on any particular object's | 
|  | 148 | state machine at once, this simplifies the locking, particularly with respect | 
|  | 149 | to disconnecting the netfs's representation of a cache object (fscache_cookie) | 
|  | 150 | from the cache backend's representation (fscache_object) - which may be | 
|  | 151 | requested from either end. | 
|  | 152 |  | 
|  | 153 |  | 
|  | 154 | ================= | 
|  | 155 | THE SET OF STATES | 
|  | 156 | ================= | 
|  | 157 |  | 
|  | 158 | The object state machine has a set of states that it can be in.  There are | 
|  | 159 | preparation states in which the object sets itself up and waits for its parent | 
|  | 160 | object to transit to a state that allows access to its children: | 
|  | 161 |  | 
|  | 162 | (1) State FSCACHE_OBJECT_INIT. | 
|  | 163 |  | 
|  | 164 | Initialise the object and wait for the parent object to become active.  In | 
|  | 165 | the cache, it is expected that it will not be possible to look an object | 
|  | 166 | up from the parent object, until that parent object itself has been looked | 
|  | 167 | up. | 
|  | 168 |  | 
|  | 169 | There are initialisation states in which the object sets itself up and accesses | 
|  | 170 | disk for the object metadata: | 
|  | 171 |  | 
|  | 172 | (2) State FSCACHE_OBJECT_LOOKING_UP. | 
|  | 173 |  | 
|  | 174 | Look up the object on disk, using the parent as a starting point. | 
|  | 175 | FS-Cache expects the cache backend to probe the cache to see whether this | 
|  | 176 | object is represented there, and if it is, to see if it's valid (coherency | 
|  | 177 | management). | 
|  | 178 |  | 
|  | 179 | The cache should call fscache_object_lookup_negative() to indicate lookup | 
|  | 180 | failure for whatever reason, and should call fscache_obtained_object() to | 
|  | 181 | indicate success. | 
|  | 182 |  | 
|  | 183 | At the completion of lookup, FS-Cache will let the netfs go ahead with | 
|  | 184 | read operations, no matter whether the file is yet cached.  If not yet | 
|  | 185 | cached, read operations will be immediately rejected with ENODATA until | 
|  | 186 | the first known page is uncached - as to that point there can be no data | 
|  | 187 | to be read out of the cache for that file that isn't currently also held | 
|  | 188 | in the pagecache. | 
|  | 189 |  | 
|  | 190 | (3) State FSCACHE_OBJECT_CREATING. | 
|  | 191 |  | 
|  | 192 | Create an object on disk, using the parent as a starting point.  This | 
|  | 193 | happens if the lookup failed to find the object, or if the object's | 
|  | 194 | coherency data indicated what's on disk is out of date.  In this state, | 
|  | 195 | FS-Cache expects the cache to create | 
|  | 196 |  | 
|  | 197 | The cache should call fscache_obtained_object() if creation completes | 
|  | 198 | successfully, fscache_object_lookup_negative() otherwise. | 
|  | 199 |  | 
|  | 200 | At the completion of creation, FS-Cache will start processing write | 
|  | 201 | operations the netfs has queued for an object.  If creation failed, the | 
|  | 202 | write ops will be transparently discarded, and nothing recorded in the | 
|  | 203 | cache. | 
|  | 204 |  | 
|  | 205 | There are some normal running states in which the object spends its time | 
|  | 206 | servicing netfs requests: | 
|  | 207 |  | 
|  | 208 | (4) State FSCACHE_OBJECT_AVAILABLE. | 
|  | 209 |  | 
|  | 210 | A transient state in which pending operations are started, child objects | 
|  | 211 | are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary | 
|  | 212 | lookup data is freed. | 
|  | 213 |  | 
|  | 214 | (5) State FSCACHE_OBJECT_ACTIVE. | 
|  | 215 |  | 
|  | 216 | The normal running state.  In this state, requests the netfs makes will be | 
|  | 217 | passed on to the cache. | 
|  | 218 |  | 
|  | 219 | (6) State FSCACHE_OBJECT_UPDATING. | 
|  | 220 |  | 
|  | 221 | The state machine comes here to update the object in the cache from the | 
|  | 222 | netfs's records.  This involves updating the auxiliary data that is used | 
|  | 223 | to maintain coherency. | 
|  | 224 |  | 
|  | 225 | And there are terminal states in which an object cleans itself up, deallocates | 
|  | 226 | memory and potentially deletes stuff from disk: | 
|  | 227 |  | 
|  | 228 | (7) State FSCACHE_OBJECT_LC_DYING. | 
|  | 229 |  | 
|  | 230 | The object comes here if it is dying because of a lookup or creation | 
|  | 231 | error.  This would be due to a disk error or system error of some sort. | 
|  | 232 | Temporary data is cleaned up, and the parent is released. | 
|  | 233 |  | 
|  | 234 | (8) State FSCACHE_OBJECT_DYING. | 
|  | 235 |  | 
|  | 236 | The object comes here if it is dying due to an error, because its parent | 
|  | 237 | cookie has been relinquished by the netfs or because the cache is being | 
|  | 238 | withdrawn. | 
|  | 239 |  | 
|  | 240 | Any child objects waiting on this one are given CPU time so that they too | 
|  | 241 | can destroy themselves.  This object waits for all its children to go away | 
|  | 242 | before advancing to the next state. | 
|  | 243 |  | 
|  | 244 | (9) State FSCACHE_OBJECT_ABORT_INIT. | 
|  | 245 |  | 
|  | 246 | The object comes to this state if it was waiting on its parent in | 
|  | 247 | FSCACHE_OBJECT_INIT, but its parent died.  The object will destroy itself | 
|  | 248 | so that the parent may proceed from the FSCACHE_OBJECT_DYING state. | 
|  | 249 |  | 
|  | 250 | (10) State FSCACHE_OBJECT_RELEASING. | 
|  | 251 | (11) State FSCACHE_OBJECT_RECYCLING. | 
|  | 252 |  | 
|  | 253 | The object comes to one of these two states when dying once it is rid of | 
|  | 254 | all its children, if it is dying because the netfs relinquished its | 
|  | 255 | cookie.  In the first state, the cached data is expected to persist, and | 
|  | 256 | in the second it will be deleted. | 
|  | 257 |  | 
|  | 258 | (12) State FSCACHE_OBJECT_WITHDRAWING. | 
|  | 259 |  | 
|  | 260 | The object transits to this state if the cache decides it wants to | 
|  | 261 | withdraw the object from service, perhaps to make space, but also due to | 
|  | 262 | error or just because the whole cache is being withdrawn. | 
|  | 263 |  | 
|  | 264 | (13) State FSCACHE_OBJECT_DEAD. | 
|  | 265 |  | 
|  | 266 | The object transits to this state when the in-memory object record is | 
|  | 267 | ready to be deleted.  The object processor shouldn't ever see an object in | 
|  | 268 | this state. | 
|  | 269 |  | 
|  | 270 |  | 
|  | 271 | THE SET OF EVENTS | 
|  | 272 | ----------------- | 
|  | 273 |  | 
|  | 274 | There are a number of events that can be raised to an object state machine: | 
|  | 275 |  | 
|  | 276 | (*) FSCACHE_OBJECT_EV_UPDATE | 
|  | 277 |  | 
|  | 278 | The netfs requested that an object be updated.  The state machine will ask | 
|  | 279 | the cache backend to update the object, and the cache backend will ask the | 
|  | 280 | netfs for details of the change through its cookie definition ops. | 
|  | 281 |  | 
|  | 282 | (*) FSCACHE_OBJECT_EV_CLEARED | 
|  | 283 |  | 
|  | 284 | This is signalled in two circumstances: | 
|  | 285 |  | 
|  | 286 | (a) when an object's last child object is dropped and | 
|  | 287 |  | 
|  | 288 | (b) when the last operation outstanding on an object is completed. | 
|  | 289 |  | 
|  | 290 | This is used to proceed from the dying state. | 
|  | 291 |  | 
|  | 292 | (*) FSCACHE_OBJECT_EV_ERROR | 
|  | 293 |  | 
|  | 294 | This is signalled when an I/O error occurs during the processing of some | 
|  | 295 | object. | 
|  | 296 |  | 
|  | 297 | (*) FSCACHE_OBJECT_EV_RELEASE | 
|  | 298 | (*) FSCACHE_OBJECT_EV_RETIRE | 
|  | 299 |  | 
|  | 300 | These are signalled when the netfs relinquishes a cookie it was using. | 
|  | 301 | The event selected depends on whether the netfs asks for the backing | 
|  | 302 | object to be retired (deleted) or retained. | 
|  | 303 |  | 
|  | 304 | (*) FSCACHE_OBJECT_EV_WITHDRAW | 
|  | 305 |  | 
|  | 306 | This is signalled when the cache backend wants to withdraw an object. | 
|  | 307 | This means that the object will have to be detached from the netfs's | 
|  | 308 | cookie. | 
|  | 309 |  | 
|  | 310 | Because the withdrawing releasing/retiring events are all handled by the object | 
|  | 311 | state machine, it doesn't matter if there's a collision with both ends trying | 
|  | 312 | to sever the connection at the same time.  The state machine can just pick | 
|  | 313 | which one it wants to honour, and that effects the other. |