| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 1 | Lockless Ring Buffer Design | 
|  | 2 | =========================== | 
|  | 3 |  | 
|  | 4 | Copyright 2009 Red Hat Inc. | 
|  | 5 | Author:   Steven Rostedt <srostedt@redhat.com> | 
|  | 6 | License:   The GNU Free Documentation License, Version 1.2 | 
|  | 7 | (dual licensed under the GPL v2) | 
|  | 8 | Reviewers:   Mathieu Desnoyers, Huang Ying, Hidetoshi Seto, | 
|  | 9 | and Frederic Weisbecker. | 
|  | 10 |  | 
|  | 11 |  | 
|  | 12 | Written for: 2.6.31 | 
|  | 13 |  | 
|  | 14 | Terminology used in this Document | 
|  | 15 | --------------------------------- | 
|  | 16 |  | 
|  | 17 | tail - where new writes happen in the ring buffer. | 
|  | 18 |  | 
|  | 19 | head - where new reads happen in the ring buffer. | 
|  | 20 |  | 
|  | 21 | producer - the task that writes into the ring buffer (same as writer) | 
|  | 22 |  | 
|  | 23 | writer - same as producer | 
|  | 24 |  | 
|  | 25 | consumer - the task that reads from the buffer (same as reader) | 
|  | 26 |  | 
|  | 27 | reader - same as consumer. | 
|  | 28 |  | 
|  | 29 | reader_page - A page outside the ring buffer used solely (for the most part) | 
|  | 30 | by the reader. | 
|  | 31 |  | 
|  | 32 | head_page - a pointer to the page that the reader will use next | 
|  | 33 |  | 
|  | 34 | tail_page - a pointer to the page that will be written to next | 
|  | 35 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 36 | commit_page - a pointer to the page with the last finished non-nested write. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 37 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 38 | cmpxchg - hardware-assisted atomic transaction that performs the following: | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 39 |  | 
|  | 40 | A = B iff previous A == C | 
|  | 41 |  | 
|  | 42 | R = cmpxchg(A, C, B) is saying that we replace A with B if and only if | 
|  | 43 | current A is equal to C, and we put the old (current) A into R | 
|  | 44 |  | 
|  | 45 | R gets the previous A regardless if A is updated with B or not. | 
|  | 46 |  | 
|  | 47 | To see if the update was successful a compare of R == C may be used. | 
|  | 48 |  | 
|  | 49 | The Generic Ring Buffer | 
|  | 50 | ----------------------- | 
|  | 51 |  | 
|  | 52 | The ring buffer can be used in either an overwrite mode or in | 
|  | 53 | producer/consumer mode. | 
|  | 54 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 55 | Producer/consumer mode is where if the producer were to fill up the | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 56 | buffer before the consumer could free up anything, the producer | 
|  | 57 | will stop writing to the buffer. This will lose most recent events. | 
|  | 58 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 59 | Overwrite mode is where if the producer were to fill up the buffer | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 60 | before the consumer could free up anything, the producer will | 
|  | 61 | overwrite the older data. This will lose the oldest events. | 
|  | 62 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 63 | No two writers can write at the same time (on the same per-cpu buffer), | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 64 | but a writer may interrupt another writer, but it must finish writing | 
|  | 65 | before the previous writer may continue. This is very important to the | 
|  | 66 | algorithm. The writers act like a "stack". The way interrupts works | 
|  | 67 | enforces this behavior. | 
|  | 68 |  | 
|  | 69 |  | 
|  | 70 | writer1 start | 
|  | 71 | <preempted> writer2 start | 
|  | 72 | <preempted> writer3 start | 
|  | 73 | writer3 finishes | 
|  | 74 | writer2 finishes | 
|  | 75 | writer1 finishes | 
|  | 76 |  | 
|  | 77 | This is very much like a writer being preempted by an interrupt and | 
|  | 78 | the interrupt doing a write as well. | 
|  | 79 |  | 
|  | 80 | Readers can happen at any time. But no two readers may run at the | 
|  | 81 | same time, nor can a reader preempt/interrupt another reader. A reader | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 82 | cannot preempt/interrupt a writer, but it may read/consume from the | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 83 | buffer at the same time as a writer is writing, but the reader must be | 
|  | 84 | on another processor to do so. A reader may read on its own processor | 
|  | 85 | and can be preempted by a writer. | 
|  | 86 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 87 | A writer can preempt a reader, but a reader cannot preempt a writer. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 88 | But a reader can read the buffer at the same time (on another processor) | 
|  | 89 | as a writer. | 
|  | 90 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 91 | The ring buffer is made up of a list of pages held together by a linked list. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 92 |  | 
|  | 93 | At initialization a reader page is allocated for the reader that is not | 
|  | 94 | part of the ring buffer. | 
|  | 95 |  | 
|  | 96 | The head_page, tail_page and commit_page are all initialized to point | 
|  | 97 | to the same page. | 
|  | 98 |  | 
|  | 99 | The reader page is initialized to have its next pointer pointing to | 
|  | 100 | the head page, and its previous pointer pointing to a page before | 
|  | 101 | the head page. | 
|  | 102 |  | 
|  | 103 | The reader has its own page to use. At start up time, this page is | 
|  | 104 | allocated but is not attached to the list. When the reader wants | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 105 | to read from the buffer, if its page is empty (like it is on start-up), | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 106 | it will swap its page with the head_page. The old reader page will | 
|  | 107 | become part of the ring buffer and the head_page will be removed. | 
|  | 108 | The page after the inserted page (old reader_page) will become the | 
|  | 109 | new head page. | 
|  | 110 |  | 
|  | 111 | Once the new page is given to the reader, the reader could do what | 
|  | 112 | it wants with it, as long as a writer has left that page. | 
|  | 113 |  | 
|  | 114 | A sample of how the reader page is swapped: Note this does not | 
|  | 115 | show the head page in the buffer, it is for demonstrating a swap | 
|  | 116 | only. | 
|  | 117 |  | 
|  | 118 | +------+ | 
|  | 119 | |reader|          RING BUFFER | 
|  | 120 | |page  | | 
|  | 121 | +------+ | 
|  | 122 | +---+   +---+   +---+ | 
|  | 123 | |   |-->|   |-->|   | | 
|  | 124 | |   |<--|   |<--|   | | 
|  | 125 | +---+   +---+   +---+ | 
|  | 126 | ^ |             ^ | | 
|  | 127 | | +-------------+ | | 
|  | 128 | +-----------------+ | 
|  | 129 |  | 
|  | 130 |  | 
|  | 131 | +------+ | 
|  | 132 | |reader|          RING BUFFER | 
|  | 133 | |page  |-------------------+ | 
|  | 134 | +------+                   v | 
|  | 135 | |             +---+   +---+   +---+ | 
|  | 136 | |             |   |-->|   |-->|   | | 
|  | 137 | |             |   |<--|   |<--|   |<-+ | 
|  | 138 | |             +---+   +---+   +---+  | | 
|  | 139 | |              ^ |             ^ |   | | 
|  | 140 | |              | +-------------+ |   | | 
|  | 141 | |              +-----------------+   | | 
|  | 142 | +------------------------------------+ | 
|  | 143 |  | 
|  | 144 | +------+ | 
|  | 145 | |reader|          RING BUFFER | 
|  | 146 | |page  |-------------------+ | 
|  | 147 | +------+ <---------------+ v | 
|  | 148 | |  ^          +---+   +---+   +---+ | 
|  | 149 | |  |          |   |-->|   |-->|   | | 
|  | 150 | |  |          |   |   |   |<--|   |<-+ | 
|  | 151 | |  |          +---+   +---+   +---+  | | 
|  | 152 | |  |             |             ^ |   | | 
|  | 153 | |  |             +-------------+ |   | | 
|  | 154 | |  +-----------------------------+   | | 
|  | 155 | +------------------------------------+ | 
|  | 156 |  | 
|  | 157 | +------+ | 
|  | 158 | |buffer|          RING BUFFER | 
|  | 159 | |page  |-------------------+ | 
|  | 160 | +------+ <---------------+ v | 
|  | 161 | |  ^          +---+   +---+   +---+ | 
|  | 162 | |  |          |   |   |   |-->|   | | 
|  | 163 | |  |  New     |   |   |   |<--|   |<-+ | 
|  | 164 | |  | Reader   +---+   +---+   +---+  | | 
|  | 165 | |  |  page ----^                 |   | | 
|  | 166 | |  |                             |   | | 
|  | 167 | |  +-----------------------------+   | | 
|  | 168 | +------------------------------------+ | 
|  | 169 |  | 
|  | 170 |  | 
|  | 171 |  | 
|  | 172 | It is possible that the page swapped is the commit page and the tail page, | 
|  | 173 | if what is in the ring buffer is less than what is held in a buffer page. | 
|  | 174 |  | 
|  | 175 |  | 
|  | 176 | reader page    commit page   tail page | 
|  | 177 | |              |             | | 
|  | 178 | v              |             | | 
|  | 179 | +---+           |             | | 
|  | 180 | |   |<----------+             | | 
|  | 181 | |   |<------------------------+ | 
|  | 182 | |   |------+ | 
|  | 183 | +---+      | | 
|  | 184 | | | 
|  | 185 | v | 
|  | 186 | +---+    +---+    +---+    +---+ | 
|  | 187 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 188 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 189 | +---+    +---+    +---+    +---+ | 
|  | 190 |  | 
|  | 191 | This case is still valid for this algorithm. | 
|  | 192 | When the writer leaves the page, it simply goes into the ring buffer | 
|  | 193 | since the reader page still points to the next location in the ring | 
|  | 194 | buffer. | 
|  | 195 |  | 
|  | 196 |  | 
|  | 197 | The main pointers: | 
|  | 198 |  | 
|  | 199 | reader page - The page used solely by the reader and is not part | 
|  | 200 | of the ring buffer (may be swapped in) | 
|  | 201 |  | 
|  | 202 | head page - the next page in the ring buffer that will be swapped | 
|  | 203 | with the reader page. | 
|  | 204 |  | 
|  | 205 | tail page - the page where the next write will take place. | 
|  | 206 |  | 
|  | 207 | commit page - the page that last finished a write. | 
|  | 208 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 209 | The commit page only is updated by the outermost writer in the | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 210 | writer stack. A writer that preempts another writer will not move the | 
|  | 211 | commit page. | 
|  | 212 |  | 
|  | 213 | When data is written into the ring buffer, a position is reserved | 
|  | 214 | in the ring buffer and passed back to the writer. When the writer | 
|  | 215 | is finished writing data into that position, it commits the write. | 
|  | 216 |  | 
|  | 217 | Another write (or a read) may take place at anytime during this | 
|  | 218 | transaction. If another write happens it must finish before continuing | 
|  | 219 | with the previous write. | 
|  | 220 |  | 
|  | 221 |  | 
|  | 222 | Write reserve: | 
|  | 223 |  | 
|  | 224 | Buffer page | 
|  | 225 | +---------+ | 
|  | 226 | |written  | | 
|  | 227 | +---------+  <--- given back to writer (current commit) | 
|  | 228 | |reserved | | 
|  | 229 | +---------+ <--- tail pointer | 
|  | 230 | | empty   | | 
|  | 231 | +---------+ | 
|  | 232 |  | 
|  | 233 | Write commit: | 
|  | 234 |  | 
|  | 235 | Buffer page | 
|  | 236 | +---------+ | 
|  | 237 | |written  | | 
|  | 238 | +---------+ | 
|  | 239 | |written  | | 
| Lucas De Marchi | 25985ed | 2011-03-30 22:57:33 -0300 | [diff] [blame] | 240 | +---------+  <--- next position for write (current commit) | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 241 | | empty   | | 
|  | 242 | +---------+ | 
|  | 243 |  | 
|  | 244 |  | 
|  | 245 | If a write happens after the first reserve: | 
|  | 246 |  | 
|  | 247 | Buffer page | 
|  | 248 | +---------+ | 
|  | 249 | |written  | | 
|  | 250 | +---------+  <-- current commit | 
|  | 251 | |reserved | | 
|  | 252 | +---------+  <--- given back to second writer | 
|  | 253 | |reserved | | 
|  | 254 | +---------+ <--- tail pointer | 
|  | 255 |  | 
|  | 256 | After second writer commits: | 
|  | 257 |  | 
|  | 258 |  | 
|  | 259 | Buffer page | 
|  | 260 | +---------+ | 
|  | 261 | |written  | | 
|  | 262 | +---------+  <--(last full commit) | 
|  | 263 | |reserved | | 
|  | 264 | +---------+ | 
|  | 265 | |pending  | | 
|  | 266 | |commit   | | 
|  | 267 | +---------+ <--- tail pointer | 
|  | 268 |  | 
|  | 269 | When the first writer commits: | 
|  | 270 |  | 
|  | 271 | Buffer page | 
|  | 272 | +---------+ | 
|  | 273 | |written  | | 
|  | 274 | +---------+ | 
|  | 275 | |written  | | 
|  | 276 | +---------+ | 
|  | 277 | |written  | | 
|  | 278 | +---------+  <--(last full commit and tail pointer) | 
|  | 279 |  | 
|  | 280 |  | 
|  | 281 | The commit pointer points to the last write location that was | 
|  | 282 | committed without preempting another write. When a write that | 
|  | 283 | preempted another write is committed, it only becomes a pending commit | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 284 | and will not be a full commit until all writes have been committed. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 285 |  | 
|  | 286 | The commit page points to the page that has the last full commit. | 
|  | 287 | The tail page points to the page with the last write (before | 
|  | 288 | committing). | 
|  | 289 |  | 
|  | 290 | The tail page is always equal to or after the commit page. It may | 
|  | 291 | be several pages ahead. If the tail page catches up to the commit | 
|  | 292 | page then no more writes may take place (regardless of the mode | 
|  | 293 | of the ring buffer: overwrite and produce/consumer). | 
|  | 294 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 295 | The order of pages is: | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 296 |  | 
|  | 297 | head page | 
|  | 298 | commit page | 
|  | 299 | tail page | 
|  | 300 |  | 
|  | 301 | Possible scenario: | 
|  | 302 | tail page | 
|  | 303 | head page         commit page  | | 
|  | 304 | |                 |        | | 
|  | 305 | v                 v        v | 
|  | 306 | +---+    +---+    +---+    +---+ | 
|  | 307 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 308 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 309 | +---+    +---+    +---+    +---+ | 
|  | 310 |  | 
|  | 311 | There is a special case that the head page is after either the commit page | 
|  | 312 | and possibly the tail page. That is when the commit (and tail) page has been | 
|  | 313 | swapped with the reader page. This is because the head page is always | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 314 | part of the ring buffer, but the reader page is not. Whenever there | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 315 | has been less than a full page that has been committed inside the ring buffer, | 
|  | 316 | and a reader swaps out a page, it will be swapping out the commit page. | 
|  | 317 |  | 
|  | 318 |  | 
|  | 319 | reader page    commit page   tail page | 
|  | 320 | |              |             | | 
|  | 321 | v              |             | | 
|  | 322 | +---+           |             | | 
|  | 323 | |   |<----------+             | | 
|  | 324 | |   |<------------------------+ | 
|  | 325 | |   |------+ | 
|  | 326 | +---+      | | 
|  | 327 | | | 
|  | 328 | v | 
|  | 329 | +---+    +---+    +---+    +---+ | 
|  | 330 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 331 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 332 | +---+    +---+    +---+    +---+ | 
|  | 333 | ^ | 
|  | 334 | | | 
|  | 335 | head page | 
|  | 336 |  | 
|  | 337 |  | 
|  | 338 | In this case, the head page will not move when the tail and commit | 
|  | 339 | move back into the ring buffer. | 
|  | 340 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 341 | The reader cannot swap a page into the ring buffer if the commit page | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 342 | is still on that page. If the read meets the last commit (real commit | 
|  | 343 | not pending or reserved), then there is nothing more to read. | 
|  | 344 | The buffer is considered empty until another full commit finishes. | 
|  | 345 |  | 
|  | 346 | When the tail meets the head page, if the buffer is in overwrite mode, | 
|  | 347 | the head page will be pushed ahead one. If the buffer is in producer/consumer | 
|  | 348 | mode, the write will fail. | 
|  | 349 |  | 
|  | 350 | Overwrite mode: | 
|  | 351 |  | 
|  | 352 | tail page | 
|  | 353 | | | 
|  | 354 | v | 
|  | 355 | +---+    +---+    +---+    +---+ | 
|  | 356 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 357 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 358 | +---+    +---+    +---+    +---+ | 
|  | 359 | ^ | 
|  | 360 | | | 
|  | 361 | head page | 
|  | 362 |  | 
|  | 363 |  | 
|  | 364 | tail page | 
|  | 365 | | | 
|  | 366 | v | 
|  | 367 | +---+    +---+    +---+    +---+ | 
|  | 368 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 369 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 370 | +---+    +---+    +---+    +---+ | 
|  | 371 | ^ | 
|  | 372 | | | 
|  | 373 | head page | 
|  | 374 |  | 
|  | 375 |  | 
|  | 376 | tail page | 
|  | 377 | | | 
|  | 378 | v | 
|  | 379 | +---+    +---+    +---+    +---+ | 
|  | 380 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 381 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 382 | +---+    +---+    +---+    +---+ | 
|  | 383 | ^ | 
|  | 384 | | | 
|  | 385 | head page | 
|  | 386 |  | 
|  | 387 | Note, the reader page will still point to the previous head page. | 
|  | 388 | But when a swap takes place, it will use the most recent head page. | 
|  | 389 |  | 
|  | 390 |  | 
|  | 391 | Making the Ring Buffer Lockless: | 
|  | 392 | -------------------------------- | 
|  | 393 |  | 
|  | 394 | The main idea behind the lockless algorithm is to combine the moving | 
|  | 395 | of the head_page pointer with the swapping of pages with the reader. | 
|  | 396 | State flags are placed inside the pointer to the page. To do this, | 
|  | 397 | each page must be aligned in memory by 4 bytes. This will allow the 2 | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 398 | least significant bits of the address to be used as flags, since | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 399 | they will always be zero for the address. To get the address, | 
|  | 400 | simply mask out the flags. | 
|  | 401 |  | 
|  | 402 | MASK = ~3 | 
|  | 403 |  | 
|  | 404 | address & MASK | 
|  | 405 |  | 
|  | 406 | Two flags will be kept by these two bits: | 
|  | 407 |  | 
|  | 408 | HEADER - the page being pointed to is a head page | 
|  | 409 |  | 
|  | 410 | UPDATE - the page being pointed to is being updated by a writer | 
|  | 411 | and was or is about to be a head page. | 
|  | 412 |  | 
|  | 413 |  | 
|  | 414 | reader page | 
|  | 415 | | | 
|  | 416 | v | 
|  | 417 | +---+ | 
|  | 418 | |   |------+ | 
|  | 419 | +---+      | | 
|  | 420 | | | 
|  | 421 | v | 
|  | 422 | +---+    +---+    +---+    +---+ | 
|  | 423 | <---|   |--->|   |-H->|   |--->|   |---> | 
|  | 424 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 425 | +---+    +---+    +---+    +---+ | 
|  | 426 |  | 
|  | 427 |  | 
|  | 428 | The above pointer "-H->" would have the HEADER flag set. That is | 
|  | 429 | the next page is the next page to be swapped out by the reader. | 
|  | 430 | This pointer means the next page is the head page. | 
|  | 431 |  | 
|  | 432 | When the tail page meets the head pointer, it will use cmpxchg to | 
|  | 433 | change the pointer to the UPDATE state: | 
|  | 434 |  | 
|  | 435 |  | 
|  | 436 | tail page | 
|  | 437 | | | 
|  | 438 | v | 
|  | 439 | +---+    +---+    +---+    +---+ | 
|  | 440 | <---|   |--->|   |-H->|   |--->|   |---> | 
|  | 441 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 442 | +---+    +---+    +---+    +---+ | 
|  | 443 |  | 
|  | 444 | tail page | 
|  | 445 | | | 
|  | 446 | v | 
|  | 447 | +---+    +---+    +---+    +---+ | 
|  | 448 | <---|   |--->|   |-U->|   |--->|   |---> | 
|  | 449 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 450 | +---+    +---+    +---+    +---+ | 
|  | 451 |  | 
|  | 452 | "-U->" represents a pointer in the UPDATE state. | 
|  | 453 |  | 
|  | 454 | Any access to the reader will need to take some sort of lock to serialize | 
|  | 455 | the readers. But the writers will never take a lock to write to the | 
|  | 456 | ring buffer. This means we only need to worry about a single reader, | 
|  | 457 | and writes only preempt in "stack" formation. | 
|  | 458 |  | 
|  | 459 | When the reader tries to swap the page with the ring buffer, it | 
|  | 460 | will also use cmpxchg. If the flag bit in the pointer to the | 
|  | 461 | head page does not have the HEADER flag set, the compare will fail | 
|  | 462 | and the reader will need to look for the new head page and try again. | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 463 | Note, the flags UPDATE and HEADER are never set at the same time. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 464 |  | 
|  | 465 | The reader swaps the reader page as follows: | 
|  | 466 |  | 
|  | 467 | +------+ | 
|  | 468 | |reader|          RING BUFFER | 
|  | 469 | |page  | | 
|  | 470 | +------+ | 
|  | 471 | +---+    +---+    +---+ | 
|  | 472 | |   |--->|   |--->|   | | 
|  | 473 | |   |<---|   |<---|   | | 
|  | 474 | +---+    +---+    +---+ | 
|  | 475 | ^ |               ^ | | 
|  | 476 | | +---------------+ | | 
|  | 477 | +-----H-------------+ | 
|  | 478 |  | 
|  | 479 | The reader sets the reader page next pointer as HEADER to the page after | 
|  | 480 | the head page. | 
|  | 481 |  | 
|  | 482 |  | 
|  | 483 | +------+ | 
|  | 484 | |reader|          RING BUFFER | 
|  | 485 | |page  |-------H-----------+ | 
|  | 486 | +------+                   v | 
|  | 487 | |             +---+    +---+    +---+ | 
|  | 488 | |             |   |--->|   |--->|   | | 
|  | 489 | |             |   |<---|   |<---|   |<-+ | 
|  | 490 | |             +---+    +---+    +---+  | | 
|  | 491 | |              ^ |               ^ |   | | 
|  | 492 | |              | +---------------+ |   | | 
|  | 493 | |              +-----H-------------+   | | 
|  | 494 | +--------------------------------------+ | 
|  | 495 |  | 
|  | 496 | It does a cmpxchg with the pointer to the previous head page to make it | 
|  | 497 | point to the reader page. Note that the new pointer does not have the HEADER | 
|  | 498 | flag set.  This action atomically moves the head page forward. | 
|  | 499 |  | 
|  | 500 | +------+ | 
|  | 501 | |reader|          RING BUFFER | 
|  | 502 | |page  |-------H-----------+ | 
|  | 503 | +------+                   v | 
|  | 504 | |  ^          +---+   +---+   +---+ | 
|  | 505 | |  |          |   |-->|   |-->|   | | 
|  | 506 | |  |          |   |<--|   |<--|   |<-+ | 
|  | 507 | |  |          +---+   +---+   +---+  | | 
|  | 508 | |  |             |             ^ |   | | 
|  | 509 | |  |             +-------------+ |   | | 
|  | 510 | |  +-----------------------------+   | | 
|  | 511 | +------------------------------------+ | 
|  | 512 |  | 
|  | 513 | After the new head page is set, the previous pointer of the head page is | 
|  | 514 | updated to the reader page. | 
|  | 515 |  | 
|  | 516 | +------+ | 
|  | 517 | |reader|          RING BUFFER | 
|  | 518 | |page  |-------H-----------+ | 
|  | 519 | +------+ <---------------+ v | 
|  | 520 | |  ^          +---+   +---+   +---+ | 
|  | 521 | |  |          |   |-->|   |-->|   | | 
|  | 522 | |  |          |   |   |   |<--|   |<-+ | 
|  | 523 | |  |          +---+   +---+   +---+  | | 
|  | 524 | |  |             |             ^ |   | | 
|  | 525 | |  |             +-------------+ |   | | 
|  | 526 | |  +-----------------------------+   | | 
|  | 527 | +------------------------------------+ | 
|  | 528 |  | 
|  | 529 | +------+ | 
|  | 530 | |buffer|          RING BUFFER | 
|  | 531 | |page  |-------H-----------+  <--- New head page | 
|  | 532 | +------+ <---------------+ v | 
|  | 533 | |  ^          +---+   +---+   +---+ | 
|  | 534 | |  |          |   |   |   |-->|   | | 
|  | 535 | |  |  New     |   |   |   |<--|   |<-+ | 
|  | 536 | |  | Reader   +---+   +---+   +---+  | | 
|  | 537 | |  |  page ----^                 |   | | 
|  | 538 | |  |                             |   | | 
|  | 539 | |  +-----------------------------+   | | 
|  | 540 | +------------------------------------+ | 
|  | 541 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 542 | Another important point: The page that the reader page points back to | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 543 | by its previous pointer (the one that now points to the new head page) | 
|  | 544 | never points back to the reader page. That is because the reader page is | 
|  | 545 | not part of the ring buffer. Traversing the ring buffer via the next pointers | 
|  | 546 | will always stay in the ring buffer. Traversing the ring buffer via the | 
|  | 547 | prev pointers may not. | 
|  | 548 |  | 
|  | 549 | Note, the way to determine a reader page is simply by examining the previous | 
|  | 550 | pointer of the page. If the next pointer of the previous page does not | 
|  | 551 | point back to the original page, then the original page is a reader page: | 
|  | 552 |  | 
|  | 553 |  | 
|  | 554 | +--------+ | 
|  | 555 | | reader |  next   +----+ | 
|  | 556 | |  page  |-------->|    |<====== (buffer page) | 
|  | 557 | +--------+         +----+ | 
|  | 558 | |                | ^ | 
|  | 559 | |                v | next | 
|  | 560 | prev |              +----+ | 
|  | 561 | +------------->|    | | 
|  | 562 | +----+ | 
|  | 563 |  | 
|  | 564 | The way the head page moves forward: | 
|  | 565 |  | 
|  | 566 | When the tail page meets the head page and the buffer is in overwrite mode | 
|  | 567 | and more writes take place, the head page must be moved forward before the | 
|  | 568 | writer may move the tail page. The way this is done is that the writer | 
|  | 569 | performs a cmpxchg to convert the pointer to the head page from the HEADER | 
|  | 570 | flag to have the UPDATE flag set. Once this is done, the reader will | 
|  | 571 | not be able to swap the head page from the buffer, nor will it be able to | 
|  | 572 | move the head page, until the writer is finished with the move. | 
|  | 573 |  | 
|  | 574 | This eliminates any races that the reader can have on the writer. The reader | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 575 | must spin, and this is why the reader cannot preempt the writer. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 576 |  | 
|  | 577 | tail page | 
|  | 578 | | | 
|  | 579 | v | 
|  | 580 | +---+    +---+    +---+    +---+ | 
|  | 581 | <---|   |--->|   |-H->|   |--->|   |---> | 
|  | 582 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 583 | +---+    +---+    +---+    +---+ | 
|  | 584 |  | 
|  | 585 | tail page | 
|  | 586 | | | 
|  | 587 | v | 
|  | 588 | +---+    +---+    +---+    +---+ | 
|  | 589 | <---|   |--->|   |-U->|   |--->|   |---> | 
|  | 590 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 591 | +---+    +---+    +---+    +---+ | 
|  | 592 |  | 
|  | 593 | The following page will be made into the new head page. | 
|  | 594 |  | 
|  | 595 | tail page | 
|  | 596 | | | 
|  | 597 | v | 
|  | 598 | +---+    +---+    +---+    +---+ | 
|  | 599 | <---|   |--->|   |-U->|   |-H->|   |---> | 
|  | 600 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 601 | +---+    +---+    +---+    +---+ | 
|  | 602 |  | 
|  | 603 | After the new head page has been set, we can set the old head page | 
|  | 604 | pointer back to NORMAL. | 
|  | 605 |  | 
|  | 606 | tail page | 
|  | 607 | | | 
|  | 608 | v | 
|  | 609 | +---+    +---+    +---+    +---+ | 
|  | 610 | <---|   |--->|   |--->|   |-H->|   |---> | 
|  | 611 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 612 | +---+    +---+    +---+    +---+ | 
|  | 613 |  | 
|  | 614 | After the head page has been moved, the tail page may now move forward. | 
|  | 615 |  | 
|  | 616 | tail page | 
|  | 617 | | | 
|  | 618 | v | 
|  | 619 | +---+    +---+    +---+    +---+ | 
|  | 620 | <---|   |--->|   |--->|   |-H->|   |---> | 
|  | 621 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 622 | +---+    +---+    +---+    +---+ | 
|  | 623 |  | 
|  | 624 |  | 
|  | 625 | The above are the trivial updates. Now for the more complex scenarios. | 
|  | 626 |  | 
|  | 627 |  | 
|  | 628 | As stated before, if enough writes preempt the first write, the | 
|  | 629 | tail page may make it all the way around the buffer and meet the commit | 
|  | 630 | page. At this time, we must start dropping writes (usually with some kind | 
|  | 631 | of warning to the user). But what happens if the commit was still on the | 
|  | 632 | reader page? The commit page is not part of the ring buffer. The tail page | 
|  | 633 | must account for this. | 
|  | 634 |  | 
|  | 635 |  | 
|  | 636 | reader page    commit page | 
|  | 637 | |              | | 
|  | 638 | v              | | 
|  | 639 | +---+           | | 
|  | 640 | |   |<----------+ | 
|  | 641 | |   | | 
|  | 642 | |   |------+ | 
|  | 643 | +---+      | | 
|  | 644 | | | 
|  | 645 | v | 
|  | 646 | +---+    +---+    +---+    +---+ | 
|  | 647 | <---|   |--->|   |-H->|   |--->|   |---> | 
|  | 648 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 649 | +---+    +---+    +---+    +---+ | 
|  | 650 | ^ | 
|  | 651 | | | 
|  | 652 | tail page | 
|  | 653 |  | 
|  | 654 | If the tail page were to simply push the head page forward, the commit when | 
|  | 655 | leaving the reader page would not be pointing to the correct page. | 
|  | 656 |  | 
|  | 657 | The solution to this is to test if the commit page is on the reader page | 
|  | 658 | before pushing the head page. If it is, then it can be assumed that the | 
|  | 659 | tail page wrapped the buffer, and we must drop new writes. | 
|  | 660 |  | 
|  | 661 | This is not a race condition, because the commit page can only be moved | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 662 | by the outermost writer (the writer that was preempted). | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 663 | This means that the commit will not move while a writer is moving the | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 664 | tail page. The reader cannot swap the reader page if it is also being | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 665 | used as the commit page. The reader can simply check that the commit | 
|  | 666 | is off the reader page. Once the commit page leaves the reader page | 
|  | 667 | it will never go back on it unless a reader does another swap with the | 
|  | 668 | buffer page that is also the commit page. | 
|  | 669 |  | 
|  | 670 |  | 
|  | 671 | Nested writes | 
|  | 672 | ------------- | 
|  | 673 |  | 
|  | 674 | In the pushing forward of the tail page we must first push forward | 
|  | 675 | the head page if the head page is the next page. If the head page | 
|  | 676 | is not the next page, the tail page is simply updated with a cmpxchg. | 
|  | 677 |  | 
|  | 678 | Only writers move the tail page. This must be done atomically to protect | 
|  | 679 | against nested writers. | 
|  | 680 |  | 
|  | 681 | temp_page = tail_page | 
|  | 682 | next_page = temp_page->next | 
|  | 683 | cmpxchg(tail_page, temp_page, next_page) | 
|  | 684 |  | 
|  | 685 | The above will update the tail page if it is still pointing to the expected | 
|  | 686 | page. If this fails, a nested write pushed it forward, the the current write | 
|  | 687 | does not need to push it. | 
|  | 688 |  | 
|  | 689 |  | 
|  | 690 | temp page | 
|  | 691 | | | 
|  | 692 | v | 
|  | 693 | tail page | 
|  | 694 | | | 
|  | 695 | v | 
|  | 696 | +---+    +---+    +---+    +---+ | 
|  | 697 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 698 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 699 | +---+    +---+    +---+    +---+ | 
|  | 700 |  | 
|  | 701 | Nested write comes in and moves the tail page forward: | 
|  | 702 |  | 
|  | 703 | tail page (moved by nested writer) | 
|  | 704 | temp page   | | 
|  | 705 | |        | | 
|  | 706 | v        v | 
|  | 707 | +---+    +---+    +---+    +---+ | 
|  | 708 | <---|   |--->|   |--->|   |--->|   |---> | 
|  | 709 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 710 | +---+    +---+    +---+    +---+ | 
|  | 711 |  | 
|  | 712 | The above would fail the cmpxchg, but since the tail page has already | 
|  | 713 | been moved forward, the writer will just try again to reserve storage | 
|  | 714 | on the new tail page. | 
|  | 715 |  | 
|  | 716 | But the moving of the head page is a bit more complex. | 
|  | 717 |  | 
|  | 718 | tail page | 
|  | 719 | | | 
|  | 720 | v | 
|  | 721 | +---+    +---+    +---+    +---+ | 
|  | 722 | <---|   |--->|   |-H->|   |--->|   |---> | 
|  | 723 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 724 | +---+    +---+    +---+    +---+ | 
|  | 725 |  | 
|  | 726 | The write converts the head page pointer to UPDATE. | 
|  | 727 |  | 
|  | 728 | tail page | 
|  | 729 | | | 
|  | 730 | v | 
|  | 731 | +---+    +---+    +---+    +---+ | 
|  | 732 | <---|   |--->|   |-U->|   |--->|   |---> | 
|  | 733 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 734 | +---+    +---+    +---+    +---+ | 
|  | 735 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 736 | But if a nested writer preempts here, it will see that the next | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 737 | page is a head page, but it is also nested. It will detect that | 
|  | 738 | it is nested and will save that information. The detection is the | 
|  | 739 | fact that it sees the UPDATE flag instead of a HEADER or NORMAL | 
|  | 740 | pointer. | 
|  | 741 |  | 
|  | 742 | The nested writer will set the new head page pointer. | 
|  | 743 |  | 
|  | 744 | tail page | 
|  | 745 | | | 
|  | 746 | v | 
|  | 747 | +---+    +---+    +---+    +---+ | 
|  | 748 | <---|   |--->|   |-U->|   |-H->|   |---> | 
|  | 749 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 750 | +---+    +---+    +---+    +---+ | 
|  | 751 |  | 
|  | 752 | But it will not reset the update back to normal. Only the writer | 
|  | 753 | that converted a pointer from HEAD to UPDATE will convert it back | 
|  | 754 | to NORMAL. | 
|  | 755 |  | 
|  | 756 | tail page | 
|  | 757 | | | 
|  | 758 | v | 
|  | 759 | +---+    +---+    +---+    +---+ | 
|  | 760 | <---|   |--->|   |-U->|   |-H->|   |---> | 
|  | 761 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 762 | +---+    +---+    +---+    +---+ | 
|  | 763 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 764 | After the nested writer finishes, the outermost writer will convert | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 765 | the UPDATE pointer to NORMAL. | 
|  | 766 |  | 
|  | 767 |  | 
|  | 768 | tail page | 
|  | 769 | | | 
|  | 770 | v | 
|  | 771 | +---+    +---+    +---+    +---+ | 
|  | 772 | <---|   |--->|   |--->|   |-H->|   |---> | 
|  | 773 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 774 | +---+    +---+    +---+    +---+ | 
|  | 775 |  | 
|  | 776 |  | 
|  | 777 | It can be even more complex if several nested writes came in and moved | 
|  | 778 | the tail page ahead several pages: | 
|  | 779 |  | 
|  | 780 |  | 
|  | 781 | (first writer) | 
|  | 782 |  | 
|  | 783 | tail page | 
|  | 784 | | | 
|  | 785 | v | 
|  | 786 | +---+    +---+    +---+    +---+ | 
|  | 787 | <---|   |--->|   |-H->|   |--->|   |---> | 
|  | 788 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 789 | +---+    +---+    +---+    +---+ | 
|  | 790 |  | 
|  | 791 | The write converts the head page pointer to UPDATE. | 
|  | 792 |  | 
|  | 793 | tail page | 
|  | 794 | | | 
|  | 795 | v | 
|  | 796 | +---+    +---+    +---+    +---+ | 
|  | 797 | <---|   |--->|   |-U->|   |--->|   |---> | 
|  | 798 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 799 | +---+    +---+    +---+    +---+ | 
|  | 800 |  | 
|  | 801 | Next writer comes in, and sees the update and sets up the new | 
|  | 802 | head page. | 
|  | 803 |  | 
|  | 804 | (second writer) | 
|  | 805 |  | 
|  | 806 | tail page | 
|  | 807 | | | 
|  | 808 | v | 
|  | 809 | +---+    +---+    +---+    +---+ | 
|  | 810 | <---|   |--->|   |-U->|   |-H->|   |---> | 
|  | 811 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 812 | +---+    +---+    +---+    +---+ | 
|  | 813 |  | 
|  | 814 | The nested writer moves the tail page forward. But does not set the old | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 815 | update page to NORMAL because it is not the outermost writer. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 816 |  | 
|  | 817 | tail page | 
|  | 818 | | | 
|  | 819 | v | 
|  | 820 | +---+    +---+    +---+    +---+ | 
|  | 821 | <---|   |--->|   |-U->|   |-H->|   |---> | 
|  | 822 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 823 | +---+    +---+    +---+    +---+ | 
|  | 824 |  | 
|  | 825 | Another writer preempts and sees the page after the tail page is a head page. | 
|  | 826 | It changes it from HEAD to UPDATE. | 
|  | 827 |  | 
|  | 828 | (third writer) | 
|  | 829 |  | 
|  | 830 | tail page | 
|  | 831 | | | 
|  | 832 | v | 
|  | 833 | +---+    +---+    +---+    +---+ | 
|  | 834 | <---|   |--->|   |-U->|   |-U->|   |---> | 
|  | 835 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 836 | +---+    +---+    +---+    +---+ | 
|  | 837 |  | 
|  | 838 | The writer will move the head page forward: | 
|  | 839 |  | 
|  | 840 |  | 
|  | 841 | (third writer) | 
|  | 842 |  | 
|  | 843 | tail page | 
|  | 844 | | | 
|  | 845 | v | 
|  | 846 | +---+    +---+    +---+    +---+ | 
|  | 847 | <---|   |--->|   |-U->|   |-U->|   |-H-> | 
|  | 848 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 849 | +---+    +---+    +---+    +---+ | 
|  | 850 |  | 
|  | 851 | But now that the third writer did change the HEAD flag to UPDATE it | 
|  | 852 | will convert it to normal: | 
|  | 853 |  | 
|  | 854 |  | 
|  | 855 | (third writer) | 
|  | 856 |  | 
|  | 857 | tail page | 
|  | 858 | | | 
|  | 859 | v | 
|  | 860 | +---+    +---+    +---+    +---+ | 
|  | 861 | <---|   |--->|   |-U->|   |--->|   |-H-> | 
|  | 862 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 863 | +---+    +---+    +---+    +---+ | 
|  | 864 |  | 
|  | 865 |  | 
|  | 866 | Then it will move the tail page, and return back to the second writer. | 
|  | 867 |  | 
|  | 868 |  | 
|  | 869 | (second writer) | 
|  | 870 |  | 
|  | 871 | tail page | 
|  | 872 | | | 
|  | 873 | v | 
|  | 874 | +---+    +---+    +---+    +---+ | 
|  | 875 | <---|   |--->|   |-U->|   |--->|   |-H-> | 
|  | 876 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 877 | +---+    +---+    +---+    +---+ | 
|  | 878 |  | 
|  | 879 |  | 
|  | 880 | The second writer will fail to move the tail page because it was already | 
|  | 881 | moved, so it will try again and add its data to the new tail page. | 
|  | 882 | It will return to the first writer. | 
|  | 883 |  | 
|  | 884 |  | 
|  | 885 | (first writer) | 
|  | 886 |  | 
|  | 887 | tail page | 
|  | 888 | | | 
|  | 889 | v | 
|  | 890 | +---+    +---+    +---+    +---+ | 
|  | 891 | <---|   |--->|   |-U->|   |--->|   |-H-> | 
|  | 892 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 893 | +---+    +---+    +---+    +---+ | 
|  | 894 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 895 | The first writer cannot know atomically if the tail page moved | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 896 | while it updates the HEAD page. It will then update the head page to | 
|  | 897 | what it thinks is the new head page. | 
|  | 898 |  | 
|  | 899 |  | 
|  | 900 | (first writer) | 
|  | 901 |  | 
|  | 902 | tail page | 
|  | 903 | | | 
|  | 904 | v | 
|  | 905 | +---+    +---+    +---+    +---+ | 
|  | 906 | <---|   |--->|   |-U->|   |-H->|   |-H-> | 
|  | 907 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 908 | +---+    +---+    +---+    +---+ | 
|  | 909 |  | 
|  | 910 | Since the cmpxchg returns the old value of the pointer the first writer | 
|  | 911 | will see it succeeded in updating the pointer from NORMAL to HEAD. | 
|  | 912 | But as we can see, this is not good enough. It must also check to see | 
|  | 913 | if the tail page is either where it use to be or on the next page: | 
|  | 914 |  | 
|  | 915 |  | 
|  | 916 | (first writer) | 
|  | 917 |  | 
|  | 918 | A        B    tail page | 
|  | 919 | |        |        | | 
|  | 920 | v        v        v | 
|  | 921 | +---+    +---+    +---+    +---+ | 
|  | 922 | <---|   |--->|   |-U->|   |-H->|   |-H-> | 
|  | 923 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 924 | +---+    +---+    +---+    +---+ | 
|  | 925 |  | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 926 | If tail page != A and tail page != B, then it must reset the pointer | 
|  | 927 | back to NORMAL. The fact that it only needs to worry about nested | 
|  | 928 | writers means that it only needs to check this after setting the HEAD page. | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 929 |  | 
|  | 930 |  | 
|  | 931 | (first writer) | 
|  | 932 |  | 
|  | 933 | A        B    tail page | 
|  | 934 | |        |        | | 
|  | 935 | v        v        v | 
|  | 936 | +---+    +---+    +---+    +---+ | 
|  | 937 | <---|   |--->|   |-U->|   |--->|   |-H-> | 
|  | 938 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 939 | +---+    +---+    +---+    +---+ | 
|  | 940 |  | 
|  | 941 | Now the writer can update the head page. This is also why the head page must | 
| Randy Dunlap | 006b429 | 2010-01-08 14:43:07 -0800 | [diff] [blame] | 942 | remain in UPDATE and only reset by the outermost writer. This prevents | 
| Steven Rostedt | 8b2c70d | 2009-06-10 15:45:41 -0400 | [diff] [blame] | 943 | the reader from seeing the incorrect head page. | 
|  | 944 |  | 
|  | 945 |  | 
|  | 946 | (first writer) | 
|  | 947 |  | 
|  | 948 | A        B    tail page | 
|  | 949 | |        |        | | 
|  | 950 | v        v        v | 
|  | 951 | +---+    +---+    +---+    +---+ | 
|  | 952 | <---|   |--->|   |--->|   |--->|   |-H-> | 
|  | 953 | --->|   |<---|   |<---|   |<---|   |<--- | 
|  | 954 | +---+    +---+    +---+    +---+ | 
|  | 955 |  |