Christoph Hellwig, commit 04ccc65, 2010-09-03 11:56:17 +0200

Explicit volatile write back cache control
==========================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches.  That means the devices signal I/O completion to the
operating system before the data has actually reached non-volatile storage.
This behavior speeds up many workloads, but it means the operating system
needs to force data out to non-volatile storage when it performs a data
integrity operation such as fsync, sync or an unmount.

The Linux block layer provides two simple mechanisms that let filesystems
control the caching behavior of the storage device.  These mechanisms are
a forced cache flush, and the Force Unit Access (FUA) flag for requests.


Explicit cache flushes
----------------------

The REQ_FLUSH flag can be ORed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started.  This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts.  In addition, the REQ_FLUSH flag can
be set on an otherwise empty bio structure, which causes only an explicit
cache flush without any dependent I/O.  It is recommended to use the
blkdev_issue_flush() helper for a pure cache flush.

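The ordering guarantee can be illustrated with a small userspace model.  This
is only a sketch and not kernel code: the toy_dev structure, the toy_*
functions and the TOY_REQ_FLUSH constant are invented for this example and
merely stand in for a device with a volatile cache and for REQ_FLUSH.  It
assumes that the flush is carried out before the payload of the flagged
request is written, so that payload itself still ends up in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: all toy_* names are invented. */
#define TOY_SECTORS   8
#define TOY_REQ_FLUSH 0x1u	/* stands in for REQ_FLUSH */

struct toy_dev {
	int  media[TOY_SECTORS];	/* non-volatile storage */
	int  cache[TOY_SECTORS];	/* volatile write back cache */
	bool dirty[TOY_SECTORS];	/* cached data not yet on the media */
};

static void toy_init(struct toy_dev *dev)
{
	memset(dev, 0, sizeof(*dev));
}

/* Write every dirty cache entry out to the media. */
static void toy_flush_cache(struct toy_dev *dev)
{
	int i;

	for (i = 0; i < TOY_SECTORS; i++) {
		if (dev->dirty[i]) {
			dev->media[i] = dev->cache[i];
			dev->dirty[i] = false;
		}
	}
}

/*
 * Submit one request.  has_data == false models an otherwise empty bio.
 * The flush runs before the payload is written, so the payload of the
 * flagged request itself still lands in the volatile cache.
 */
static void toy_submit(struct toy_dev *dev, unsigned int flags,
		       bool has_data, int sector, int value)
{
	if (flags & TOY_REQ_FLUSH)
		toy_flush_cache(dev);

	if (has_data) {
		assert(sector >= 0 && sector < TOY_SECTORS);
		dev->cache[sector] = value;
		dev->dirty[sector] = true;
	}
}
```

In this model a plain write to sector 0 followed by a flagged write to
sector 1 leaves sector 0 on the media while sector 1 is still only cached.
A later empty flush (has_data == false) moves sector 1 out as well.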

Force Unit Access
-----------------

The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.

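Unlike a cache flush, FUA only makes a statement about the request that
carries it.  The following userspace sketch shows the difference by
simulating a power loss.  It is not kernel code: toy_dev, the toy_*
functions and TOY_REQ_FUA are invented names, and treating a FUA write as a
write that bypasses the cache is a simplifying assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: all toy_* names are invented. */
#define TOY_SECTORS 8
#define TOY_REQ_FUA 0x2u	/* stands in for REQ_FUA */

struct toy_dev {
	int  media[TOY_SECTORS];	/* non-volatile storage */
	int  cache[TOY_SECTORS];	/* volatile write back cache */
	bool dirty[TOY_SECTORS];	/* cached data not yet on the media */
};

/* Returning from this function models I/O completion for the write. */
static void toy_write(struct toy_dev *dev, unsigned int flags,
		      int sector, int value)
{
	assert(sector >= 0 && sector < TOY_SECTORS);

	if (flags & TOY_REQ_FUA) {
		/* FUA: the data is on the media before completion. */
		dev->media[sector] = value;
		dev->dirty[sector] = false;
	} else {
		/* Plain write: completion only means "in the cache". */
		dev->cache[sector] = value;
		dev->dirty[sector] = true;
	}
}

/* What a read returns: cached data if present, else the media. */
static int toy_read(const struct toy_dev *dev, int sector)
{
	assert(sector >= 0 && sector < TOY_SECTORS);
	return dev->dirty[sector] ? dev->cache[sector] : dev->media[sector];
}

/* Power loss: the contents of the volatile cache are gone. */
static void toy_power_loss(struct toy_dev *dev)
{
	memset(dev->cache, 0, sizeof(dev->cache));
	memset(dev->dirty, 0, sizeof(dev->dirty));
}
```

After a plain write to one sector and a FUA write to another, both values
can be read back.  Once toy_power_loss() has run, only the FUA write is
still there; the earlier plain write was never covered by the FUA request.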

Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_FLUSH and REQ_FUA bits and do not have to
worry about whether the underlying devices need any explicit cache flushing
or how Force Unit Access is implemented.  The REQ_FLUSH and REQ_FUA flags
may both be set on a single bio.

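One place where setting both flags on a single bio is useful is the commit
record of a journal: the flush makes the previously written log blocks
durable, and FUA makes the commit record itself durable.  This usage is
given as an illustration only.  The sketch below is a userspace model, not
kernel code; toy_dev, toy_submit_write(), toy_commit() and the TOY_REQ_*
constants are invented names, and the layout of log blocks followed by one
commit record is an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: all toy_* names are invented. */
#define TOY_SECTORS   8
#define TOY_REQ_FLUSH 0x1u	/* stands in for REQ_FLUSH */
#define TOY_REQ_FUA   0x2u	/* stands in for REQ_FUA */

struct toy_dev {
	int  media[TOY_SECTORS];	/* non-volatile storage */
	int  cache[TOY_SECTORS];	/* volatile write back cache */
	bool dirty[TOY_SECTORS];	/* cached data not yet on the media */
};

static void toy_init(struct toy_dev *dev)
{
	memset(dev, 0, sizeof(*dev));
}

static void toy_submit_write(struct toy_dev *dev, unsigned int flags,
			     int sector, int value)
{
	int i;

	assert(sector >= 0 && sector < TOY_SECTORS);

	/* Flush first: earlier completed writes reach the media. */
	if (flags & TOY_REQ_FLUSH) {
		for (i = 0; i < TOY_SECTORS; i++) {
			if (dev->dirty[i]) {
				dev->media[i] = dev->cache[i];
				dev->dirty[i] = false;
			}
		}
	}

	/* Then the payload: write through with FUA, cached otherwise. */
	if (flags & TOY_REQ_FUA) {
		dev->media[sector] = value;
		dev->dirty[sector] = false;
	} else {
		dev->cache[sector] = value;
		dev->dirty[sector] = true;
	}
}

/* Write nr log blocks to sectors 0..nr-1, then the commit record. */
static void toy_commit(struct toy_dev *dev, const int *blocks, int nr,
		       int commit_record)
{
	int i;

	assert(nr >= 0 && nr < TOY_SECTORS);
	for (i = 0; i < nr; i++)
		toy_submit_write(dev, 0, i, blocks[i]);
	toy_submit_write(dev, TOY_REQ_FLUSH | TOY_REQ_FUA, nr, commit_record);
}
```

When toy_commit() returns, the log blocks and the commit record are all on
the media of the model, although only the last write carried any flags.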

Implementation details for make_request_fn based block drivers
--------------------------------------------------------------

These drivers will always see the REQ_FLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface.  For remapping drivers the REQ_FUA
bit needs to be propagated to the underlying devices, and a global flush
needs to be implemented for bios with the REQ_FLUSH bit set.  For real
device drivers that do not have a volatile cache the REQ_FLUSH and REQ_FUA
bits on non-empty bios can simply be ignored, and REQ_FLUSH requests without
data can be completed successfully without doing any work.  Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.

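The rule for remapping drivers can be sketched with a two-member striping
model.  This is a userspace illustration and not a real make_request_fn:
toy_stripe, toy_member, the toy_* functions and the TOY_* constants are
invented names, and the mapping "sector modulo number of members" is an
arbitrary assumption.  The members only count what they are sent, which is
enough to show where flushes and FUA writes end up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: all toy_* names are invented. */
#define TOY_REQ_FLUSH 0x1u	/* stands in for REQ_FLUSH */
#define TOY_REQ_FUA   0x2u	/* stands in for REQ_FUA */
#define TOY_MEMBERS   2

struct toy_member {
	unsigned int flushes;		/* flush requests received */
	unsigned int writes;		/* data writes received */
	unsigned int fua_writes;	/* data writes that carried FUA */
};

struct toy_stripe {
	struct toy_member member[TOY_MEMBERS];
};

static void toy_stripe_init(struct toy_stripe *s)
{
	memset(s, 0, sizeof(*s));
}

/* What an underlying device sees. */
static void toy_member_submit(struct toy_member *m, unsigned int flags,
			      bool has_data)
{
	if (flags & TOY_REQ_FLUSH)
		m->flushes++;
	if (has_data) {
		m->writes++;
		if (flags & TOY_REQ_FUA)
			m->fua_writes++;
	}
}

/* The remapping step. */
static void toy_stripe_submit(struct toy_stripe *s, unsigned int flags,
			      bool has_data, unsigned int sector)
{
	int i;

	/*
	 * Global flush: earlier writes may sit in the cache of any
	 * member, so every member gets an empty flush.
	 */
	if (flags & TOY_REQ_FLUSH) {
		for (i = 0; i < TOY_MEMBERS; i++)
			toy_member_submit(&s->member[i], TOY_REQ_FLUSH, false);
	}

	/* The payload goes to one member; FUA is propagated with it. */
	if (has_data)
		toy_member_submit(&s->member[sector % TOY_MEMBERS],
				  flags & TOY_REQ_FUA, true);
}
```

The model is synchronous, so the flushes are trivially finished before the
payload is forwarded.  A real driver would have to wait for the flushes on
all members to complete before starting the data write.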

Implementation details for request_fn based block drivers
---------------------------------------------------------

For devices that do not support volatile write caches there is no driver
support required; the block layer completes empty REQ_FLUSH requests before
entering the driver and strips off the REQ_FLUSH and REQ_FUA bits from
requests that have a payload.  For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing:

	blk_queue_flush(sdkp->disk->queue, REQ_FLUSH);

and handle empty REQ_FLUSH requests in its prep_fn/request_fn.  Note that
the block layer automatically turns REQ_FLUSH requests with a payload into
a sequence of an empty REQ_FLUSH request followed by the actual write.  For
devices that also support the FUA bit the block layer needs to be told to
pass through the REQ_FUA bit using:

	blk_queue_flush(sdkp->disk->queue, REQ_FLUSH | REQ_FUA);

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn.  If the FUA bit is not natively supported the block
layer turns it into an empty REQ_FLUSH request after the actual write.
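
The rules above can be summarized as a function from the advertised
capabilities and the request flags to the sequence of requests the driver
gets to see.  The following userspace sketch is an illustration only and
not the block layer implementation: toy_sequence(), enum toy_step and the
TOY_* constants are invented names, and treating FUA as usable only
together with flush support is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: all toy_* names are invented. */
#define TOY_REQ_FLUSH 0x1u	/* stands in for REQ_FLUSH */
#define TOY_REQ_FUA   0x2u	/* stands in for REQ_FUA */

enum toy_step {
	TOY_STEP_FLUSH,		/* empty cache flush request */
	TOY_STEP_WRITE,		/* data write without the FUA bit */
	TOY_STEP_WRITE_FUA,	/* data write with the FUA bit passed on */
};

/*
 * caps:  what the driver advertised (0, FLUSH, or FLUSH | FUA).
 * flags: what the submitter set on the request.
 * Fills steps[] with what the driver sees and returns the number of
 * steps, which is at most 3.
 */
static size_t toy_sequence(unsigned int caps, unsigned int flags,
			   bool has_data, enum toy_step steps[3])
{
	bool can_flush = (caps & TOY_REQ_FLUSH) != 0;
	bool can_fua = can_flush && (caps & TOY_REQ_FUA) != 0;
	bool want_fua;
	size_t n = 0;

	/* No volatile cache: both bits are stripped off. */
	if (!can_flush)
		flags = 0;

	/* An empty flush request precedes the actual write. */
	if (flags & TOY_REQ_FLUSH)
		steps[n++] = TOY_STEP_FLUSH;

	if (has_data) {
		want_fua = (flags & TOY_REQ_FUA) != 0;

		if (want_fua && can_fua)
			steps[n++] = TOY_STEP_WRITE_FUA;
		else
			steps[n++] = TOY_STEP_WRITE;

		/* FUA without native support: flush after the write. */
		if (want_fua && !can_fua)
			steps[n++] = TOY_STEP_FLUSH;
	}

	return n;
}
```

For a write with both flags set, this model yields a single plain write on
a device without a volatile cache, the sequence flush, write, flush on a
device that only advertised flush support, and flush followed by a FUA
write on a device that advertised both.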