Joe Thornber | 3241b1d | 2011-10-31 20:19:11 +0000

Introduction
============

The more-sophisticated device-mapper targets require complex metadata
that is managed in kernel.  In late 2010 we were seeing that various
different targets were rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets.  It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks.  There
is a read/write locking interface to prevent concurrent accesses, and
to keep data that is being used in the cache.

Clients of persistent-data are unlikely to use this directly.
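The locking rule can be illustrated with a small userspace model.  This
is only a sketch, not the interface declared in dm-block-manager.h: the
toy_ names, the fixed block count and the try-lock behaviour (returning
-1 where a kernel lock would normally wait) are all assumptions made for
illustration.  It shows that a block may have many readers or a single
writer, and that a block counts as in use for as long as it is locked.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_BLOCKS 8 /* hypothetical: number of fixed-size blocks */

/* Per-block lock state: any number of readers, or exactly one writer. */
struct toy_block {
    int readers;
    bool writer;
};

static struct toy_block toy_blocks[TOY_NR_BLOCKS];

/* Take a shared lock; fails (-1) if out of range or write-locked. */
static int toy_read_lock(size_t b)
{
    if (b >= TOY_NR_BLOCKS || toy_blocks[b].writer)
        return -1;
    toy_blocks[b].readers++;
    return 0;
}

/* Take an exclusive lock; fails (-1) if anybody else holds the block. */
static int toy_write_lock(size_t b)
{
    if (b >= TOY_NR_BLOCKS || toy_blocks[b].writer ||
        toy_blocks[b].readers > 0)
        return -1;
    toy_blocks[b].writer = true;
    return 0;
}

/* Drop one lock: the writer if there is one, otherwise one reader. */
static void toy_unlock(size_t b)
{
    if (b >= TOY_NR_BLOCKS)
        return;
    if (toy_blocks[b].writer)
        toy_blocks[b].writer = false;
    else if (toy_blocks[b].readers > 0)
        toy_blocks[b].readers--;
}

/* A locked block is in use, so a cache should not evict it. */
static bool toy_is_pinned(size_t b)
{
    return b < TOY_NR_BLOCKS &&
           (toy_blocks[b].writer || toy_blocks[b].readers > 0);
}
```

In this model two readers can hold block 0 at once, a writer has to
wait until both have unlocked, and the block stays pinned until the
last lock is dropped.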

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one.  Shadowing is elided within
the same transaction so performance is reasonable.  The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.
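The shadowing rules can be sketched with an in-memory model.  This is
not the kernel interface: struct toy_tm, the toy_tm_* functions, the
bump allocator and the committed_root field standing in for the
superblock are all hypothetical, and nothing here touches a disk.  The
sketch only shows copy-on-write on first access, elision of repeated
shadowing within one transaction, and that the committed root changes
only at commit.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_NR_BLOCKS  16 /* hypothetical sizes */
#define TOY_BLOCK_SIZE 8

struct toy_tm {
    unsigned char data[TOY_NR_BLOCKS][TOY_BLOCK_SIZE];
    bool fresh[TOY_NR_BLOCKS]; /* allocated or shadowed in this transaction */
    int next_free;             /* trivial bump allocator, never reuses */
    int committed_root;        /* stands in for the superblock */
};

/* Block 0 starts out as the committed root. */
static void toy_tm_init(struct toy_tm *tm)
{
    memset(tm, 0, sizeof(*tm));
    tm->next_free = 1;
}

/* Allocate a fresh block; it is writable for the rest of the transaction. */
static int toy_tm_new_block(struct toy_tm *tm)
{
    if (tm->next_free >= TOY_NR_BLOCKS)
        return -1;
    tm->fresh[tm->next_free] = true;
    return tm->next_free++;
}

/* Return a writable version of block b, copying it only if needed. */
static int toy_tm_shadow(struct toy_tm *tm, int b)
{
    int copy;

    if (b < 0 || b >= tm->next_free)
        return -1;
    if (tm->fresh[b])
        return b; /* already ours in this transaction: shadowing elided */
    copy = toy_tm_new_block(tm);
    if (copy < 0)
        return -1;
    memcpy(tm->data[copy], tm->data[b], TOY_BLOCK_SIZE);
    return copy;
}

/* Close the transaction; updating the root is the very last step. */
static void toy_tm_commit(struct toy_tm *tm, int new_root)
{
    memset(tm->fresh, 0, sizeof(tm->fresh));
    tm->committed_root = new_root;
}
```

Until toy_tm_commit() runs, the block that committed_root refers to is
never modified, which is the property that makes the last committed
state recoverable in the model.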

The space maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

These are on-disk data structures that keep track of the reference
counts of blocks.  They also act as the allocator of new blocks.  There
are currently two implementations: a simpler one for managing blocks on
a different device (e.g. thinly-provisioned data blocks), and one for
managing the metadata space.  The latter is complicated by the need to
store its own data within the space it's managing.
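The combination of reference counting and allocation can be sketched as
follows.  This is an in-memory toy, not either of the on-disk
implementations: struct toy_sm, the toy_sm_* functions and the
first-fit search are assumptions chosen to keep the example short.  A
block is free when its count is zero, and allocation simply finds such
a block and sets its count to one.

```c
#include <stdint.h>

#define TOY_SM_NR_BLOCKS 8 /* hypothetical size of the managed space */

struct toy_sm {
    uint32_t ref[TOY_SM_NR_BLOCKS]; /* reference count per block */
};

/* Allocate: first block with a zero count; returns -1 when full. */
static int toy_sm_new_block(struct toy_sm *sm)
{
    for (int b = 0; b < TOY_SM_NR_BLOCKS; b++) {
        if (sm->ref[b] == 0) {
            sm->ref[b] = 1;
            return b;
        }
    }
    return -1;
}

/* Add a reference, e.g. when a second owner starts sharing the block. */
static int toy_sm_inc(struct toy_sm *sm, int b)
{
    if (b < 0 || b >= TOY_SM_NR_BLOCKS)
        return -1;
    sm->ref[b]++;
    return 0;
}

/* Drop a reference; the block becomes free again when it reaches zero. */
static int toy_sm_dec(struct toy_sm *sm, int b)
{
    if (b < 0 || b >= TOY_SM_NR_BLOCKS || sm->ref[b] == 0)
        return -1;
    sm->ref[b]--;
    return 0;
}

/* Count the blocks that are currently free. */
static int toy_sm_nr_free(const struct toy_sm *sm)
{
    int n = 0;

    for (int b = 0; b < TOY_SM_NR_BLOCKS; b++)
        if (sm->ref[b] == 0)
            n++;
    return n;
}
```

A shared block survives the first toy_sm_dec() and is only handed out
again by toy_sm_new_block() once every reference has been dropped.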

The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more.  For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys.  For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.
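The two-level lookup used by the thin-provisioning example can be
sketched like this.  To stay short, the sketch uses small linear tables
in place of btrees, so it models only the nesting of keys and not the
tree structure itself.  The toy_ types and toy_lookup() are
hypothetical and are not part of dm-btree.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Inner level: virtual block -> physical block. */
struct toy_mapping {
    uint64_t virt_block;
    uint64_t phys_block;
};

/* Outer level: device id -> that device's mapping table. */
struct toy_device {
    uint64_t dev_id;
    const struct toy_mapping *maps;
    size_t nr_maps;
};

/*
 * Look up with one key per level: keys[0] is the device id and
 * keys[1] is the virtual block.  Returns 0 and fills *phys on
 * success, or -1 if either level has no matching entry.
 */
static int toy_lookup(const struct toy_device *devs, size_t nr_devs,
                      const uint64_t keys[2], uint64_t *phys)
{
    for (size_t d = 0; d < nr_devs; d++) {
        if (devs[d].dev_id != keys[0])
            continue;
        for (size_t m = 0; m < devs[d].nr_maps; m++) {
            if (devs[d].maps[m].virt_block == keys[1]) {
                *phys = devs[d].maps[m].phys_block;
                return 0;
            }
        }
        return -1; /* device found, but block not mapped */
    }
    return -1; /* no such device */
}
```

Each level consumes exactly one 64-bit key, so a caller passes an array
with as many keys as there are levels of nesting.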

Values stored in the btrees can have arbitrary size.  Keys are always
64 bits, although nesting allows you to use multiple keys.