Introduction
============

The more-sophisticated device-mapper targets require complex metadata
that is managed in the kernel.  In late 2010 we saw that various
targets were rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design by Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets.  It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files, which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks.  There
is a read/write locking interface to prevent concurrent accesses, and
to keep data that is in use in the cache.

Clients of persistent-data are unlikely to use this directly.
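
The locking rule can be pictured with a small userspace model.  This is
only a sketch under stated assumptions: struct toy_block, the toy_*
functions and the 4096-byte block size are hypothetical names and values
chosen for illustration, not the interface declared in
dm-block-manager.h, and the try-lock style is a simplification.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 4096u	/* assumed block size for the example */

/* Hypothetical toy model of per-block lock state; not the kernel API. */
struct toy_block {
	unsigned int readers;	/* number of read locks currently held */
	int writer;		/* non-zero while write locked */
};

/* Byte offset of block 'b' on a device split into fixed-size blocks. */
uint64_t toy_block_offset(uint64_t b)
{
	return b * TOY_BLOCK_SIZE;
}

/* Shared lock: allowed unless someone holds the write lock. */
int toy_read_trylock(struct toy_block *blk)
{
	if (blk->writer)
		return -1;
	blk->readers++;
	return 0;
}

/* Exclusive lock: allowed only when nobody holds the block. */
int toy_write_trylock(struct toy_block *blk)
{
	if (blk->writer || blk->readers)
		return -1;
	blk->writer = 1;
	return 0;
}

/* Release one hold on the block. */
void toy_unlock(struct toy_block *blk)
{
	if (blk->writer) {
		blk->writer = 0;
	} else {
		assert(blk->readers > 0);
		blk->readers--;
	}
}
```

In this model a block with any holder is the "data that is in use", so a
cache built on top of it would only consider a block for eviction once
both readers and writer are zero.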

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one.  Shadowing a block that has
already been shadowed in the same transaction is elided, so performance
is reasonable.  The commit method ensures that all data is flushed
before it writes the superblock.  On power failure your metadata will
be as it was when last committed.
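
The shadowing rule described above can be sketched as a userspace toy.
Everything here is an assumption made for illustration: struct toy_tm,
the toy_tm_* functions, the in-memory array standing in for the metadata
device and the bump allocator are hypothetical, and none of it is the
interface declared in dm-transaction-manager.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_BLOCKS 16
#define BLOCK_SIZE 8

/* Hypothetical toy model, not the kernel's transaction manager. */
struct toy_tm {
	uint8_t data[NR_BLOCKS][BLOCK_SIZE];	/* stand-in for the device */
	bool fresh[NR_BLOCKS];	/* allocated or shadowed in this transaction */
	uint64_t next_free;	/* trivial bump allocator */
	uint64_t committed_root;	/* what the "superblock" points at */
};

void toy_tm_init(struct toy_tm *tm)
{
	memset(tm, 0, sizeof(*tm));
	tm->next_free = 1;	/* block 0 is the initial committed root */
}

/* Allocate a fresh, zeroed, writable block; returns -1 when full. */
int toy_tm_new_block(struct toy_tm *tm, uint64_t *result)
{
	if (tm->next_free >= NR_BLOCKS)
		return -1;
	*result = tm->next_free++;
	memset(tm->data[*result], 0, BLOCK_SIZE);
	tm->fresh[*result] = true;
	return 0;
}

/* Get a writable version of 'orig', copying at most once per transaction. */
int toy_tm_shadow(struct toy_tm *tm, uint64_t orig, uint64_t *result)
{
	int r;

	if (orig >= NR_BLOCKS)
		return -1;
	if (tm->fresh[orig]) {	/* already private: the copy is elided */
		*result = orig;
		return 0;
	}
	r = toy_tm_new_block(tm, result);
	if (r)
		return r;
	memcpy(tm->data[*result], tm->data[orig], BLOCK_SIZE);
	return 0;
}

/* Commit: publish the new root, then start a new transaction. */
void toy_tm_commit(struct toy_tm *tm, uint64_t new_root)
{
	assert(new_root < NR_BLOCKS);
	/* A real implementation must flush every dirty block before this. */
	tm->committed_root = new_root;
	memset(tm->fresh, 0, sizeof(tm->fresh));
}
```

Until toy_tm_commit() runs, committed_root still names the old block and
that block's contents are untouched, which is the property that makes
the last committed state recoverable after a crash.  The toy never frees
the old copies; tracking which blocks can be reused is the job of the
space maps.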

The Space Maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

These are on-disk data structures that keep track of the reference
counts of blocks.  They also act as the allocator of new blocks.  There
are currently two implementations: a simpler one for managing blocks on
a different device (e.g. thinly-provisioned data blocks); and one for
managing the metadata space.  The latter is complicated by the need to
store its own data within the space it's managing.
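
The pairing of reference counting with allocation can be illustrated by
a minimal in-memory sketch.  The names (struct toy_sm and the toy_sm_*
functions) are hypothetical and the counts live in a plain array rather
than on disk, so this shows only the idea, not either of the real
implementations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BLOCKS 8

/* Hypothetical toy model; the real space maps are on-disk structures. */
struct toy_sm {
	uint32_t count[NR_BLOCKS];	/* reference count per block */
};

void toy_sm_init(struct toy_sm *sm)
{
	memset(sm, 0, sizeof(*sm));
}

/* Allocate the first block whose count is zero and give it one reference. */
int toy_sm_new_block(struct toy_sm *sm, uint64_t *b)
{
	uint64_t i;

	for (i = 0; i < NR_BLOCKS; i++) {
		if (sm->count[i] == 0) {
			sm->count[i] = 1;
			*b = i;
			return 0;
		}
	}
	return -1;	/* no space left */
}

/* Take another reference, e.g. when a second tree shares the block. */
void toy_sm_inc(struct toy_sm *sm, uint64_t b)
{
	assert(b < NR_BLOCKS);
	sm->count[b]++;
}

/* Drop a reference; the block is free again once the count hits zero. */
void toy_sm_dec(struct toy_sm *sm, uint64_t b)
{
	assert(b < NR_BLOCKS && sm->count[b] > 0);
	sm->count[b]--;
}
```

One deliberate simplification: the toy hands a block out again as soon
as its count reaches zero.  A copy-on-write design has to be more
careful, because the last committed transaction may still refer to that
block.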

The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more.  For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys.  For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.
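
The shape of such a two-key lookup can be sketched as follows.  This is
a toy under stated assumptions: sorted arrays searched by bisection
stand in for the btrees, the top-level value is an index into an array
of mapping trees rather than a block number, and struct toy_entry,
struct toy_tree and the toy_lookup* functions are hypothetical names,
not the interface in dm-btree.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: sorted arrays stand in for btrees. */
struct toy_entry {
	uint64_t key;
	uint64_t value;
};

struct toy_tree {
	const struct toy_entry *entries;	/* sorted by key */
	size_t nr;
};

/* Search one level by bisection; returns 0 and sets *value on a hit. */
int toy_lookup_one(const struct toy_tree *t, uint64_t key, uint64_t *value)
{
	size_t lo = 0, hi = t->nr;

	assert(t && value);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->entries[mid].key == key) {
			*value = t->entries[mid].value;
			return 0;
		}
		if (t->entries[mid].key < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/*
 * Two-level lookup.  keys[0] is the device id, looked up in 'top'; the
 * value found there selects one of the 'mappings' trees, in which
 * keys[1] (the virtual block) is looked up to find the physical block.
 */
int toy_lookup(const struct toy_tree *top,
	       const struct toy_tree *mappings, size_t nr_mappings,
	       const uint64_t keys[2], uint64_t *physical)
{
	uint64_t idx;

	if (toy_lookup_one(top, keys[0], &idx) || idx >= nr_mappings)
		return -1;
	return toy_lookup_one(&mappings[idx], keys[1], physical);
}
```

The caller passes one key per level, so a lookup for virtual block 5 on
device 0 would use keys = { 0, 5 }.  A miss at either level is reported
the same way, as a failed lookup.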

Values stored in the btrees can have arbitrary size.  Keys are always
64 bits, although nesting allows you to use multiple keys.