BTRFS
=====

Btrfs is a copy-on-write filesystem for Linux aimed at
implementing advanced features while focusing on fault tolerance,
repair and easy administration. Initially developed by Oracle, Btrfs
is licensed under the GPL and open for contribution from anyone.

Linux has a wealth of filesystems to choose from, but we are facing a
number of challenges with scaling to the large storage subsystems that
are becoming common in today's data centers. Filesystems need to scale
in their ability to address and manage large storage, and also in
their ability to detect, repair and tolerate errors in the data stored
on disk.  Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review. The Btrfs disk format is
not yet finalized.

The main Btrfs features include:

    * Extent based file storage (2^64 max file size)
    * Space efficient packing of small files
    * Space efficient indexed directories
    * Dynamic inode allocation
    * Writable snapshots
    * Subvolumes (separate internal filesystem roots)
    * Object level mirroring and striping
    * Checksums on data and metadata (multiple algorithms available)
    * Compression
    * Integrated multiple device support, with several raid algorithms
    * Online filesystem check (not yet implemented)
    * Very fast offline filesystem check
    * Efficient incremental backup and FS mirroring (not yet implemented)
    * Online filesystem defragmentation


Mount Options
=============

When mounting a btrfs filesystem, the following options are accepted.
Unless otherwise specified, all options default to off.

  alloc_start=<bytes>
        Debugging option to force all block allocations above a certain
        byte threshold on each block device.  The value is specified in
        bytes, optionally with a K, M, or G suffix, case insensitive.
        Default is 1MB.
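
The suffix handling described for alloc_start (and for max_inline) can be
illustrated with a small shell sketch.  The helper name size_to_bytes is
hypothetical and is not part of the kernel or of btrfs-progs.  It assumes
that K, M and G are powers of 1024, which the text above does not state
explicitly, and it performs no input validation.

```shell
# Hypothetical helper: convert "<number>[K|M|G]" (case insensitive) to bytes.
# Assumption: suffixes are binary multiples (K = 1024 bytes).
size_to_bytes() {
    value=$1
    # Strip one trailing suffix character, if present.
    number=${value%[KkMmGg]}
    case $value in
        *[Kk]) echo $(( number * 1024 )) ;;
        *[Mm]) echo $(( number * 1024 * 1024 )) ;;
        *[Gg]) echo $(( number * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( number )) ;;
    esac
}
```

Under these assumptions, "size_to_bytes 1M" yields the same value as the
1MB default mentioned above, expressed in bytes.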

  autodefrag
        Detect small random writes into files and queue them up for the
        defrag process.  Works best for small files; not well suited for
        large database workloads.

  check_int
  check_int_data
  check_int_print_mask=<value>
        These debugging options control the behavior of the integrity checking
        module (requires the BTRFS_FS_CHECK_INTEGRITY config option).

        check_int enables the integrity checker module, which examines all
        block write requests to ensure on-disk consistency, at a large
        memory and CPU cost.

        check_int_data includes extent data in the integrity checks, and
        implies the check_int option.

        check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
        as defined in fs/btrfs/check-integrity.c, to control the integrity
        checker module behavior.

        See comments at the top of fs/btrfs/check-integrity.c for more info.

  compress
  compress=<type>
  compress-force
  compress-force=<type>
        Control BTRFS file data compression.  Type may be specified as "zlib",
        "lzo" or "no" (for no compression, used for remounting).  If no type
        is specified, zlib is used.  If compress-force is specified,
        all files will be compressed, whether or not they compress well.
        If compression is enabled, nodatacow and nodatasum are disabled.
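
The rule for choosing the compression type can be sketched in shell as
follows.  This only restates the description above; the function name
compress_type is hypothetical and this is not how the kernel parses mount
options.

```shell
# Hypothetical helper: print the compression type selected by a single
# compress / compress-force mount option, per the description above.
compress_type() {
    case $1 in
        # No explicit type: zlib is used.
        compress|compress-force) ctype=zlib ;;
        # Explicit type: take everything after the first "=".
        compress=*|compress-force=*) ctype=${1#*=} ;;
        *) echo "not a compress option: $1" >&2; return 1 ;;
    esac
    # Only the types named in the text are accepted in this sketch.
    case $ctype in
        zlib|lzo|no) echo "$ctype" ;;
        *) echo "unknown compression type: $ctype" >&2; return 1 ;;
    esac
}
```

For example, mounting with "-o compress=lzo" selects lzo, while a bare
"-o compress" falls back to zlib.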

  degraded
        Allow mounts to continue with missing devices.  A read-write mount may
        fail with too many devices missing, for example if a stripe member
        is completely missing.

  device=<devicepath>
        Specify a device during mount so that ioctls on the control device
        can be avoided.  Especially useful when trying to mount a multi-device
        setup as the root filesystem.  May be specified multiple times for
        multiple devices.

  discard
        Issue frequent commands to let the block device reclaim space freed by
        the filesystem.  This is useful for SSD devices, thinly provisioned
        LUNs and virtual machine images, but may have a significant
        performance impact.  (The fstrim command is also available to
        initiate batch trims from userspace.)

  enospc_debug
        Debugging option to be more verbose in some ENOSPC conditions.

  fatal_errors=<action>
        Action to take when encountering a fatal error:
          "bug" - BUG() on a fatal error.  This is the default.
          "panic" - panic() on a fatal error.

  flushoncommit
        The 'flushoncommit' mount option forces any data dirtied by a write in a
        prior transaction to commit as part of the current commit.  This makes
        the committed state a fully consistent view of the file system from the
        application's perspective (i.e., it includes all completed file system
        operations).  Previously, this was the behavior only when a snapshot
        was created.

  inode_cache
        Enable free inode number caching.  Defaults to off due to an overflow
        problem when the free space crcs don't fit inside a single page.

  max_inline=<bytes>
        Specify the maximum amount of space, in bytes, that can be inlined in
        a metadata B-tree leaf.  The value is specified in bytes, optionally
        with a K, M, or G suffix, case insensitive.  In practice, this value
        is limited by the root sector size, with some space unavailable due
        to leaf headers.  For a 4k sectorsize, max inline data is ~3900 bytes.

  metadata_ratio=<value>
        Specify that 1 metadata chunk should be allocated after every <value>
        data chunks.  Off by default.

  noacl
        Disable support for POSIX Access Control Lists (ACLs).  See the
        acl(5) manual page for more information about ACLs.

  nobarrier
        Disables the use of block layer write barriers.  Write barriers ensure
        that certain IOs make it through the device cache and are on persistent
        storage.  If used on a device with a volatile (non-battery-backed)
        write-back cache, this option will lead to filesystem corruption on a
        system crash or power loss.

  nodatacow
        Disable data copy-on-write for newly created files.  Implies nodatasum,
        and disables all compression.

  nodatasum
        Disable data checksumming for newly created files.

  notreelog
        Disable the tree logging used for fsync and O_SYNC writes.

  recovery
        Enable autorecovery attempts if a bad tree root is found at mount time.
        Currently this scans a list of several previous tree roots and tries to
        use the first readable one.

  skip_balance
        Skip automatic resume of an interrupted balance operation after mount.
        May be resumed with "btrfs balance resume".

  space_cache (*)
        Enable the on-disk freespace cache.
  nospace_cache
        Disable freespace cache loading without clearing the cache.
  clear_cache
        Force clearing and rebuilding of the disk space cache if something
        has gone wrong.

  ssd
  nossd
  ssd_spread
        Options to control ssd allocation schemes.  By default, BTRFS will
        enable or disable ssd allocation heuristics depending on whether a
        rotational or nonrotational disk is in use.  The ssd and nossd options
        can override this autodetection.

        The ssd_spread mount option attempts to allocate into big chunks
        of unused space, and may perform better on low-end ssds.  ssd_spread
        implies ssd, enabling all other ssd heuristics as well.

  subvol=<path>
        Mount subvolume at <path> rather than the root subvolume.  <path> is
        relative to the top level subvolume.

  subvolid=<ID>
        Mount subvolume specified by an ID number rather than the root subvolume.
        This allows mounting of subvolumes which are not in the root of the mounted
        filesystem.
        You can use "btrfs subvolume list" to see subvolume ID numbers.
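
Looking up the ID to pass to subvolid can be sketched as below.  The helper
name subvol_id_for_path is hypothetical.  The sketch assumes that each line
of "btrfs subvolume list" output has the form "ID <id> gen <n> top level <n>
path <path>" and that the path contains no whitespace; check the output of
your btrfs-progs version before relying on it.

```shell
# Hypothetical helper: read "btrfs subvolume list" output on stdin and
# print the ID of the subvolume whose path equals the first argument.
# Assumption: the ID is the second field and the path is the last field.
subvol_id_for_path() {
    awk -v want="$1" '$1 == "ID" && $NF == want { print $2 }'
}
```

Typical use would be "btrfs subvolume list /mnt | subvol_id_for_path home",
with the printed number then given to mount as subvolid=<ID>.  The mount
point /mnt and the subvolume name home are placeholders.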

  subvolrootid=<objectid> (deprecated)
        Mount subvolume specified by <objectid> rather than the root subvolume.
        This allows mounting of subvolumes which are not in the root of the mounted
        filesystem.
        You can use "btrfs subvolume show" to see the object ID for a subvolume.

  thread_pool=<number>
        The number of worker threads to allocate.  The default number is equal
        to the number of CPUs + 2, or 8, whichever is smaller.
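
The default described for thread_pool works out to min(CPUs + 2, 8).  A
minimal shell sketch of that arithmetic follows; the function name
default_thread_pool is hypothetical and the CPU count is passed in by the
caller rather than detected.

```shell
# Hypothetical helper: compute the default thread_pool value for a given
# CPU count, i.e. the smaller of (CPUs + 2) and 8.
default_thread_pool() {
    ncpus=$1
    threads=$(( ncpus + 2 ))
    if [ "$threads" -gt 8 ]; then
        threads=8
    fi
    echo "$threads"
}
```

On a machine with coreutils installed, "default_thread_pool $(nproc)" gives
the value for the local CPU count.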

  user_subvol_rm_allowed
        Allow subvolumes to be deleted by a non-root user. Use with caution.

MAILING LIST
============

There is a Btrfs mailing list hosted on vger.kernel.org. You can
find details on how to subscribe here:

http://vger.kernel.org/vger-lists.html#linux-btrfs

Mailing list archives are available from gmane:

http://dir.gmane.org/gmane.comp.file-systems.btrfs



IRC
===

Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
IRC network.



UTILITIES
=========

Userspace tools for creating and manipulating Btrfs file systems are
available from the git repository at the following location:

 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

These include the following tools:

mkfs.btrfs: create a filesystem

btrfsctl: control program to create snapshots and subvolumes:

        mount /dev/sda2 /mnt
        btrfsctl -s new_subvol_name /mnt
        btrfsctl -s snapshot_of_default /mnt/default
        btrfsctl -s snapshot_of_new_subvol /mnt/new_subvol_name
        btrfsctl -s snapshot_of_a_snapshot /mnt/snapshot_of_new_subvol
        ls /mnt
        default snapshot_of_a_snapshot snapshot_of_new_subvol
        new_subvol_name snapshot_of_default

        Snapshots and subvolumes cannot be deleted right now, but you can
        rm -rf all the files and directories inside them.

btrfsck: do a limited check of the FS extent trees.

btrfs-debug-tree: print all of the FS metadata in text form.  Example:

        btrfs-debug-tree /dev/sda2 >& big_output_file