|  |  | 
|  | BTRFS | 
|  | ===== | 
|  |  | 
|  | Btrfs is a copy-on-write filesystem for Linux aimed at | 
|  | implementing advanced features while focusing on fault tolerance, | 
|  | repair and easy administration. Initially developed by Oracle, Btrfs | 
|  | is licensed under the GPL and open for contribution from anyone. | 
|  |  | 
|  | Linux has a wealth of filesystems to choose from, but we are facing a | 
|  | number of challenges with scaling to the large storage subsystems that | 
|  | are becoming common in today's data centers. Filesystems need to scale | 
|  | in their ability to address and manage large storage, and also in | 
|  | their ability to detect, repair and tolerate errors in the data stored | 
|  | on disk. | 
|  |  | 
|  | Btrfs is under heavy development, and is not suitable for any uses | 
|  | other than benchmarking and review. The Btrfs disk format is not yet | 
|  | finalized. | 
|  |  | 
|  | The main Btrfs features include: | 
|  |  | 
|  | * Extent-based file storage (2^64 max file size) | 
|  | * Space-efficient packing of small files | 
|  | * Space-efficient indexed directories | 
|  | * Dynamic inode allocation | 
|  | * Writable snapshots | 
|  | * Subvolumes (separate internal filesystem roots) | 
|  | * Object-level mirroring and striping | 
|  | * Checksums on data and metadata (multiple algorithms available) | 
|  | * Compression | 
|  | * Integrated multiple device support, with several RAID algorithms | 
|  | * Online filesystem check (not yet implemented) | 
|  | * Very fast offline filesystem check | 
|  | * Efficient incremental backup and FS mirroring (not yet implemented) | 
|  | * Online filesystem defragmentation | 
|  |  | 
|  |  | 
|  | MOUNT OPTIONS | 
|  | ============= | 
|  |  | 
|  | When mounting a btrfs filesystem, the following options are accepted. | 
|  | Unless otherwise specified, all options default to off. | 
|  |  | 
|  | alloc_start=<bytes> | 
|  | Debugging option to force all block allocations above a certain | 
|  | byte threshold on each block device.  The value is specified in | 
|  | bytes, optionally with a K, M, or G suffix, case insensitive. | 
|  | Default is 1MB. | 
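The suffix handling described above can be sketched in shell. This is only an illustration: the function name to_bytes is hypothetical, and the multipliers assume binary multiples (K = 1024). It is not the kernel's own parser.

```shell
#!/bin/sh
# to_bytes: hypothetical helper that expands an optional K/M/G suffix
# (case insensitive) into a byte count, assuming binary multiples.
to_bytes() {
    value=$1
    case $value in
        *[Kk]) echo $(( ${value%?} * 1024 )) ;;
        *[Mm]) echo $(( ${value%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${value%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$value" ;;
    esac
}

to_bytes 1M     # the documented alloc_start default
to_bytes 64k    # lower-case suffixes are accepted as well
```

Under these assumptions, alloc_start=1M and alloc_start=1048576 would describe the same threshold.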
|  |  | 
|  | autodefrag | 
|  | Detect small random writes into files and queue them up for the | 
|  | defrag process.  Works best for small files; not well suited for | 
|  | large database workloads. | 
|  |  | 
|  | check_int | 
|  | check_int_data | 
|  | check_int_print_mask=<value> | 
|  | These debugging options control the behavior of the integrity checking | 
|  | module (requires the BTRFS_FS_CHECK_INTEGRITY config option). | 
|  |  | 
|  | check_int enables the integrity checker module, which examines all | 
|  | block write requests to ensure on-disk consistency, at a large | 
|  | memory and CPU cost. | 
|  |  | 
|  | check_int_data includes extent data in the integrity checks, and | 
|  | implies the check_int option. | 
|  |  | 
|  | check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values | 
|  | as defined in fs/btrfs/check-integrity.c, to control the integrity | 
|  | checker module behavior. | 
|  |  | 
|  | See comments at the top of fs/btrfs/check-integrity.c for more info. | 
|  |  | 
|  | compress | 
|  | compress=<type> | 
|  | compress-force | 
|  | compress-force=<type> | 
|  | Control BTRFS file data compression.  Type may be specified as "zlib", | 
|  | "lzo", or "no" (for no compression, used for remounting).  If no type | 
|  | is specified, zlib is used.  If compress-force is specified, | 
|  | all files will be compressed, whether or not they compress well. | 
|  | If compression is enabled, nodatacow and nodatasum are disabled. | 
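As a rough illustration of the compression options, the sketch below prints the mount commands instead of running them, since a real mount needs root and an actual Btrfs device. The run helper and the DRY_RUN variable are hypothetical; /dev/sda2 and /mnt are placeholders borrowed from the btrfsctl example in this document.

```shell
#!/bin/sh
# run: hypothetical helper that prints a command unless DRY_RUN=0 is set,
# in which case it executes the command (root is required for mount).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount with lzo compression instead of the zlib default.
run mount -o compress=lzo /dev/sda2 /mnt

# Compress every file, whether or not it compresses well.
run mount -o compress-force=zlib /dev/sda2 /mnt

# Turn compression off again on an already mounted filesystem.
run mount -o remount,compress=no /mnt
```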
|  |  | 
|  | degraded | 
|  | Allow mounts to continue with missing devices.  A read-write mount may | 
|  | fail with too many devices missing, for example if a stripe member | 
|  | is completely missing. | 
|  |  | 
|  | device=<devicepath> | 
|  | Specify a device during mount so that ioctls on the control device | 
|  | can be avoided.  Especially useful when trying to mount a multi-device | 
|  | setup as root.  May be specified multiple times for multiple devices. | 
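Because device= may be repeated, the option string for a multi-device filesystem can be assembled from a list of paths. The sketch below assumes a hypothetical helper named device_opts; /dev/sda, /dev/sdb and /dev/sdc are placeholder devices.

```shell
#!/bin/sh
# device_opts: hypothetical helper that turns a list of device paths into
# a comma-separated "device=..." option string suitable for mount -o.
device_opts() {
    opts=""
    for dev in "$@"; do
        # Prepend a comma only when opts already holds an entry.
        opts="${opts:+$opts,}device=$dev"
    done
    echo "$opts"
}

device_opts /dev/sdb /dev/sdc

# The result would then be passed to mount (as root), for example:
#   mount -o "$(device_opts /dev/sdb /dev/sdc)" /dev/sda /mnt
```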
|  |  | 
|  | discard | 
|  | Issue frequent commands to let the block device reclaim space freed by | 
|  | the filesystem.  This is useful for SSD devices, thinly provisioned | 
|  | LUNs and virtual machine images, but may have a significant | 
|  | performance impact.  (The fstrim command is also available to | 
|  | initiate batch trims from userspace). | 
|  |  | 
|  | enospc_debug | 
|  | Debugging option to be more verbose in some ENOSPC conditions. | 
|  |  | 
|  | fatal_errors=<action> | 
|  | Action to take when encountering a fatal error: | 
|  | "bug" - BUG() on a fatal error.  This is the default. | 
|  | "panic" - panic() on a fatal error. | 
|  |  | 
|  | flushoncommit | 
|  | The 'flushoncommit' mount option forces any data dirtied by a write in a | 
|  | prior transaction to commit as part of the current commit.  This makes | 
|  | the committed state a fully consistent view of the file system from the | 
|  | application's perspective (i.e., it includes all completed file system | 
|  | operations).  This was previously the behavior only when a snapshot is | 
|  | created. | 
|  |  | 
|  | inode_cache | 
|  | Enable free inode number caching.  Defaults to off due to an overflow | 
|  | problem when the free space CRCs don't fit inside a single page. | 
|  |  | 
|  | max_inline=<bytes> | 
|  | Specify the maximum amount of space, in bytes, that can be inlined in | 
|  | a metadata B-tree leaf.  The value is specified in bytes, optionally | 
|  | with a K, M, or G suffix, case insensitive.  In practice, this value | 
|  | is limited by the root sector size, with some space unavailable due | 
|  | to leaf headers.  For a 4k sectorsize, max inline data is ~3900 bytes. | 
|  |  | 
|  | metadata_ratio=<value> | 
|  | Specify that 1 metadata chunk should be allocated after every <value> | 
|  | data chunks.  Off by default. | 
|  |  | 
|  | noacl | 
|  | Disable support for POSIX Access Control Lists (ACLs).  See the | 
|  | acl(5) manual page for more information about ACLs. | 
|  |  | 
|  | nobarrier | 
|  | Disables the use of block layer write barriers.  Write barriers ensure | 
|  | that certain IOs make it through the device cache and are on persistent | 
|  | storage.  If used on a device with a volatile (non-battery-backed) | 
|  | write-back cache, this option will lead to filesystem corruption on a | 
|  | system crash or power loss. | 
|  |  | 
|  | nodatacow | 
|  | Disable data copy-on-write for newly created files.  Implies nodatasum, | 
|  | and disables all compression. | 
|  |  | 
|  | nodatasum | 
|  | Disable data checksumming for newly created files. | 
|  |  | 
|  | notreelog | 
|  | Disable the tree logging used for fsync and O_SYNC writes. | 
|  |  | 
|  | recovery | 
|  | Enable autorecovery attempts if a bad tree root is found at mount time. | 
|  | Currently this scans a list of several previous tree roots and tries to | 
|  | use the first readable one. | 
|  |  | 
|  | skip_balance | 
|  | Skip automatic resume of an interrupted balance operation after mount. | 
|  | The balance may be resumed later with "btrfs balance resume". | 
|  |  | 
|  | space_cache (*) | 
|  | Enable the on-disk freespace cache. | 
|  | nospace_cache | 
|  | Disable freespace cache loading without clearing the cache. | 
|  | clear_cache | 
|  | Force clearing and rebuilding of the disk space cache if something | 
|  | has gone wrong. | 
|  |  | 
|  | ssd | 
|  | nossd | 
|  | ssd_spread | 
|  | Options to control SSD allocation schemes.  By default, BTRFS will | 
|  | enable or disable SSD allocation heuristics depending on whether a | 
|  | rotational or nonrotational disk is in use.  The ssd and nossd options | 
|  | can override this autodetection. | 
|  |  | 
|  | The ssd_spread mount option attempts to allocate into big chunks | 
|  | of unused space, and may perform better on low-end SSDs.  ssd_spread | 
|  | implies ssd, enabling all other SSD heuristics as well. | 
|  |  | 
|  | subvol=<path> | 
|  | Mount subvolume at <path> rather than the root subvolume.  <path> is | 
|  | relative to the top level subvolume. | 
|  |  | 
|  | subvolid=<ID> | 
|  | Mount subvolume specified by an ID number rather than the root subvolume. | 
|  | This allows mounting of subvolumes which are not in the root of the mounted | 
|  | filesystem. | 
|  | You can use "btrfs subvolume list" to see subvolume ID numbers. | 
|  |  | 
|  | subvolrootid=<objectid> (deprecated) | 
|  | Mount subvolume specified by <objectid> rather than the root subvolume. | 
|  | This allows mounting of subvolumes which are not in the root of the mounted | 
|  | filesystem. | 
|  | You can use "btrfs subvolume show" to see the object ID for a subvolume. | 
|  |  | 
|  | thread_pool=<number> | 
|  | The number of worker threads to allocate.  The default number is equal | 
|  | to the number of CPUs + 2, or 8, whichever is smaller. | 
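The default described above is min(CPUs + 2, 8). A small sketch of that arithmetic follows; the function name default_thread_pool is hypothetical, and the CPU count is passed in explicitly rather than detected.

```shell
#!/bin/sh
# default_thread_pool: hypothetical helper computing min(ncpus + 2, 8),
# the documented default, for a given CPU count.
default_thread_pool() {
    n=$(( $1 + 2 ))
    if [ "$n" -gt 8 ]; then
        n=8
    fi
    echo "$n"
}

default_thread_pool 2     # two CPUs
default_thread_pool 16    # sixteen CPUs, capped at 8
```

On a machine with six or more CPUs the default is therefore already at the cap of 8, so a larger pool has to be requested with thread_pool=<number>.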
|  |  | 
|  | user_subvol_rm_allowed | 
|  | Allow subvolumes to be deleted by a non-root user. Use with caution. | 
|  |  | 
|  | MAILING LIST | 
|  | ============ | 
|  |  | 
|  | There is a Btrfs mailing list hosted on vger.kernel.org. You can | 
|  | find details on how to subscribe here: | 
|  |  | 
|  | http://vger.kernel.org/vger-lists.html#linux-btrfs | 
|  |  | 
|  | Mailing list archives are available from gmane: | 
|  |  | 
|  | http://dir.gmane.org/gmane.comp.file-systems.btrfs | 
|  |  | 
|  |  | 
|  |  | 
|  | IRC | 
|  | === | 
|  |  | 
|  | Discussion of Btrfs also occurs on the #btrfs channel of the Freenode | 
|  | IRC network. | 
|  |  | 
|  |  | 
|  |  | 
|  | UTILITIES | 
|  | ========= | 
|  |  | 
|  | Userspace tools for creating and manipulating Btrfs file systems are | 
|  | available from the git repository at the following location: | 
|  |  | 
|  | http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git | 
|  | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git | 
|  |  | 
|  | These include the following tools: | 
|  |  | 
|  | mkfs.btrfs: create a filesystem | 
|  |  | 
|  | btrfsctl: control program to create snapshots and subvolumes: | 
|  |  | 
|  |     mount /dev/sda2 /mnt | 
|  |     btrfsctl -s new_subvol_name /mnt | 
|  |     btrfsctl -s snapshot_of_default /mnt/default | 
|  |     btrfsctl -s snapshot_of_new_subvol /mnt/new_subvol_name | 
|  |     btrfsctl -s snapshot_of_a_snapshot /mnt/snapshot_of_new_subvol | 
|  |     ls /mnt | 
|  |     default          snapshot_of_a_snapshot  snapshot_of_new_subvol | 
|  |     new_subvol_name  snapshot_of_default | 
|  |  | 
|  | Snapshots and subvolumes cannot be deleted right now, but you can | 
|  | rm -rf all the files and directories inside them. | 
|  |  | 
|  | btrfsck: do a limited check of the FS extent trees. | 
|  |  | 
|  | btrfs-debug-tree: print all of the FS metadata in text form.  Example: | 
|  |  | 
|  |     btrfs-debug-tree /dev/sda2 >& big_output_file | 