NILFS2
------

(Ryusuke Konishi, commit 962281a, 2009-04-06 19:01:20 -0700)

NILFS2 is a log-structured file system (LFS) supporting continuous
snapshotting.  In addition to versioning the entire file system,
users can even restore files mistakenly overwritten or destroyed just
a few seconds ago.  Like a conventional LFS, NILFS2 keeps the file
system consistent, so it recovers quickly after system crashes.

NILFS2 creates a number of checkpoints every few seconds or on each
synchronous write (unless there is no change).  Users can select
significant versions among the continuously created checkpoints and
change them into snapshots, which are preserved until they are
changed back to checkpoints.

The number of snapshots is limited only by the free space of the
volume.  Each snapshot is mountable as a read-only file system
concurrently with its writable mount, which is convenient for online
backup.

The userland tools are included in the nilfs-utils package, which is
available from the download page below.  At least "mkfs.nilfs2",
"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (the so-called
cleaner or garbage collector) are required.  The tools are described
in detail in the man pages included in the package.

Project web page:    http://www.nilfs.org/en/
Download page:       http://www.nilfs.org/en/download.html
Git tree web page:   http://www.nilfs.org/git/
NILFS mailing lists: http://www.nilfs.org/mailman/listinfo/users

Caveats
=======

Features which NILFS2 does not support yet:

	- atime
	- extended attributes
	- POSIX ACLs
	- quotas
	- writable snapshots
	- remote backup (CDP)
	- data integrity
	- defragmentation

Mount options
=============

NILFS2 supports the following mount options:
(*) == default

barrier=on(*)		Enable/disable barriers.  barrier=off disables
			them, barrier=on enables them.
errors=continue(*)	Keep going on a filesystem error.
errors=remount-ro	Remount the filesystem read-only on an error.
errors=panic		Panic and halt the machine if an error occurs.
cp=n			Specify the checkpoint number of the snapshot to be
			mounted.  Checkpoints and snapshots are listed by the
			lscp user command.  Only checkpoints marked as
			snapshots are mountable with this option.  Since a
			snapshot is read-only, a read-only mount option must
			be specified as well.
order=relaxed(*)	Apply relaxed order semantics that allow modified data
			blocks to be written to disk without making a
			checkpoint if no metadata update is going on.  This
			mode is equivalent to the ordered data mode of the
			ext3 filesystem, except that updates on data blocks
			still conserve atomicity.  This improves synchronous
			write performance for overwriting.
order=strict		Apply strict in-order semantics that preserve the
			sequence of all file operations, including overwriting
			of data blocks.  This guarantees that events are never
			reordered in the file system recovered after a crash.
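The option table above can be read as a small parser with defaults. The following C sketch is hypothetical: the struct, enum, and function names are invented for illustration and are not the kernel's mount-option code. It applies the (*) defaults, accepts the listed values, and rejects cp=n unless the mount is read-only. For brevity it treats "ro" as just another token in the string; with mount(8) read-only is a generic flag (-r), not a NILFS2 option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of the options listed above. */
enum nilfs_errors_sketch { ERRORS_CONTINUE, ERRORS_REMOUNT_RO, ERRORS_PANIC };
enum nilfs_order_sketch  { ORDER_RELAXED, ORDER_STRICT };

struct nilfs_opts_sketch {
    bool barrier;                    /* barrier=on(*) / barrier=off */
    enum nilfs_errors_sketch errors; /* errors=continue(*) / remount-ro / panic */
    enum nilfs_order_sketch order;   /* order=relaxed(*) / strict */
    unsigned long long cp;           /* cp=n; 0 means "not given" (assumption) */
    bool read_only;                  /* "ro" token, a simplification */
};

/* Parse a comma-separated string such as "ro,cp=42,order=strict".
 * Returns 0 on success, -1 on an unknown or malformed option, or when
 * cp= is given without a read-only mount. */
int parse_nilfs_opts_sketch(const char *s, struct nilfs_opts_sketch *o)
{
    assert(s != NULL && o != NULL);

    /* Defaults marked (*) in the table. */
    o->barrier = true;
    o->errors = ERRORS_CONTINUE;
    o->order = ORDER_RELAXED;
    o->cp = 0;
    o->read_only = false;

    char buf[256];
    if (strlen(s) >= sizeof(buf))
        return -1;
    strcpy(buf, s);

    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        if (!strcmp(tok, "barrier=on"))             o->barrier = true;
        else if (!strcmp(tok, "barrier=off"))       o->barrier = false;
        else if (!strcmp(tok, "errors=continue"))   o->errors = ERRORS_CONTINUE;
        else if (!strcmp(tok, "errors=remount-ro")) o->errors = ERRORS_REMOUNT_RO;
        else if (!strcmp(tok, "errors=panic"))      o->errors = ERRORS_PANIC;
        else if (!strcmp(tok, "order=relaxed"))     o->order = ORDER_RELAXED;
        else if (!strcmp(tok, "order=strict"))      o->order = ORDER_STRICT;
        else if (!strcmp(tok, "ro"))                o->read_only = true;
        else if (!strncmp(tok, "cp=", 3)) {
            char *end;
            unsigned long long n = strtoull(tok + 3, &end, 10);
            if (end == tok + 3 || *end != '\0' || n == 0)
                return -1;
            o->cp = n;
        } else {
            return -1;
        }
    }

    /* A snapshot is read-only, so cp= requires a read-only mount. */
    if (o->cp != 0 && !o->read_only)
        return -1;
    return 0;
}
```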

NILFS2 usage
============

To use nilfs2 as a local file system, simply:

 # mkfs -t nilfs2 /dev/block_device
 # mount -t nilfs2 /dev/block_device /dir

This also invokes the cleaner (nilfs_cleanerd) through the mount
helper program (mount.nilfs2).

Checkpoints and snapshots are managed by the following commands.
Their man pages are included in the nilfs-utils package mentioned
above.

  lscp     list checkpoints or snapshots.
  mkcp     make a checkpoint or a snapshot.
  chcp     change an existing checkpoint to a snapshot or vice versa.
  rmcp     invalidate specified checkpoint(s).
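The relationship between these commands can be made concrete with a hypothetical in-memory model of the checkpoint list in C. The names are invented; the real list lives in the cpfile on disk. Two rules come from this document: only snapshots can be mounted with cp=n, and a snapshot is kept until it is changed back to a checkpoint. The third rule, that rmcp refuses to invalidate a checkpoint while it is still marked as a snapshot, is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the real cpfile is an on-disk metadata file. */
#define MAX_CPS_SKETCH 64

struct cp_sketch {
    unsigned long long cno; /* checkpoint number */
    bool snapshot;          /* marked as a snapshot */
    bool valid;             /* false once invalidated by rmcp */
};

struct cp_table_sketch {
    struct cp_sketch cps[MAX_CPS_SKETCH];
    size_t n;
    unsigned long long next_cno; /* only grows; start it at 1 (assumption) */
};

static struct cp_sketch *cp_find(struct cp_table_sketch *t, unsigned long long cno)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->cps[i].cno == cno && t->cps[i].valid)
            return &t->cps[i];
    return NULL;
}

/* mkcp: append a new checkpoint, optionally already a snapshot. */
unsigned long long cp_make(struct cp_table_sketch *t, bool snapshot)
{
    assert(t->n < MAX_CPS_SKETCH);
    struct cp_sketch *c = &t->cps[t->n++];
    c->cno = t->next_cno++;
    c->snapshot = snapshot;
    c->valid = true;
    return c->cno;
}

/* chcp: turn a checkpoint into a snapshot, or a snapshot back. */
int cp_change(struct cp_table_sketch *t, unsigned long long cno, bool snapshot)
{
    struct cp_sketch *c = cp_find(t, cno);
    if (!c)
        return -1;
    c->snapshot = snapshot;
    return 0;
}

/* rmcp: invalidate a plain checkpoint; snapshots are refused (assumption). */
int cp_remove(struct cp_table_sketch *t, unsigned long long cno)
{
    struct cp_sketch *c = cp_find(t, cno);
    if (!c || c->snapshot)
        return -1;
    c->valid = false;
    return 0;
}

/* Only checkpoints marked as snapshots are mountable with cp=n. */
bool cp_mountable(struct cp_table_sketch *t, unsigned long long cno)
{
    struct cp_sketch *c = cp_find(t, cno);
    return c && c->snapshot;
}
```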

To mount a snapshot:

 # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir

where <cno> is the checkpoint number of the snapshot.
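A program can do the same with the Linux mount(2) system call. This sketch assumes that -r corresponds to MS_RDONLY and that cp=<cno> is passed through as the filesystem-specific data string; calling mount(2) directly also bypasses the mount.nilfs2 helper, so nothing the helper does happens here. Both function names are hypothetical, and actually mounting requires root privileges and an existing snapshot.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/* Build the data string for a snapshot mount, e.g. "cp=42".
 * Returns its length, or -1 if it does not fit in buf. */
int nilfs_snapshot_data(char *buf, size_t len, unsigned long long cno)
{
    int n = snprintf(buf, len, "cp=%llu", cno);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Rough equivalent of
 *   mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
 * Returns 0 on success, -1 on failure (errno set by mount(2)). */
int mount_nilfs_snapshot(const char *dev, const char *dir, unsigned long long cno)
{
    char data[32];

    assert(dev != NULL && dir != NULL);
    if (nilfs_snapshot_data(data, sizeof(data), cno) < 0)
        return -1;
    return mount(dev, dir, "nilfs2", MS_RDONLY, data);
}
```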

To unmount a NILFS2 mount point or a snapshot, simply:

 # umount /dir

The cleaner daemon is then shut down automatically by the umount
helper program (umount.nilfs2).

Disk format
===========

Apart from the super block (SB) and segment #0, a nilfs2 volume is
divided into a number of equally sized segments.  A segment is a
container of logs.  Each log is composed of summary information
blocks, payload blocks, and an optional super root block (SR):

   ______________________________________________________
  | |SB| | Segment | Segment | Segment | ... | Segment | |
  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
       .             .            (Typical offsets for 4KB-block)
    .                  .
  .______________________.
  | log | log |... | log |
  |__1__|__2__|____|__m__|
        .       .
      .               .
    .                       .
  .______________________________.
  | Summary | Payload blocks  |SR|
  |_blocks__|_________________|__|
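The offsets in the figure follow from simple arithmetic. This sketch assumes the typical geometry shown (4 KiB blocks, 2048 blocks per 8 MiB segment) and a hypothetical first_data_block of 1, which puts the start of segment #0 at +4K, just past the super block area; every segment n >= 1 then starts at n x 8 MiB. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; typical values from the figure in comments. */
struct nilfs_geom_sketch {
    uint64_t block_size;         /* bytes per block, e.g. 4096 */
    uint64_t blocks_per_segment; /* e.g. 2048, i.e. 8 MiB segments */
    uint64_t first_data_block;   /* e.g. 1, i.e. segment #0 starts at +4K */
};

/* First and last block number of segment `segnum`.  Segment #0 is
 * shortened at the front because the super block area precedes it. */
void segment_range(const struct nilfs_geom_sketch *g, uint64_t segnum,
                   uint64_t *start, uint64_t *end)
{
    assert(g->blocks_per_segment > 0);
    *start = segnum * g->blocks_per_segment;
    *end = *start + g->blocks_per_segment - 1;
    if (segnum == 0)
        *start = g->first_data_block;
}

/* Byte offset at which segment `segnum` begins. */
uint64_t segment_byte_offset(const struct nilfs_geom_sketch *g, uint64_t segnum)
{
    uint64_t start, end;
    segment_range(g, segnum, &start, &end);
    (void)end;
    return start * g->block_size;
}
```

With these values segment #0 covers blocks 1 to 2047, so it is one block shorter than the others.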

The payload blocks are organized per file, and each file consists of
data blocks and B-tree node blocks:

    |<---       File-A        --->|<---       File-B        --->|
   _______________________________________________________________
    | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
   _|_____________|_______________|_____________|_______________|_

Since only modified blocks are written to the log, a file in a log
may have no data blocks or no B-tree node blocks.

The organization of the blocks is recorded in the summary information
blocks, which contain a header structure (nilfs_segment_summary), per
file structures (nilfs_finfo), and per block structures (nilfs_binfo):

  _________________________________________________________________________
 | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
 |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___
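Because each file's data blocks come first and its B-tree node blocks follow, the payload position of every block can be recovered from per-file counts in the summary. The C sketch below uses a simplified, host-endian stand-in for nilfs_finfo; its nblocks and ndatablk fields are modeled on what we understand the fi_nblocks and fi_ndatablk fields to mean, and everything else is hypothetical (see include/linux/nilfs2_fs.h for the real layout). A file with no data blocks or no B-tree node blocks simply gets an empty range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the per-file summary entry (not the on-disk
 * little-endian struct nilfs_finfo, which has more fields). */
struct finfo_sketch {
    uint64_t ino;      /* inode number of the file */
    uint32_t nblocks;  /* payload blocks of this file in the log (data + B-tree) */
    uint32_t ndatablk; /* how many of them are data blocks; they come first */
};

/* Where one file's blocks sit inside the payload area of a log. */
struct file_extent_sketch {
    uint64_t ino;
    size_t data_start, ndata; /* data blocks */
    size_t node_start, nnode; /* B-tree node blocks, right after the data */
};

/* Walk the finfo entries in order and compute each file's extent.
 * Returns the total number of payload blocks, or (size_t)-1 if an
 * entry claims more data blocks than blocks. */
size_t layout_payload(const struct finfo_sketch *fi, size_t nfiles,
                      struct file_extent_sketch *out)
{
    size_t pos = 0;

    assert(nfiles == 0 || (fi != NULL && out != NULL));
    for (size_t i = 0; i < nfiles; i++) {
        if (fi[i].ndatablk > fi[i].nblocks)
            return (size_t)-1;
        out[i].ino = fi[i].ino;
        out[i].data_start = pos;
        out[i].ndata = fi[i].ndatablk;
        out[i].node_start = pos + fi[i].ndatablk;
        out[i].nnode = fi[i].nblocks - fi[i].ndatablk;
        pos += fi[i].nblocks;
    }
    return pos;
}
```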

The logs include regular files, directory files, symbolic link files
and several metadata files.  The metadata files are the files used
to maintain file system metadata.  The current version of NILFS2 uses
the following metadata files:

 1) Inode file (ifile)             -- Stores on-disk inodes
 2) Checkpoint file (cpfile)       -- Stores checkpoints
 3) Segment usage file (sufile)    -- Stores allocation state of segments
 4) Data address translation file  -- Maps virtual block numbers to actual
    (DAT)                             block numbers.  This file serves to
                                      make on-disk blocks relocatable.
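The point of the DAT is that moving a block only requires updating one table entry. A minimal sketch, assuming (as in NILFS2's design) that the B-tree pointers of ordinary files hold virtual block numbers: lookups go through the table, and relocating a block, for example when the cleaner copies live blocks out of a segment it wants to reuse, rewrites only that entry. The struct is hypothetical; real DAT entries also record which checkpoints a block belongs to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DAT: index = virtual block number, value = disk block
 * number, 0 = unallocated (assumption). */
struct dat_sketch {
    uint64_t *blocknr;
    size_t nentries;
};

/* Resolve a virtual block number to the current on-disk block. */
int dat_translate(const struct dat_sketch *dat, uint64_t vblocknr, uint64_t *blocknr)
{
    assert(dat != NULL && blocknr != NULL);
    if (vblocknr >= dat->nentries || dat->blocknr[vblocknr] == 0)
        return -1;
    *blocknr = dat->blocknr[vblocknr];
    return 0;
}

/* Relocate a block: only the DAT entry changes, so B-tree nodes that
 * store the virtual number need no rewrite. */
int dat_move(struct dat_sketch *dat, uint64_t vblocknr, uint64_t new_blocknr)
{
    assert(dat != NULL);
    if (vblocknr >= dat->nentries || dat->blocknr[vblocknr] == 0 || new_blocknr == 0)
        return -1;
    dat->blocknr[vblocknr] = new_blocknr;
    return 0;
}
```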

The following figure shows a typical organization of the logs:

  _________________________________________________________________________
 | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
 |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|

To span segment boundaries, this sequence of files may be split into
multiple logs.  A sequence of logs that should be treated as one
logical log is delimited by flags set in the segment summary.  The
nilfs2 recovery code uses this boundary information to ensure
atomicity of updates.
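The grouping rule can be illustrated with a small scan over per-log flags. The flag names and values below are hypothetical (nilfs2_fs.h defines its own segment summary flags), and the real recovery code does considerably more, such as verifying checksums. The sketch only finds the last logical log that was written completely and ends in a super root. A logical log whose final log never reached the disk is ignored, so its updates are applied all or nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, not the values from nilfs2_fs.h. */
#define LOG_BEGIN_SKETCH 0x1u /* first log of a logical log */
#define LOG_END_SKETCH   0x2u /* last log of a logical log */
#define LOG_SR_SKETCH    0x4u /* this log carries a super root */

/* Given the flags of the logs in write order, return the index of the
 * last log that closes a complete logical log ending in a super root,
 * or -1 if there is none. */
long last_complete_checkpoint(const uint32_t *flags, size_t nlogs)
{
    long result = -1;
    int in_group = 0;

    assert(nlogs == 0 || flags != NULL);
    for (size_t i = 0; i < nlogs; i++) {
        if (flags[i] & LOG_BEGIN_SKETCH)
            in_group = 1;   /* a new logical log starts; an unfinished one is dropped */
        if (!in_group)
            continue;       /* stray log outside any logical log */
        if (flags[i] & LOG_END_SKETCH) {
            if (flags[i] & LOG_SR_SKETCH)
                result = (long)i;
            in_group = 0;
        }
    }
    return result;
}
```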

The super root block is inserted for every checkpoint.  It includes
three special inodes: the inodes of the DAT, cpfile, and sufile.
Inodes of regular files, directories, symlinks and other special
files are included in the ifile.  The inode of the ifile itself is
included in the corresponding checkpoint entry in the cpfile.  Thus,
the hierarchy among NILFS2 files can be depicted as follows:

  Super block (SB)
       |
       v
  Super root block (the latest cno=xx)
       |-- DAT
       |-- sufile
       `-- cpfile
              |-- ifile (cno=c1)
              |-- ifile (cno=c2) ---- file (ino=i1)
              :        :          |-- file (ino=i2)
              `-- ifile (cno=xx)  |-- file (ino=i3)
                                  :        :
                                  `-- file (ino=yy)
                                    ( regular file, directory, or symlink )
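Following the figure from the top gives the lookup path for any file in any checkpoint. The sketch uses hypothetical in-memory structs in place of on-disk files and leaves out the DAT and sufile; it only shows the order of the indirections: super root, then the cpfile entry for the requested checkpoint, then that checkpoint's ifile, then the inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the structures in the figure. */
struct inode_sketch   { uint64_t ino; int version; }; /* version stands in for contents */
struct ifile_sketch   { struct inode_sketch *inodes; size_t n; };
struct cpentry_sketch { uint64_t cno; struct ifile_sketch ifile; }; /* holds the ifile */
struct cpfile_sketch  { struct cpentry_sketch *cps; size_t n; };
struct super_root_sketch {
    uint64_t latest_cno;
    struct cpfile_sketch cpfile; /* DAT and sufile inodes omitted */
};

/* Super root -> cpfile entry for `cno` -> that checkpoint's ifile -> inode. */
const struct inode_sketch *lookup_inode(const struct super_root_sketch *sr,
                                        uint64_t cno, uint64_t ino)
{
    assert(sr != NULL);
    for (size_t i = 0; i < sr->cpfile.n; i++) {
        const struct cpentry_sketch *cp = &sr->cpfile.cps[i];
        if (cp->cno != cno)
            continue;
        for (size_t j = 0; j < cp->ifile.n; j++)
            if (cp->ifile.inodes[j].ino == ino)
                return &cp->ifile.inodes[j];
        return NULL; /* checkpoint found, but the inode is absent from it */
    }
    return NULL;     /* no such checkpoint */
}
```

In this model each checkpoint entry carries its own ifile, so looking up the same inode number under an older cno returns the older version of the file.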

For details on the format of each file, see include/linux/nilfs2_fs.h.