pagemap, from the userspace perspective
---------------------------------------

pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
userspace programs to examine the page tables and related information by
reading files in /proc.

There are three components to pagemap:

 * /proc/pid/pagemap.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, containing the following data (from
   fs/proc/task_mmu.c, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bits 55-60 page shift (page size = 1<<page shift)
    * Bit  61    reserved for future use
    * Bit  62    page swapped
    * Bit  63    page present

   If the page is not present but in swap, then the PFN contains an
   encoding of the swap file number and the page's offset into the
   swap. Unmapped pages return a null PFN. This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use /proc/pid/maps to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * /proc/kpagecount.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

 * /proc/kpageflags.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from fs/proc/page.c, above kpageflags_read):

     0. LOCKED
     1. ERROR
     2. REFERENCED
     3. UPTODATE
     4. DIRTY
     5. LRU
     6. ACTIVE
     7. SLAB
     8. WRITEBACK
     9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
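
As an illustration of the /proc/pid/pagemap bit layout listed above, the
following Python sketch splits one 64-bit value into its fields. It only
assumes the layout given in this document; the names PagemapEntry and
decode_pagemap_entry are made up for the example and are not part of any
kernel or library interface.

```python
from typing import NamedTuple, Optional

PFN_MASK = (1 << 55) - 1  # bits 0-54


class PagemapEntry(NamedTuple):
    """Hypothetical container for the fields of one pagemap value."""
    present: bool
    swapped: bool
    page_shift: int
    pfn: Optional[int]
    swap_type: Optional[int]
    swap_offset: Optional[int]


def decode_pagemap_entry(value: int) -> PagemapEntry:
    """Split one 64-bit pagemap value using the bit layout in this document."""
    present = bool((value >> 63) & 1)   # bit 63
    swapped = bool((value >> 62) & 1)   # bit 62
    page_shift = (value >> 55) & 0x3F   # bits 55-60
    low = value & PFN_MASK              # bits 0-54
    return PagemapEntry(
        present=present,
        swapped=swapped,
        page_shift=page_shift,
        pfn=low if present else None,
        swap_type=(low & 0x1F) if swapped else None,   # bits 0-4
        swap_offset=(low >> 5) if swapped else None,   # bits 5-54
    )
```

The page size follows from the decoded shift as 1 << page_shift.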

Short descriptions of the page flags:

 0. LOCKED
    page is being locked for exclusive access, eg. by undergoing read/write IO

 7. SLAB
    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
    When a compound page is used, SLUB/SLQB will only set this flag on the
    head page; SLOB will not flag it at all.

10. BUDDY
    a free memory block managed by the buddy system allocator
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and _only_ for the first page.

15. COMPOUND_HEAD
16. COMPOUND_TAIL
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc.
    memory allocators and various device drivers. However in this interface,
    only huge/giga pages are made visible to end users.

17. HUGE
    this is an integral part of a HugeTLB page

19. HWPOISON
    hardware detected memory corruption on this page: don't touch the data!

20. NOPAGE
    no page frame exists at the requested address

21. KSM
    identical memory pages dynamically shared between one or more processes

    [IO related page flags]
 1. ERROR     IO error occurred
 3. UPTODATE  page has up-to-date data
              ie. for file backed page: (in-memory data revision >= on-disk one)
 4. DIRTY     page has been written to, hence contains new data
              ie. for file backed page: (in-memory data revision >  on-disk one)
 8. WRITEBACK page is being synced to disk

    [LRU related page flags]
 5. LRU         page is in one of the LRU lists
 6. ACTIVE      page is in the active LRU list
18. UNEVICTABLE page is in the unevictable (non-)LRU list
                It is somehow pinned and not a candidate for LRU page reclaims,
                eg. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments
 2. REFERENCED  page has been referenced since last LRU list enqueue/requeue
 9. RECLAIM     page will be reclaimed soon after its pageout IO completed
11. MMAP        a memory mapped page
12. ANON        a memory mapped page that is not part of a file
13. SWAPCACHE   page is mapped to swap space, ie. has an associated swap entry
14. SWAPBACKED  page is backed by swap/RAM

The page-types tool in this directory can be used to query the above flags.
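
For quick experiments without page-types, a /proc/kpageflags value can be
turned into flag names with a few lines of Python. This is only a sketch: it
assumes the bit numbers documented here (with HUGE as bit 17, as in the
descriptions), and KPF_NAMES and flag_names are names invented for the
example.

```python
# Flag names ordered by bit number, as documented for /proc/kpageflags.
KPF_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU",
    "ACTIVE", "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP",
    "ANON", "SWAPCACHE", "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL",
    "HUGE", "UNEVICTABLE", "HWPOISON", "NOPAGE", "KSM",
]


def flag_names(flags):
    """Return the names of all documented bits that are set in `flags`."""
    return [name for bit, name in enumerate(KPF_NAMES) if (flags >> bit) & 1]
```

Bits beyond 21 are ignored by this sketch, since this document does not
describe them.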

Using pagemap to do something useful:

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

 1. Read /proc/pid/maps to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open /proc/pid/pagemap and seek to the pages you would like to examine.
 4. Read a u64 for each page from pagemap.
 5. Open /proc/kpagecount and/or /proc/kpageflags.  For each PFN you just
    read, seek to that entry in the file, and read the data you want.
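
The steps above could be sketched in Python roughly as follows. The helper
names (parse_maps_line, read_entries, pagemap_for_range) are hypothetical,
the entries are assumed to be in native byte order, and error handling is
left out. Reading other processes' pagemap or the /proc/kpage* files
typically requires sufficient privileges.

```python
import os
import struct

ENTRY_SIZE = 8  # every entry in pagemap, kpagecount and kpageflags is a u64


def parse_maps_line(line):
    """Return (start, end, perms, pathname) from one /proc/pid/maps line."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    pathname = fields[5].strip() if len(fields) > 5 else ""
    return start, end, fields[1], pathname


def read_entries(path, first_index, count):
    """Read up to `count` u64 entries from `path`, starting at `first_index`.

    Both the offset and the size passed to pread are multiples of 8 bytes.
    A short read simply yields fewer entries.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, count * ENTRY_SIZE, first_index * ENTRY_SIZE)
    finally:
        os.close(fd)
    n = len(data) // ENTRY_SIZE
    return list(struct.unpack("=%dQ" % n, data[: n * ENTRY_SIZE]))


def pagemap_for_range(pid, start, end, page_size=None):
    """Return the raw pagemap entries covering virtual range [start, end)."""
    page_size = page_size or os.sysconf("SC_PAGE_SIZE")
    first = start // page_size            # index of the first virtual page
    count = (end - start) // page_size    # number of pages in the mapping
    return read_entries("/proc/%s/pagemap" % pid, first, count)
```

A caller would feed each line of /proc/pid/maps to parse_maps_line, pick the
ranges of interest, and pass them to pagemap_for_range. The same
read_entries helper can then look up a PFN in /proc/kpagecount or
/proc/kpageflags.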

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
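
A minimal sketch of that tally, assuming the raw pagemap values for the
process have already been read into a list and that the bit layout is the
one described in this document. The function name count_unique_pages and
its kpagecount_path parameter are invented for the example.

```python
import os
import struct

PRESENT = 1 << 63          # bit 63: page present
PFN_MASK = (1 << 55) - 1   # bits 0-54: PFN when present


def count_unique_pages(pagemap_entries, kpagecount_path="/proc/kpagecount"):
    """Count present pages whose kpagecount entry is exactly 1."""
    unique = 0
    fd = os.open(kpagecount_path, os.O_RDONLY)
    try:
        for entry in pagemap_entries:
            if not entry & PRESENT:
                continue                      # swapped or unmapped: no PFN
            pfn = entry & PFN_MASK
            data = os.pread(fd, 8, pfn * 8)   # kpagecount is indexed by PFN
            if len(data) == 8 and struct.unpack("=Q", data)[0] == 1:
                unique += 1
    finally:
        os.close(fd)
    return unique
```

Multiplying the result by the page size gives the USS in bytes. Note that a
page the same process maps more than once has a count above 1, so this
simple tally would not include it.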

Other notes:

Reading from any of the files will fail with EINVAL (read() returns -1 and
sets errno to EINVAL) if you are not starting the read on an 8-byte boundary
(e.g., after seeking an odd number of bytes into the file), or if the size of
the read is not a multiple of 8 bytes.
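
One way to stay within this rule is to compute offsets and sizes from entry
indexes rather than raw byte counts. The helpers below are a hypothetical
sketch of that idea, not part of any existing interface.

```python
ENTRY_SIZE = 8  # bytes per entry in pagemap, kpagecount and kpageflags


def entry_offset(index):
    """Byte offset of entry `index`; always a multiple of 8."""
    return index * ENTRY_SIZE


def is_valid_read(offset, size):
    """True if a read at `offset` of `size` bytes meets the 8-byte rule."""
    return offset % ENTRY_SIZE == 0 and size % ENTRY_SIZE == 0
```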