Memory Resource Controller (Memcg) Implementation Memo.
Last Updated: 2010/2
Base Kernel Version: based on 2.6.33-rc7-mm (candidate for 34).

Because the VM is getting complex (one of the reasons is memcg...), memcg's
behavior is complex too. This document describes memcg's internal behavior.
Please note that implementation details can change.

(*) Topics on the API should be in Documentation/cgroups/memory.txt.

0. How to record usage ?
   2 objects are used.

   page_cgroup ... an object per page.
        Allocated at boot or memory hotplug. Freed at memory hot removal.

   swap_cgroup ... an entry per swp_entry.
        Allocated at swapon(). Freed at swapoff().

   The page_cgroup has a USED bit, so a page_cgroup is never charged twice.
   swap_cgroup is used only when a charged page is swapped out.

1. Charge

   A page/swp_entry may be charged (usage += PAGE_SIZE) at

        mem_cgroup_newpage_charge()
          Called at new page fault and Copy-On-Write.

        mem_cgroup_try_charge_swapin()
          Called at do_swap_page() (page fault on swap entry) and swapoff.
          Followed by charge-commit-cancel protocol. (With swap accounting)
          At commit, a charge recorded in swap_cgroup is removed.

        mem_cgroup_cache_charge()
          Called at add_to_page_cache()

        mem_cgroup_cache_charge_swapin()
          Called at shmem's swapin.

        mem_cgroup_prepare_migration()
          Called before migration. "extra" charge is done and followed by
          charge-commit-cancel protocol.
          At commit, charge against oldpage or newpage will be committed.

2. Uncharge
   A page/swp_entry may be uncharged (usage -= PAGE_SIZE) by

        mem_cgroup_uncharge_page()
          Called when an anonymous page is fully unmapped, i.e., mapcount goes
          to 0. If the page is SwapCache, uncharge is delayed until
          mem_cgroup_uncharge_swapcache().

        mem_cgroup_uncharge_cache_page()
          Called when a page-cache page is deleted from the radix-tree. If the
          page is SwapCache, uncharge is delayed until
          mem_cgroup_uncharge_swapcache().

        mem_cgroup_uncharge_swapcache()
          Called when SwapCache is removed from the radix-tree. The charge
          itself is moved to swap_cgroup. (If the mem+swap controller is
          disabled, no charge to swap occurs.)

        mem_cgroup_uncharge_swap()
          Called when swp_entry's refcnt goes down to 0. A charge against swap
          disappears.

        mem_cgroup_end_migration(old, new)
          At success of migration, old is uncharged (if necessary) and a charge
          to the new page is committed. At failure, the charge to the old page
          is committed.

3. charge-commit-cancel
        In some cases, we can't know at charging time whether this "charge"
        is valid or not (because of races).
        To handle such cases, there are charge-commit-cancel functions.
                mem_cgroup_try_charge_XXX
                mem_cgroup_commit_charge_XXX
                mem_cgroup_cancel_charge_XXX
        These are used in swap-in and migration.

        At try_charge(), there are no flags to say "this page is charged".
        At this point, usage += PAGE_SIZE.

        At commit(), the function checks whether the page should be charged
        or not, and either sets the flags or avoids charging
        (usage -= PAGE_SIZE).

        At cancel(), simply usage -= PAGE_SIZE.
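
        The protocol can be pictured with a toy model. This is only a sketch
        and not kernel code: "usage" counts pages instead of bytes, "used"
        stands in for page_cgroup's USED bit, and the three function names are
        made up for illustration.

```shell
#!/bin/sh
# Toy model of charge-commit-cancel (hypothetical names, not kernel code).
usage=0     # pages accounted to the group
used=0      # stands in for the USED bit of one page_cgroup

try_charge()    { usage=$((usage + 1)); }   # account first, no flag yet
cancel_charge() { usage=$((usage - 1)); }   # simply give the charge back
commit_charge() {
    if [ "$used" -eq 1 ]; then
        usage=$((usage - 1))                # already charged: avoid charging
    else
        used=1                              # first charge: set the flag
    fi
}

try_charge; commit_charge   # normal case
try_charge; commit_charge   # a second charge against the same page
try_charge; cancel_charge   # the caller gave up
echo "usage=$usage used=$used"
```

        The script prints "usage=1 used=1": whatever the order of events,
        the page ends up charged exactly once.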

        In the explanation below, we assume CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y.

4. Anonymous
        An anonymous page is newly allocated at
                  - page fault into MAP_ANONYMOUS mapping.
                  - Copy-On-Write.
        It is charged right after it's allocated, before doing any page table
        related operations. Of course, it's uncharged when another page is used
        for the fault address.

        At freeing an anonymous page (by exit() or munmap()), zap_pte() is
        called and pages for ptes are freed one by one (see mm/memory.c).
        Uncharges are done at page_remove_rmap() when page_mapcount() goes
        down to 0.

        Another way of freeing pages is page-reclaim (vmscan.c), where
        anonymous pages are swapped out. In this case, the page is marked as
        PageSwapCache(). The uncharge() routine doesn't uncharge a page marked
        as SwapCache(). It's delayed until __delete_from_swap_cache().

        4.1 Swap-in.
        At swap-in, the page is taken from swap-cache. There are 2 cases.

        (a) If the SwapCache is newly allocated and read, it has no charges.
        (b) If the SwapCache has been mapped by processes, it has been
            charged already.

        This swap-in is one of the most complicated pieces of work. In
        do_swap_page(), the following events occur when the pte is unchanged.

        (1) the page (SwapCache) is looked up.
        (2) lock_page()
        (3) try_charge_swapin()
        (4) reuse_swap_page() (may call delete_from_swap_cache())
        (5) commit_charge_swapin()
        (6) swap_free().

        Consider the following situations, for example.

        (A) The page has not been charged before (2) and reuse_swap_page()
            doesn't call delete_from_swap_cache().
        (B) The page has not been charged before (2) and reuse_swap_page()
            calls delete_from_swap_cache().
        (C) The page has been charged before (2) and reuse_swap_page() doesn't
            call delete_from_swap_cache().
        (D) The page has been charged before (2) and reuse_swap_page() calls
            delete_from_swap_cache().

        memory.usage/memsw.usage changes to this page/swp_entry will be

        Case           (A)      (B)      (C)      (D)
        Event
        Before (2)     0/ 1     0/ 1     1/ 1     1/ 1
        ===============================================
        (3)           +1/+1    +1/+1    +1/+1    +1/+1
        (4)             -       0/ 0      -      -1/ 0
        (5)            0/-1     0/ 0    -1/-1     0/ 0
        (6)             -       0/-1      -       0/-1
        ===============================================
        Result         1/ 1     1/ 1     1/ 1     1/ 1

        In all cases, charges to this page should end up as 1/ 1.
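
        The table can be checked mechanically. The following sketch only
        replays the per-event deltas from the table above and sums them up;
        the replay() helper is made up for this memo and a "-" entry is
        written as 0/0.

```shell
#!/bin/sh
# replay START DELTA...: add each "mem/memsw" delta to START and print the
# final "mem/memsw" pair. Hypothetical helper, pure arithmetic.
replay() {
    mem=${1%/*}; memsw=${1#*/}; shift
    for d in "$@"; do
        mem=$((mem + ${d%/*}))
        memsw=$((memsw + ${d#*/}))
    done
    echo "$mem/$memsw"
}

#          before (3)  (4)   (5)    (6)
A=$(replay 0/1    1/1  0/0   0/-1   0/0)
B=$(replay 0/1    1/1  0/0   0/0    0/-1)
C=$(replay 1/1    1/1  0/0   -1/-1  0/0)
D=$(replay 1/1    1/1  -1/0  0/0    0/-1)
echo "A=$A B=$B C=$C D=$D"
```

        It prints "A=1/1 B=1/1 C=1/1 D=1/1", matching the Result row.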

        4.2 Swap-out.
        At swap-out, the typical state transition is as follows.

        (a) add to swap cache. (marked as SwapCache)
            swp_entry's refcnt += 1.
        (b) fully unmapped.
            swp_entry's refcnt += # of ptes.
        (c) write back to swap.
        (d) delete from swap cache. (remove from SwapCache)
            swp_entry's refcnt -= 1.


        At (b), the page is marked as SwapCache and not uncharged.
        At (d), the page is removed from SwapCache and a charge in page_cgroup
        is moved to swap_cgroup.

        Finally, at task exit,
        (e) zap_pte() is called and swp_entry's refcnt -= 1 -> 0.
        Here, a charge in swap_cgroup disappears.
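
        The refcnt bookkeeping can be followed with plain counters. This is a
        sketch, not kernel code: NPTES (the number of ptes that mapped the
        page) and the variable names are assumptions made for illustration,
        and step (e) is repeated once per pte.

```shell
#!/bin/sh
# Toy walk through swap-out steps (a)-(e). Counters only, not kernel code.
NPTES=2            # assumed number of ptes mapping the page
refcnt=0           # swp_entry's refcnt
page_charge=1      # charge held in page_cgroup
swap_charge=0      # charge held in swap_cgroup

refcnt=$((refcnt + 1))                  # (a) add to swap cache
refcnt=$((refcnt + NPTES))              # (b) fully unmapped
                                        # (c) write back: no accounting change
refcnt=$((refcnt - 1))                  # (d) delete from swap cache,
page_charge=0; swap_charge=1            #     charge moves to swap_cgroup
i=0
while [ "$i" -lt "$NPTES" ]; do         # (e) each swap pte is zapped
    refcnt=$((refcnt - 1)); i=$((i + 1))
done
[ "$refcnt" -eq 0 ] && swap_charge=0    # last reference gone
echo "refcnt=$refcnt page=$page_charge swap=$swap_charge"
```

        With any NPTES >= 1 the final line reads
        "refcnt=0 page=0 swap=0", i.e. no charge is left behind.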

5. Page Cache
        Page Cache is charged at
        - add_to_page_cache_locked().

        uncharged at
        - __remove_from_page_cache().

        The logic is very clear. (About migration, see below)
        Note: __remove_from_page_cache() is called by remove_from_page_cache()
        and __remove_mapping().

6. Shmem(tmpfs) Page Cache
        Memcg's charge/uncharge have special handlers for shmem. The best way
        to understand shmem's page state transitions is to read mm/shmem.c.
        But a brief explanation of memcg's behavior around shmem will be
        helpful for understanding the logic.

        Shmem's page (just leaf page, not direct/indirect block) can be on
                - radix-tree of shmem's inode.
                - SwapCache.
                - Both on radix-tree and SwapCache. This happens at swap-in
                  and swap-out.

        It's charged when...
        - A new page is added to shmem's radix-tree.
        - A swp page is read. (move a charge from swap_cgroup to page_cgroup)
        It's uncharged when
        - A page is removed from radix-tree and not SwapCache.
        - When SwapCache is removed, a charge is moved to swap_cgroup.
        - When swp_entry's refcnt goes down to 0, a charge in swap_cgroup
          disappears.

7. Page Migration
        One of the most complicated functions is the page-migration handler.
        Memcg has 2 routines. Assume that we are migrating a page's contents
        from OLDPAGE to NEWPAGE.

        The usual migration logic is:
        (a) remove the page from LRU.
        (b) allocate NEWPAGE (migration target)
        (c) lock by lock_page().
        (d) unmap all mappings.
        (e-1) If necessary, replace entry in radix-tree.
        (e-2) move contents of a page.
        (f) map all mappings again.
        (g) pushback the page to LRU.
        (-) OLDPAGE will be freed.

        Before (g), memcg should complete all necessary charge/uncharge to
        NEWPAGE/OLDPAGE.

        The points are:
        - If OLDPAGE is anonymous, all charges will be dropped at (d) because
          try_to_unmap() drops all mapcount and the page will not be
          SwapCache.

        - If OLDPAGE is SwapCache, charges will be kept at (g) because
          __delete_from_swap_cache() isn't called at (e-1)

        - If OLDPAGE is page-cache, charges will be kept at (g) because
          remove_from_page_cache() isn't called at (e-1)

        memcg provides the following hooks.

        - mem_cgroup_prepare_migration(OLDPAGE)
          Called after (b) to account a charge (usage += PAGE_SIZE) against
          the memcg which OLDPAGE belongs to.

        - mem_cgroup_end_migration(OLDPAGE, NEWPAGE)
          Called after (f) and before (g).
          If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already
          charged, the charge made by prepare_migration() is automatically
          canceled.
          If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE.

          But zap_pte() (by exit or munmap) can be called during migration,
          so we have to check whether OLDPAGE/NEWPAGE is a valid page after
          commit().
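
        The net effect of the two hooks can be sketched with counters. This
        is a simplified model built only from the description above, not
        kernel code: the migrate() helper and its arguments are made up, usage
        counts pages, and the races mentioned above are ignored.

```shell
#!/bin/sh
# migrate KIND RESULT: print the group's final usage (in pages).
# KIND is anon|cache, RESULT is ok|fail. Hypothetical model, not kernel code.
migrate() {
    usage=1; old_charged=1; new_charged=0
    usage=$((usage + 1))                     # prepare_migration(): extra charge
    if [ "$1" = anon ]; then                 # (d) unmap drops an anon charge
        usage=$((usage - 1)); old_charged=0
    fi
    if [ "$2" = ok ]; then                   # end_migration(), NEWPAGE is used
        new_charged=1                        # extra charge committed to NEWPAGE
        if [ "$old_charged" -eq 1 ]; then    # uncharge OLDPAGE if necessary
            usage=$((usage - 1)); old_charged=0
        fi
    elif [ "$old_charged" -eq 1 ]; then      # OLDPAGE is used, already charged:
        usage=$((usage - 1))                 # extra charge is canceled
    else
        old_charged=1                        # extra charge committed to OLDPAGE
    fi
    echo "$usage"
}

for kind in anon cache; do
    for result in ok fail; do
        echo "$kind/$result: usage=$(migrate "$kind" "$result")"
    done
done
```

        All four combinations print usage=1: the "extra" charge never
        survives the end of migration.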

8. LRU
        Each memcg has its own private LRU. Currently, its handling is under
        the global VM's control (meaning that it's handled under the global
        zone->lru_lock).
        Almost all routines around memcg's LRU are called by the global LRU's
        list management functions under zone->lru_lock.

        A special function is mem_cgroup_isolate_pages(). This scans
        memcg's private LRU and calls __isolate_lru_page() to extract a page
        from the LRU.
        (By __isolate_lru_page(), the page is removed from both the global and
         the private LRU.)


9. Typical Tests.

 Tests for racy cases.

 9.1 Small limit to memcg.
        When you test racy cases, it's a good idea to set memcg's limit
        to be very small rather than GB. Many races were found in tests under
        xKB or xxMB limits.
        (Memory behavior under a GB limit and under an MB limit differs
         considerably.)
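
        A test loop over several small limits could look like the sketch
        below. The sweep_limits() helper and the three limit values are
        assumptions for illustration; the workload is expected to run inside
        the group already.

```shell
#!/bin/sh
# sweep_limits GROUP_DIR [CMD ARGS...]: for each small limit, write it to
# the group's memory.limit_in_bytes and run CMD. Hypothetical helper.
sweep_limits() {
    group=$1; shift
    for limit in 512K 4M 32M; do
        echo "$limit" > "$group/memory.limit_in_bytes" || return 1
        "$@" || echo "workload failed under limit $limit"
    done
}

# Example (needs a mounted memory cgroup and a workload of your own):
#   echo $$ > /cgroup/test/tasks
#   sweep_limits /cgroup/test ./my_workload
```

        Here /cgroup/test and ./my_workload are placeholders.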

 9.2 Shmem
        Historically, memcg's shmem handling was poor and we saw a fair
        amount of trouble here. This is because shmem is page-cache but can
        be SwapCache. Testing with shmem/tmpfs is always a good test.

 9.3 Migration
        For NUMA, migration is another special case. For easy testing, cpuset
        is useful. The following is a sample script to do migration.

        mount -t cgroup -o cpuset none /opt/cpuset

        mkdir /opt/cpuset/01
        echo 1 > /opt/cpuset/01/cpuset.cpus
        echo 0 > /opt/cpuset/01/cpuset.mems
        echo 1 > /opt/cpuset/01/cpuset.memory_migrate
        mkdir /opt/cpuset/02
        echo 1 > /opt/cpuset/02/cpuset.cpus
        echo 1 > /opt/cpuset/02/cpuset.mems
        echo 1 > /opt/cpuset/02/cpuset.memory_migrate

        In the above setup, when you move a task from 01 to 02, page migration
        from node 0 to node 1 will occur. The following is a script to migrate
        all tasks under a cpuset.
        --
        # G1/G2 are assumed to be the two cpusets created above.
        G1=/opt/cpuset/01
        G2=/opt/cpuset/02

        # Move every pid in $1 into the cgroup directory $2.
        move_task()
        {
        for pid in $1
        do
                /bin/echo $pid >$2/tasks 2>/dev/null
                echo -n $pid
                echo -n " "
        done
        echo END
        }

        G1_TASK=`cat ${G1}/tasks`
        G2_TASK=`cat ${G2}/tasks`
        move_task "${G1_TASK}" ${G2} &
        --

 9.4 Memory hotplug.
        A memory hotplug test is a good test.
        To offline memory, do the following.
        # echo offline > /sys/devices/system/memory/memoryXXX/state
        (XXX is the number of the memory block)
        This is an easy way to test page migration, too.
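
        A small wrapper makes it easier to offline blocks in a loop. This is
        a sketch: offline_block() and the SYSFS_MEMORY override are made up
        here (the override only exists so the helper can be tried against a
        scratch directory).

```shell
#!/bin/sh
# offline_block N: write "offline" to memoryN/state. SYSFS_MEMORY is a
# hypothetical override of the sysfs directory, used for dry runs.
offline_block() {
    state="${SYSFS_MEMORY:-/sys/devices/system/memory}/memory$1/state"
    if [ ! -w "$state" ]; then
        echo "cannot write $state" >&2
        return 1
    fi
    echo offline > "$state"
}

# Example (as root; block numbers depend on the machine):
#   offline_block 32 && cat /sys/devices/system/memory/memory32/state
```

        Offlining can fail when the block holds pages that cannot be
        migrated, so the return value is worth checking.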

 9.5 mkdir/rmdir
        When using hierarchy, mkdir/rmdir tests should be done.
        Use tests like the following.

        echo 1 >/opt/cgroup/01/memory.use_hierarchy
        mkdir /opt/cgroup/01/child_a
        mkdir /opt/cgroup/01/child_b

        set limit to 01.
        add limit to 01/child_b
        run jobs under child_a and child_b

        create/delete the following groups at random while jobs are running.
        /opt/cgroup/01/child_a/child_aa
        /opt/cgroup/01/child_b/child_bb
        /opt/cgroup/01/child_c

        Running new jobs in a new group is also good.
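
        The random create/delete step can be scripted, for example as below.
        This is a sketch: churn_groups() is a made-up helper, and it reads
        /dev/urandom for its coin flips. rmdir of a group that still has
        tasks fails, which is expected and ignored.

```shell
#!/bin/sh
# churn_groups BASE ROUNDS: on each round, flip a coin for each of the three
# groups and toggle it (mkdir if absent, rmdir if present). Hypothetical.
churn_groups() {
    base=$1; rounds=$2; n=0
    while [ "$n" -lt "$rounds" ]; do
        for g in child_a/child_aa child_b/child_bb child_c; do
            # One random byte decides whether this group is touched.
            r=$(od -An -N1 -tu1 /dev/urandom | tr -d '[:space:]')
            [ $(( ${r:-0} % 2 )) -eq 0 ] && continue
            if [ -d "$base/$g" ]; then
                rmdir "$base/$g" 2>/dev/null
            else
                mkdir "$base/$g" 2>/dev/null
            fi
        done
        n=$((n + 1))
    done
    return 0
}

# Example: churn_groups /opt/cgroup/01 1000 &
```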

 9.6 Mount with other subsystems.
        Mounting with other subsystems is a good test because there are
        races and lock dependencies with other cgroup subsystems.

        example)
        # mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices

        and do task move, mkdir, rmdir etc. under this.

 9.7 swapoff.
        Management of swap is one of the complicated parts of memcg, and in
        addition the call path of swap-in at swapoff is not the same as the
        usual swap-in path. It's worth testing explicitly.

        For example, a test like the following is good.
        (Shell-A)
        # mount -t cgroup none /cgroup -o memory
        # mkdir /cgroup/test
        # echo 40M > /cgroup/test/memory.limit_in_bytes
        # echo 0 > /cgroup/test/tasks
        Run a malloc(100M) program under this. You'll see 60M of swap.
        (Shell-B)
        # move all tasks in /cgroup/test to /cgroup
        # /sbin/swapoff -a
        # rmdir /cgroup/test
        # kill malloc task.

        Of course, tmpfs vs. swapoff should be tested, too.

 9.8 OOM-Killer
        Out-of-memory caused by memcg's limit will kill tasks under
        the memcg. When hierarchy is used, a task under the hierarchy
        will be killed by the kernel.
        In this case, panic_on_oom shouldn't be invoked and tasks
        in other groups shouldn't be killed.

        It's not difficult to cause OOM under memcg, as follows.
        Case A) when you can swapoff
        #swapoff -a
        #echo 50M > memory.limit_in_bytes
        run 51M of malloc

        Case B) when you use mem+swap limitation.
        #echo 50M > memory.limit_in_bytes
        #echo 50M > memory.memsw.limit_in_bytes
        run 51M of malloc

 9.9 Move charges at task migration
        Charges associated with a task can be moved along with task migration.

        (Shell-A)
        #mkdir /cgroup/A
        #echo $$ >/cgroup/A/tasks
        run some programs which use some amount of memory in /cgroup/A.

        (Shell-B)
        #mkdir /cgroup/B
        #echo 1 >/cgroup/B/memory.move_charge_at_immigrate
        #echo "pid of the program running in group A" >/cgroup/B/tasks

        You can see that charges have been moved by reading *.usage_in_bytes or
        memory.stat of both A and B.
        See 8.2 of Documentation/cgroups/memory.txt to see what value should be
        written to move_charge_at_immigrate.
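
        The before/after comparison can be wrapped in a helper such as the
        sketch below. move_and_report() is a made-up name; it only reads
        memory.usage_in_bytes of both groups around the write to tasks.

```shell
#!/bin/sh
# move_and_report SRC_DIR DST_DIR PID: print usage of both groups, move PID
# into DST_DIR, then print usage again. Hypothetical helper.
move_and_report() {
    a=$1; b=$2; pid=$3
    echo "before: A=$(cat "$a/memory.usage_in_bytes") B=$(cat "$b/memory.usage_in_bytes")"
    echo "$pid" > "$b/tasks" || return 1
    echo "after:  A=$(cat "$a/memory.usage_in_bytes") B=$(cat "$b/memory.usage_in_bytes")"
}

# Example: move_and_report /cgroup/A /cgroup/B <pid of the program>
```

        With move_charge_at_immigrate set on B, the "after" line should show
        usage shifting from A to B.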

 9.10 Memory thresholds
        The memory controller implements memory thresholds using the cgroups
        notification API. You can use
        Documentation/cgroups/cgroup_event_listener.c to test it.

        (Shell-A) Create cgroup and run event listener
        # mkdir /cgroup/A
        # ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M

        (Shell-B) Add task to cgroup and try to allocate and free memory
        # echo $$ >/cgroup/A/tasks
        # a="$(dd if=/dev/zero bs=1M count=10)"
        # a=

        You will see a message from cgroup_event_listener every time you cross
        the thresholds.

        Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.

        It's a good idea to test the root cgroup as well.