Memory Resource Controller (Memcg) Implementation Memo.
Last Updated: 2010/2
Base Kernel Version: based on 2.6.33-rc7-mm (candidate for 34).

Because the VM is getting complex (one of the reasons is memcg itself),
memcg's behavior is complex too. This document describes memcg's internal
behavior. Please note that implementation details can change.

(*) Topics on the API should be in Documentation/cgroups/memory.txt.

0. How to record usage?
   2 objects are used.

   page_cgroup ... an object per page.
	Allocated at boot or memory hotplug. Freed at memory hot removal.

   swap_cgroup ... an entry per swp_entry.
	Allocated at swapon(). Freed at swapoff().

   The page_cgroup has a USED bit, so a page_cgroup is never charged
   twice. swap_cgroup is used only when a charged page is swapped out.

1. Charge

   A page/swp_entry may be charged (usage += PAGE_SIZE) at

	mem_cgroup_newpage_charge()
	  Called at new page fault and Copy-On-Write.

	mem_cgroup_try_charge_swapin()
	  Called at do_swap_page() (page fault on swap entry) and swapoff.
	  Followed by the charge-commit-cancel protocol. (With swap accounting)
	  At commit, a charge recorded in swap_cgroup is removed.

	mem_cgroup_cache_charge()
	  Called at add_to_page_cache().

	mem_cgroup_cache_charge_swapin()
	  Called at shmem's swapin.

	mem_cgroup_prepare_migration()
	  Called before migration. An "extra" charge is done, followed by the
	  charge-commit-cancel protocol.
	  At commit, the charge against oldpage or newpage will be committed.

2. Uncharge

   A page/swp_entry may be uncharged (usage -= PAGE_SIZE) by

	mem_cgroup_uncharge_page()
	  Called when an anonymous page is fully unmapped, i.e., mapcount goes
	  to 0. If the page is SwapCache, uncharge is delayed until
	  mem_cgroup_uncharge_swapcache().

	mem_cgroup_uncharge_cache_page()
	  Called when a page-cache page is deleted from the radix-tree. If the
	  page is SwapCache, uncharge is delayed until
	  mem_cgroup_uncharge_swapcache().

	mem_cgroup_uncharge_swapcache()
	  Called when SwapCache is removed from the radix-tree. The charge itself
	  is moved to swap_cgroup. (If the mem+swap controller is disabled, no
	  charge to swap occurs.)

	mem_cgroup_uncharge_swap()
	  Called when swp_entry's refcnt goes down to 0. A charge against swap
	  disappears.

	mem_cgroup_end_migration(old, new)
	  On successful migration, old is uncharged (if necessary) and a charge
	  to the new page is committed. On failure, the charge to the old page
	  is committed.

3. charge-commit-cancel
	In some cases, we can't know at charging time whether this "charge"
	is valid or not (because of races).
	To handle such cases, there are charge-commit-cancel functions.
		mem_cgroup_try_charge_XXX
		mem_cgroup_commit_charge_XXX
		mem_cgroup_cancel_charge_XXX
	These are used in swap-in and migration.

	At try_charge(), there are no flags to say "this page is charged".
	At this point, usage += PAGE_SIZE.

	At commit(), the function checks whether the page should be charged.
	It either sets the flags, or avoids charging (usage -= PAGE_SIZE).

	At cancel(), simply usage -= PAGE_SIZE.
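
	As a rough illustration, the protocol can be modelled in shell with a
	usage counter (in pages) and a flag standing in for the USED bit. This
	is only a toy sketch, not kernel code: the function and variable names
	are invented here and do not correspond to kernel symbols.

```shell
#!/bin/sh
# Toy model of charge-commit-cancel. All names are illustrative.
# "usage" counts charged pages, "used" models the page_cgroup USED bit.
usage=0
used=0

# try: account the page first; no flag says "this page is charged" yet.
try_charge() { usage=$((usage + 1)); }

# cancel: give the speculative charge back.
cancel_charge() { usage=$((usage - 1)); }

# commit: decide whether the speculative charge is really needed.
commit_charge() {
	if [ "$used" -eq 1 ]; then
		usage=$((usage - 1))	# already charged: avoid double counting
	else
		used=1			# first charge: record it on the page
	fi
}

try_charge; commit_charge	# normal path: the page is charged once
try_charge; commit_charge	# racing second charge is dropped at commit
try_charge; cancel_charge	# aborted operation leaves usage unchanged
echo "usage=$usage used=$used"
```

	Whatever the interleaving, the page ends up charged at most once, which
	is the property the USED bit is meant to guarantee.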

In the explanation below, we assume CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y.

4. Anonymous
	An anonymous page is newly allocated at
		  - page fault into MAP_ANONYMOUS mapping.
		  - Copy-On-Write.
	It is charged right after it's allocated, before doing any page table
	related operations. Of course, it's uncharged when another page is used
	for the fault address.

	When an anonymous page is freed (by exit() or munmap()), zap_pte() is
	called and pages for ptes are freed one by one (see mm/memory.c).
	Uncharges are done at page_remove_rmap() when page_mapcount() goes down
	to 0.

	Another way a page is freed is page reclaim (vmscan.c), where anonymous
	pages are swapped out. In this case, the page is marked as
	PageSwapCache(). The uncharge() routine doesn't uncharge a page marked
	as SwapCache. It's delayed until __delete_from_swap_cache().

	4.1 Swap-in.
	At swap-in, the page is taken from swap-cache. There are 2 cases.

	(a) If the SwapCache is newly allocated and read, it has no charges.
	(b) If the SwapCache has been mapped by processes, it has been
	    charged already.

	Swap-in is one of the most complicated operations. In do_swap_page(),
	the following events occur when the pte is unchanged.

	(1) the page (SwapCache) is looked up.
	(2) lock_page()
	(3) try_charge_swapin()
	(4) reuse_swap_page() (may call delete_from_swap_cache())
	(5) commit_charge_swapin()
	(6) swap_free().

	Consider the following situations, for example.

	(A) The page has not been charged before (2) and reuse_swap_page()
	    doesn't call delete_from_swap_cache().
	(B) The page has not been charged before (2) and reuse_swap_page()
	    calls delete_from_swap_cache().
	(C) The page has been charged before (2) and reuse_swap_page() doesn't
	    call delete_from_swap_cache().
	(D) The page has been charged before (2) and reuse_swap_page() calls
	    delete_from_swap_cache().

	The memory.usage/memsw.usage changes to this page/swp_entry will be:

	Event           (A)      (B)      (C)      (D)
	================================================
	Before (2)     0/ 1     0/ 1     1/ 1     1/ 1
	(3)           +1/+1    +1/+1    +1/+1    +1/+1
	(4)             -       0/ 0      -      -1/ 0
	(5)            0/-1     0/ 0    -1/-1     0/ 0
	(6)             -       0/-1      -       0/-1
	================================================
	Result         1/ 1     1/ 1     1/ 1     1/ 1

	In every case, charges to this page should be 1/ 1.
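
	The table can be checked mechanically by summing each column. The
	helper below is only an arithmetic sketch with an invented name
	(final_charge); it takes the "Before (2)" value followed by the deltas
	of events (3) to (6), with "-" meaning the event does not occur.

```shell
#!/bin/sh
# Sum the memory/memsw deltas of one column of the swap-in table.
# $1 is the initial "mem/memsw" pair, the rest are per-event deltas.
final_charge() {
	mem=${1%/*}; memsw=${1#*/}; shift
	for delta in "$@"; do
		if [ "$delta" = "-" ]; then
			continue		# event does not happen in this case
		fi
		mem=$((mem + ${delta%/*}))
		memsw=$((memsw + ${delta#*/}))
	done
	echo "$mem/$memsw"
}

#            before (3) (4)  (5)   (6)
final_charge 0/1    1/1 -    0/-1  -	# case (A)
final_charge 0/1    1/1 0/0  0/0   0/-1	# case (B)
final_charge 1/1    1/1 -    -1/-1 -	# case (C)
final_charge 1/1    1/1 -1/0 0/0   0/-1	# case (D)
```

	Each call prints 1/1, matching the Result row.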

	4.2 Swap-out.
	At swap-out, the typical state transition is as follows.

	(a) add to swap cache. (marked as SwapCache)
	    swp_entry's refcnt += 1.
	(b) fully unmapped.
	    swp_entry's refcnt += # of ptes.
	(c) write back to swap.
	(d) delete from swap cache. (remove from SwapCache)
	    swp_entry's refcnt -= 1.

	At (b), the page is marked as SwapCache and not uncharged.
	At (d), the page is removed from SwapCache and a charge in page_cgroup
	is moved to swap_cgroup.

	Finally, at task exit,
	(e) zap_pte() is called and swp_entry's refcnt -= 1 -> 0.
	Here, a charge in swap_cgroup disappears.
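
	The refcnt arithmetic of (a) to (e) can be traced with a small sketch.
	It assumes, for illustration only, that the refcnt starts at 0 and that
	zap_pte() drops one reference per pte; swapout_refcnt is an invented
	name.

```shell
#!/bin/sh
# Trace swp_entry's refcnt through steps (a)-(e) for a page that was
# mapped by $1 ptes. Assumption: the refcnt is 0 before step (a).
swapout_refcnt() {
	nptes=$1
	a=1			# (a) add to swap cache: refcnt += 1
	b=$((a + nptes))	# (b) fully unmapped: refcnt += # of ptes
	c=$b			# (c) write back: refcnt unchanged
	d=$((c - 1))		# (d) delete from swap cache: refcnt -= 1
	e=$((d - nptes))	# (e) all swap ptes zapped: refcnt reaches 0
	echo "a=$a b=$b c=$c d=$d e=$e"
}

swapout_refcnt 1
swapout_refcnt 3
```

	Between (d) and (e) the only remaining references are the swap ptes,
	which is the period in which the charge lives in swap_cgroup.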

5. Page Cache
	Page Cache is charged at
	- add_to_page_cache_locked().

	It is uncharged at
	- __remove_from_page_cache().

	The logic is straightforward. (About migration, see below.)
	Note: __remove_from_page_cache() is called by remove_from_page_cache()
	and __remove_mapping().

6. Shmem(tmpfs) Page Cache
	Memcg's charge/uncharge have special handlers for shmem. The best way
	to understand shmem's page state transitions is to read mm/shmem.c.
	But a brief explanation of memcg's behavior around shmem will help
	in understanding the logic.

	Shmem's page (just leaf page, not direct/indirect block) can be on
		- the radix-tree of shmem's inode.
		- SwapCache.
		- both the radix-tree and SwapCache. This happens at swap-in
		  and swap-out.

	It's charged when
	- A new page is added to shmem's radix-tree.
	- A swp page is read. (move a charge from swap_cgroup to page_cgroup)
	It's uncharged when
	- A page is removed from the radix-tree and is not SwapCache.
	- When SwapCache is removed, a charge is moved to swap_cgroup.
	- When swp_entry's refcnt goes down to 0, a charge in swap_cgroup
	  disappears.

7. Page Migration
	One of the most complicated functions is the page migration handler.
	Memcg has 2 routines. Assume that we are migrating a page's contents
	from OLDPAGE to NEWPAGE.

	The usual migration logic is:
	(a) remove the page from LRU.
	(b) allocate NEWPAGE (migration target)
	(c) lock by lock_page().
	(d) unmap all mappings.
	(e-1) If necessary, replace entry in radix-tree.
	(e-2) move contents of a page.
	(f) map all mappings again.
	(g) push the page back to LRU.
	(-) OLDPAGE will be freed.

	Before (g), memcg should complete all necessary charge/uncharge to
	NEWPAGE/OLDPAGE.

	The points are:
	- If OLDPAGE is anonymous, all charges will be dropped at (d) because
	  try_to_unmap() drops all mapcount and the page will not be
	  SwapCache.

	- If OLDPAGE is SwapCache, charges will be kept at (g) because
	  __delete_from_swap_cache() isn't called at (e-1).

	- If OLDPAGE is page-cache, charges will be kept at (g) because
	  remove_from_page_cache() isn't called at (e-1).

	memcg provides the following hooks.

	- mem_cgroup_prepare_migration(OLDPAGE)
	  Called after (b) to account a charge (usage += PAGE_SIZE) against
	  the memcg which OLDPAGE belongs to.

	- mem_cgroup_end_migration(OLDPAGE, NEWPAGE)
	  Called after (f) and before (g).
	  If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already
	  charged, the charge by prepare_migration() is automatically canceled.
	  If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE.

	  Because zap_pte() (by exit or munmap) can be called during migration,
	  we have to check whether OLDPAGE/NEWPAGE is a valid page after
	  commit().
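
	The accounting around the two hooks can be traced with a toy model.
	This is a simplified reading of the description above, not the kernel
	logic: it ignores the zap_pte() race, counts usage in pages, and uses
	invented names (migrate, old_charged).

```shell
#!/bin/sh
# Toy trace of usage during migration of one charged page.
# $1 = anon | cache (kind of OLDPAGE), $2 = success | failure.
migrate() {
	kind=$1; result=$2
	usage=1			# OLDPAGE starts out charged once
	old_charged=1
	usage=$((usage + 1))	# prepare_migration(): extra charge after (b)
	if [ "$kind" = anon ]; then
		usage=$((usage - 1))	# (d): unmap drops the anon charge
		old_charged=0
	fi
	# end_migration(): the extra charge is committed to the page that
	# survives. If OLDPAGE still holds its own charge, one charge is
	# redundant: OLDPAGE is uncharged on success, or the extra charge
	# is canceled on failure.
	if [ "$old_charged" -eq 1 ]; then
		usage=$((usage - 1))
	fi
	echo "$kind $result: usage=$usage"
}

for k in anon cache; do
	for r in success failure; do
		migrate "$k" "$r"
	done
done
```

	In all four combinations the memcg ends up with exactly one page
	charged, which is the invariant the hooks are meant to preserve.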

8. LRU
	Each memcg has its own private LRU. Currently, its handling is under
	the global VM's control (meaning that it's handled under the global
	zone->lru_lock).
	Almost all routines around memcg's LRU are called by the global LRU's
	list management functions under zone->lru_lock.

	A special function is mem_cgroup_isolate_pages(). This scans
	memcg's private LRU and calls __isolate_lru_page() to extract a page
	from the LRU.
	(By __isolate_lru_page(), the page is removed from both the global and
	 the private LRU.)

9. Typical Tests.

 Tests for racy cases.

 9.1 Small limit to memcg.
	When you test racy cases, it's a good idea to set memcg's limit
	to be very small rather than GBs. Many races have been found in tests
	under xKB or xxMB limits.
	(Memory behavior under a GB limit and under a MB limit is very
	 different.)

 9.2 Shmem
	Historically, memcg's shmem handling was poor and we saw a fair amount
	of trouble here. This is because shmem is page-cache but can be
	SwapCache. Testing with shmem/tmpfs is always a good test.

 9.3 Migration
	For NUMA, migration is another special case. cpuset is useful for
	easy testing. The following is a sample script to set up migration.

	mount -t cgroup -o cpuset none /opt/cpuset

	mkdir /opt/cpuset/01
	echo 1 > /opt/cpuset/01/cpuset.cpus
	echo 0 > /opt/cpuset/01/cpuset.mems
	echo 1 > /opt/cpuset/01/cpuset.memory_migrate
	mkdir /opt/cpuset/02
	echo 1 > /opt/cpuset/02/cpuset.cpus
	echo 1 > /opt/cpuset/02/cpuset.mems
	echo 1 > /opt/cpuset/02/cpuset.memory_migrate

	In the above setup, when you move a task from 01 to 02, page migration
	from node 0 to node 1 will occur. The following is a script to migrate
	all tasks under a cpuset.
	--
	# Assumed here: G1 and G2 are the two cpusets created above.
	G1=/opt/cpuset/01
	G2=/opt/cpuset/02

	# Move every pid listed in $1 into the cgroup directory $2.
	move_task()
	{
	for pid in $1
	do
		/bin/echo $pid >$2/tasks 2>/dev/null
		echo -n $pid
		echo -n " "
	done
	echo END
	}

	G1_TASK=`cat ${G1}/tasks`
	G2_TASK=`cat ${G2}/tasks`
	move_task "${G1_TASK}" ${G2} &
	--
 9.4 Memory hotplug.
	A memory hotplug test is also a good test.
	To offline memory, do the following.
	# echo offline > /sys/devices/system/memory/memoryXXX/state
	(XXX identifies the memory block)
	This is an easy way to test page migration, too.
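
	To repeat the test, the offline/online cycle can be wrapped in a small
	helper. This is a minimal sketch; cycle_memory_block is an invented
	name, and it assumes that writing "online" to the same state file
	brings the block back.

```shell
#!/bin/sh
# Offline one memory block and bring it back online.
# $1 = block directory, e.g. /sys/devices/system/memory/memoryXXX
# Offlining may fail (for example when pages cannot be migrated);
# that is reported and not treated as fatal.
cycle_memory_block() {
	state=$1/state
	if echo offline 2>/dev/null > "$state"; then
		echo "offlined $1"
		echo online > "$state"
	else
		echo "could not offline $1"
	fi
}
```

	Run it as root with a block directory as the argument while memcg jobs
	are running, so that pages charged to a memcg get migrated.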

 9.5 mkdir/rmdir
	When using hierarchy, a mkdir/rmdir test should be done.
	Use tests like the following.

	echo 1 >/opt/cgroup/01/memory.use_hierarchy
	mkdir /opt/cgroup/01/child_a
	mkdir /opt/cgroup/01/child_b

	set a limit on 01.
	add a limit on 01/child_b.
	run jobs under child_a and child_b.

	create/delete the following groups at random while jobs are running.
	/opt/cgroup/01/child_a/child_aa
	/opt/cgroup/01/child_b/child_bb
	/opt/cgroup/01/child_c

	Running new jobs in a new group is also good.
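
	One possible way to do the random create/delete is sketched below.
	churn_groups is an invented helper, and reading one byte from
	/dev/urandom as a coin flip is just one choice of randomness.

```shell
#!/bin/sh
# Randomly create or remove the given group directories.
# $1 = number of rounds, remaining arguments = group directories.
# mkdir fails harmlessly if the group exists; rmdir fails harmlessly
# if the group is missing or still has tasks or children.
churn_groups() {
	rounds=$1; shift
	i=0
	while [ "$i" -lt "$rounds" ]; do
		for dir in "$@"; do
			coin=$(( $(od -An -N1 -tu1 /dev/urandom) % 2 ))
			if [ "$coin" -eq 0 ]; then
				mkdir "$dir" 2>/dev/null || true
			else
				rmdir "$dir" 2>/dev/null || true
			fi
		done
		i=$((i + 1))
	done
}
```

	For example, "churn_groups 1000 /opt/cgroup/01/child_a/child_aa
	/opt/cgroup/01/child_b/child_bb /opt/cgroup/01/child_c" can be left
	running in a separate shell while the jobs run.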

 9.6 Mount with other subsystems.
	Mounting with other subsystems is a good test because there are
	races and lock dependencies with other cgroup subsystems.

	example)
	# mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices

	and do task moves, mkdir, rmdir, etc. under this.

 9.7 swapoff.
	Management of swap is one of the complicated parts of memcg, and the
	call path of swap-in at swapoff is not the same as the usual swap-in
	path. It's worth testing explicitly.

	For example, a test like the following is good.
	(Shell-A)
	# mount -t cgroup none /cgroup -o memory
	# mkdir /cgroup/test
	# echo 40M > /cgroup/test/memory.limit_in_bytes
	# echo 0 > /cgroup/test/tasks
	Run a malloc(100M) program under this. You'll see 60M of swap.
	(Shell-B)
	# move all tasks in /cgroup/test to /cgroup
	# /sbin/swapoff -a
	# rmdir /cgroup/test
	# kill malloc task.

	Of course, tmpfs vs. swapoff should be tested, too.

 9.8 OOM-Killer
	Out-of-memory caused by memcg's limit will kill tasks under
	the memcg. When hierarchy is used, a task under the hierarchy
	will be killed by the kernel.
	In this case, panic_on_oom shouldn't be invoked and tasks
	in other groups shouldn't be killed.

	It's not difficult to cause OOM under memcg, as follows.
	Case A) when you can swapoff
	# swapoff -a
	# echo 50M > /memory.limit_in_bytes
	run 51M of malloc

	Case B) when you use mem+swap limitation.
	# echo 50M > memory.limit_in_bytes
	# echo 50M > memory.memsw.limit_in_bytes
	run 51M of malloc

 9.9 Move charges at task migration
	Charges associated with a task can be moved along with task migration.

	(Shell-A)
	# mkdir /cgroup/A
	# echo $$ >/cgroup/A/tasks
	run some programs which use some amount of memory in /cgroup/A.

	(Shell-B)
	# mkdir /cgroup/B
	# echo 1 >/cgroup/B/memory.move_charge_at_immigrate
	# echo "pid of the program running in group A" >/cgroup/B/tasks

	You can see that charges have been moved by reading *.usage_in_bytes or
	memory.stat of both A and B.
	See 8.2 of Documentation/cgroups/memory.txt to see what value should be
	written to move_charge_at_immigrate.

 9.10 Memory thresholds
	The memory controller implements memory thresholds using the cgroups
	notification API. You can use
	Documentation/cgroups/cgroup_event_listener.c to test it.

	(Shell-A) Create cgroup and run event listener
	# mkdir /cgroup/A
	# ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M

	(Shell-B) Add task to cgroup and try to allocate and free memory
	# echo $$ >/cgroup/A/tasks
	# a="$(dd if=/dev/zero bs=1M count=10)"
	# a=

	You will see a message from cgroup_event_listener every time you cross
	the thresholds.

	Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.

	It's a good idea to test the root cgroup as well.