thp, memcg: split hugepage for memcg oom on cow

On COW, a new hugepage is allocated and charged to the memcg. If the system is oom or the charge to the memcg fails, however, the fault handler will return VM_FAULT_OOM, which results in an oom kill.

Instead, it's possible to fall back to splitting the hugepage so that the COW results only in an order-0 page being allocated and charged to the memcg, which has a higher likelihood of succeeding. This is expensive because the hugepage must be split in the page fault handler, but it is much better than unnecessarily oom killing a process.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit: 1f1d06c34f7675026326cd9f39ff91e4555cf355
author: David Rientjes <rientjes@google.com> Tue May 29 15:06:23 2012 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> Tue May 29 16:22:19 2012 -0700
tree: b2493685179e3b222c915002648c3baba56318d2
parent: bde8bd8a1d5242589ddcaef8e017b48b207c4729
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d7d7165..edfeb8c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c

@@ -952,6 +952,8 @@
 		count_vm_event(THP_FAULT_FALLBACK);
 		ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
 						   pmd, orig_pmd, page, haddr);
+		if (ret & VM_FAULT_OOM)
+			split_huge_page(page);
 		put_page(page);
 		goto out;
 	}
@@ -959,6 +961,7 @@
 
 	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
 		put_page(new_page);
+		split_huge_page(page);
 		put_page(page);
 		ret |= VM_FAULT_OOM;
 		goto out;
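
The fallback described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct memcg_model`, `try_charge()`, `cow_fault()` and the `COW_*` results are hypothetical names invented for the illustration, the 512-page hugepage size assumes 2 MiB hugepages with 4 KiB base pages, and the retry with a single base page after the split is modeled inline here rather than taken from the hunks above.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_NR 512 /* base pages per hugepage (assumed: 2 MiB / 4 KiB) */

/* Hypothetical stand-in for a memcg page counter. */
struct memcg_model {
	long limit; /* maximum number of pages that may be charged */
	long usage; /* pages currently charged */
};

/* Hypothetical outcomes of a modeled COW fault. */
enum cow_result { COW_HUGE, COW_SPLIT_SMALL, COW_OOM };

/* Charge nr pages; on failure the counter is left unchanged. */
static bool try_charge(struct memcg_model *cg, long nr)
{
	assert(nr > 0);
	if (cg->usage + nr > cg->limit)
		return false;
	cg->usage += nr;
	return true;
}

/*
 * Model of a COW fault on a hugepage mapping: try to charge a whole
 * new hugepage first. If that fails, split the mapping and charge a
 * single order-0 page instead. Only if that also fails is the fault
 * reported as oom.
 */
static enum cow_result cow_fault(struct memcg_model *cg, bool *mapping_is_huge)
{
	if (*mapping_is_huge && try_charge(cg, HPAGE_NR))
		return COW_HUGE;

	*mapping_is_huge = false; /* models the split of the hugepage */
	if (try_charge(cg, 1))
		return COW_SPLIT_SMALL;

	return COW_OOM;
}
```

In this model, a group with a limit of 600 pages and 200 already charged cannot take another 512-page charge, but it can still satisfy the fault with one base page once the mapping is split, which is the case where the unpatched path would have ended in an oom kill.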