)]}'
{
  "log": [
    {
      "commit": "1f1d06c34f7675026326cd9f39ff91e4555cf355",
      "tree": "b2493685179e3b222c915002648c3baba56318d2",
      "parents": [
        "bde8bd8a1d5242589ddcaef8e017b48b207c4729"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Tue May 29 15:06:23 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue May 29 16:22:19 2012 -0700"
      },
      "message": "thp, memcg: split hugepage for memcg oom on cow\n\nOn COW, a new hugepage is allocated and charged to the memcg.  If the\nsystem is oom or the charge to the memcg fails, however, the fault\nhandler will return VM_FAULT_OOM which results in an oom kill.\n\nInstead, it\u0027s possible to fallback to splitting the hugepage so that the\nCOW results only in an order-0 page being allocated and charged to the\nmemcg which has a higher liklihood to succeed.  This is expensive\nbecause the hugepage must be split in the page fault handler, but it is\nmuch better than unnecessarily oom killing a process.\n\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e709ffd6169ccd259eb5874e853303e91e94e829",
      "tree": "796b56c2507b8581492da73e354d651c9dd7076b",
      "parents": [
        "edad9d2c337d43278a9d5aeb0ed531c2e838f8a6"
      ],
      "author": {
        "name": "Rik van Riel",
        "email": "riel@redhat.com",
        "time": "Tue May 29 15:06:18 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue May 29 16:22:19 2012 -0700"
      },
      "message": "mm: remove swap token code\n\nThe swap token code no longer fits in with the current VM model.  It\ndoes not play well with cgroups or the better NUMA placement code in\ndevelopment, since we have only one swap token globally.\n\nIt also has the potential to mess with scalability of the system, by\nincreasing the number of non-reclaimable pages on the active and\ninactive anon LRU lists.\n\nLast but not least, the swap token code has been broken for a year\nwithout complaints, as reported by Konstantin Khlebnikov.  This suggests\nwe no longer have much use for it.\n\nThe days of sub-1G memory systems with heavy use of swap are over.  If\nwe ever need thrashing reducing code in the future, we will have to\nimplement something that does scale.\n\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Konstantin Khlebnikov \u003ckhlebnikov@openvz.org\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nAcked-by: Bob Picco \u003cbpicco@meloft.net\u003e\nAcked-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "654443e20dfc0617231f28a07c96a979ee1a0239",
      "tree": "a0dc3f093eb13892539082e663607c34b4fc2d07",
      "parents": [
        "2c01e7bc46f10e9190818437e564f7e0db875ae9",
        "9cba26e66d09bf394ae5a739627a1dc8b7cae6f4"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 24 11:39:34 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 24 11:39:34 2012 -0700"
      },
      "message": "Merge branch \u0027perf-uprobes-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip\n\nPull user-space probe instrumentation from Ingo Molnar:\n \"The uprobes code originates from SystemTap and has been used for years\n  in Fedora and RHEL kernels.  This version is much rewritten, reviews\n  from PeterZ, Oleg and myself shaped the end result.\n\n  This tree includes uprobes support in \u0027perf probe\u0027 - but SystemTap\n  (and other tools) can take advantage of user probe points as well.\n\n  Sample usage of uprobes via perf, for example to profile malloc()\n  calls without modifying user-space binaries.\n\n  First boot a new kernel with CONFIG_UPROBE_EVENT\u003dy enabled.\n\n  If you don\u0027t know which function you want to probe you can pick one\n  from \u0027perf top\u0027 or can get a list all functions that can be probed\n  within libc (binaries can be specified as well):\n\n\t$ perf probe -F -x /lib/libc.so.6\n\n  To probe libc\u0027s malloc():\n\n\t$ perf probe -x /lib64/libc.so.6 malloc\n\tAdded new event:\n\tprobe_libc:malloc    (on 0x7eac0)\n\n  You can now use it in all perf tools, such as:\n\n\tperf record -e probe_libc:malloc -aR sleep 1\n\n  Make use of it to create a call graph (as the flat profile is going to\n  look very boring):\n\n\t$ perf record -e probe_libc:malloc -gR make\n\t[ perf record: Woken up 173 times to write data ]\n\t[ perf record: Captured and wrote 44.190 MB perf.data (~1930712\n\n\t$ perf report | less\n\n\t  32.03%            git  libc-2.15.so   [.] malloc\n\t                    |\n\t                    --- malloc\n\n\t  29.49%            cc1  libc-2.15.so   [.] malloc\n\t                    |\n\t                    --- malloc\n\t                       |\n\t                       |--0.95%-- 0x208eb1000000000\n\t                       |\n\t                       |--0.63%-- htab_traverse_noresize\n\n\t  11.04%             as  libc-2.15.so   [.] 
malloc\n\t                     |\n\t                     --- malloc\n\t                        |\n\n\t   7.15%             ld  libc-2.15.so   [.] malloc\n\t                     |\n\t                     --- malloc\n\t                        |\n\n\t   5.07%             sh  libc-2.15.so   [.] malloc\n\t                     |\n\t                     --- malloc\n\t                        |\n\t   4.99%  python-config  libc-2.15.so   [.] malloc\n\t          |\n\t          --- malloc\n\t             |\n\t   4.54%           make  libc-2.15.so   [.] malloc\n\t                   |\n\t                   --- malloc\n\t                      |\n\t                      |--7.34%-- glob\n\t                      |          |\n\t                      |          |--93.18%-- 0x41588f\n\t                      |          |\n\t                      |           --6.82%-- glob\n\t                      |                     0x41588f\n\n\t   ...\n\n  Or:\n\n\t$ perf report -g flat | less\n\n\t# Overhead        Command  Shared Object      Symbol\n\t# ........  .............  .............  ..........\n\t#\n\t  32.03%            git  libc-2.15.so   [.] malloc\n\t          27.19%\n\t              malloc\n\n\t  29.49%            cc1  libc-2.15.so   [.] malloc\n\t          24.77%\n\t              malloc\n\n\t  11.04%             as  libc-2.15.so   [.] malloc\n\t          11.02%\n\t              malloc\n\n\t   7.15%             ld  libc-2.15.so   [.] malloc\n\t           6.57%\n\t              malloc\n\n\t ...\n\n  The core uprobes design is fairly straightforward: uprobes probe\n  points register themselves at (inode:offset) addresses of\n  libraries/binaries, after which all existing (or new) vmas that map\n  that address will have a software breakpoint injected at that address.\n  vmas are COW-ed to preserve original content.  
The probe points are\n  kept in an rbtree.\n\n  If user-space executes the probed inode:offset instruction address\n  then an event is generated which can be recovered from the regular\n  perf event channels and mmap-ed ring-buffer.\n\n  Multiple probes at the same address are supported, they create a\n  dynamic callback list of event consumers.\n\n  The basic model is further complicated by the XOL speedup: the\n  original instruction that is probed is copied (in an architecture\n  specific fashion) and executed out of line when the probe triggers.\n  The XOL area is a single vma per process, with a fixed number of\n  entries (which limits probe execution parallelism).\n\n  The API: uprobes are installed/removed via\n  /sys/kernel/debug/tracing/uprobe_events, the API is integrated to\n  align with the kprobes interface as much as possible, but is separate\n  to it.\n\n  Injecting a probe point is privileged operation, which can be relaxed\n  by setting perf_paranoid to -1.\n\n  You can use multiple probes as well and mix them with kprobes and\n  regular PMU events or tracepoints, when instrumenting a task.\"\n\nFix up trivial conflicts in mm/memory.c due to previous cleanup of\nunmap_single_vma().\n\n* \u0027perf-uprobes-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)\n  perf probe: Detect probe target when m/x options are absent\n  perf probe: Provide perf interface for uprobes\n  tracing: Fix kconfig warning due to a typo\n  tracing: Provide trace events interface for uprobes\n  tracing: Extract out common code for kprobes/uprobes trace events\n  tracing: Modify is_delete, is_return from int to bool\n  uprobes/core: Decrement uprobe count before the pages are unmapped\n  uprobes/core: Make background page replacement logic account for rss_stat counters\n  uprobes/core: Optimize probe hits with the help of a counter\n  uprobes/core: Allocate XOL slots for uprobes use\n  uprobes/core: Handle breakpoint and singlestep 
exceptions\n  uprobes/core: Rename bkpt to swbp\n  uprobes/core: Make order of function parameters consistent across functions\n  uprobes/core: Make macro names consistent\n  uprobes: Update copyright notices\n  uprobes/core: Move insn to arch specific structure\n  uprobes/core: Remove uprobe_opcode_sz\n  uprobes/core: Make instruction tables volatile\n  uprobes: Move to kernel/events/\n  uprobes/core: Clean up, refactor and improve the code\n  ...\n"
    },
    {
      "commit": "4f74d2c8e827af12596f153a564c868bf6dbe3dd",
      "tree": "6ef2bafd6c23a4c4a9ef716ea530daea824a7721",
      "parents": [
        "7e027b14d53e9729f823ba8652095d1e309aa8e9"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun May 06 13:54:06 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun May 06 14:05:17 2012 -0700"
      },
      "message": "vm: remove \u0027nr_accounted\u0027 calculations from the unmap_vmas() interfaces\n\nThe VM accounting makes no sense at this level, and half of the callers\ndidn\u0027t ever actually use the end result.  The only time we want to\nunaccount the memory is when we actually remove the vma, so do the\naccounting at that point instead.\n\nThis simplifies the interfaces (no need to pass down that silly page\ncounter to functions that really don\u0027t care), and also makes it much\nmore obvious what is actually going on: we do vm_[un]acct_memory() when\nadding or removing the vma, not on random page walking.\n\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "7e027b14d53e9729f823ba8652095d1e309aa8e9",
      "tree": "a706e9f6ac67d92e4df18662cdb0205844a17871",
      "parents": [
        "18b15fcde715a5512671af9d72a76e7f6d7cb6f0"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun May 06 13:43:15 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun May 06 13:52:07 2012 -0700"
      },
      "message": "vm: simplify unmap_vmas() calling convention\n\nNone of the callers want to pass in \u0027zap_details\u0027, and it doesn\u0027t even\nmake sense for the case of actually unmapping vma\u0027s.  So remove the\nargument, and clean up the interface.\n\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "cbc91f71b51b8335f1fc7ccfca8011f31a717367",
      "tree": "31bc32a4ee512c9056c93e8c46d58bc217d31bc2",
      "parents": [
        "7396fa818d6278694a44840f389ddc40a3269a9a"
      ],
      "author": {
        "name": "Srikar Dronamraju",
        "email": "srikar@linux.vnet.ibm.com",
        "time": "Wed Apr 11 16:05:27 2012 +0530"
      },
      "committer": {
        "name": "Ingo Molnar",
        "email": "mingo@kernel.org",
        "time": "Sat Apr 14 13:25:48 2012 +0200"
      },
      "message": "uprobes/core: Decrement uprobe count before the pages are unmapped\n\nUprobes has a callback (uprobe_munmap()) in the unmap path to\nmaintain the uprobes count.\n\nIn the exit path this callback gets called in unlink_file_vma().\nHowever by the time unlink_file_vma() is called, the pages would\nhave been unmapped (in unmap_vmas()) and the task-\u003erss_stat counts\naccounted (in zap_pte_range()).\n\nIf the exiting process has probepoints, uprobe_munmap() checks if\nthe breakpoint instruction was around before decrementing the probe\ncount.\n\nThis results in a file backed page being reread by uprobe_munmap()\nand hence it does not find the breakpoint.\n\nThis patch fixes this problem by moving the callback to\nunmap_single_vma(). Since unmap_single_vma() may not unmap the\ncomplete vma, add start and end parameters to uprobe_munmap().\n\nThis bug became apparent courtesy of commit c3f0327f8e9d\n(\"mm: add rss counters consistency check\").\n\nSigned-off-by: Srikar Dronamraju \u003csrikar@linux.vnet.ibm.com\u003e\nCc: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\nCc: Ananth N Mavinakayanahalli \u003cananth@in.ibm.com\u003e\nCc: Jim Keniston \u003cjkenisto@linux.vnet.ibm.com\u003e\nCc: Linux-mm \u003clinux-mm@kvack.org\u003e\nCc: Oleg Nesterov \u003coleg@redhat.com\u003e\nCc: Andi Kleen \u003candi@firstfloor.org\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nCc: Steven Rostedt \u003crostedt@goodmis.org\u003e\nCc: Arnaldo Carvalho de Melo \u003cacme@infradead.org\u003e\nCc: Masami Hiramatsu \u003cmasami.hiramatsu.pt@hitachi.com\u003e\nCc: Anton Arapov \u003canton@redhat.com\u003e\nCc: Peter Zijlstra \u003cpeterz@infradead.org\u003e\nLink: http://lkml.kernel.org/r/20120411103527.23245.9835.sendpatchset@srdronam.in.ibm.com\nSigned-off-by: Ingo Molnar \u003cmingo@kernel.org\u003e\n"
    },
    {
      "commit": "909af768e88867016f427264ae39d27a57b6a8ed",
      "tree": "5068b4d98e4bedecde89d9113dc7ef8c69633f45",
      "parents": [
        "1cc684ab75123efe7ff446eb821d44375ba8fa30"
      ],
      "author": {
        "name": "Jason Baron",
        "email": "jbaron@redhat.com",
        "time": "Fri Mar 23 15:02:51 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Mar 23 16:58:42 2012 -0700"
      },
      "message": "coredump: remove VM_ALWAYSDUMP flag\n\nThe motivation for this patchset was that I was looking at a way for a\nqemu-kvm process, to exclude the guest memory from its core dump, which\ncan be quite large.  There are already a number of filter flags in\n/proc/\u003cpid\u003e/coredump_filter, however, these allow one to specify \u0027types\u0027\nof kernel memory, not specific address ranges (which is needed in this\ncase).\n\nSince there are no more vma flags available, the first patch eliminates\nthe need for the \u0027VM_ALWAYSDUMP\u0027 flag.  The flag is used internally by\nthe kernel to mark vdso and vsyscall pages.  However, it is simple\nenough to check if a vma covers a vdso or vsyscall page without the need\nfor this flag.\n\nThe second patch then replaces the \u0027VM_ALWAYSDUMP\u0027 flag with a new\n\u0027VM_NODUMP\u0027 flag, which can be set by userspace using new madvise flags:\n\u0027MADV_DONTDUMP\u0027, and unset via \u0027MADV_DODUMP\u0027.  The core dump filters\ncontinue to work the same as before unless \u0027MADV_DONTDUMP\u0027 is set on the\nregion.\n\nThe qemu code which implements this features is at:\n\n  http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch\n\nIn my testing the qemu core dump shrunk from 383MB -\u003e 13MB with this\npatch.\n\nI also believe that the \u0027MADV_DONTDUMP\u0027 flag might be useful for\nsecurity sensitive apps, which might want to select which areas are\ndumped.\n\nThis patch:\n\nThe VM_ALWAYSDUMP flag is currently used by the coredump code to\nindicate that a vma is part of a vsyscall or vdso section.  However, we\ncan determine if a vma is in one these sections by checking it against\nthe gate_vma and checking for a non-NULL return value from\narch_vma_name().  Thus, freeing a valuable vma bit.\n\nSigned-off-by: Jason Baron \u003cjbaron@redhat.com\u003e\nAcked-by: Roland McGrath \u003croland@hack.frob.com\u003e\nCc: Chris Metcalf \u003ccmetcalf@tilera.com\u003e\nCc: Avi Kivity \u003cavi@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "95211279c5ad00a317c98221d7e4365e02f20836",
      "tree": "2ddc8625378d2915b8c96392f3cf6663b705ed55",
      "parents": [
        "5375871d432ae9fc581014ac117b96aaee3cd0c7",
        "12724850e8064f64b6223d26d78c0597c742c65a"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Mar 22 09:04:48 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Mar 22 09:04:48 2012 -0700"
      },
      "message": "Merge branch \u0027akpm\u0027 (Andrew\u0027s patch-bomb)\n\nMerge first batch of patches from Andrew Morton:\n \"A few misc things and all the MM queue\"\n\n* emailed from Andrew Morton \u003cakpm@linux-foundation.org\u003e: (92 commits)\n  memcg: avoid THP split in task migration\n  thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE\n  memcg: clean up existing move charge code\n  mm/memcontrol.c: remove unnecessary \u0027break\u0027 in mem_cgroup_read()\n  mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()\n  mm/memcontrol.c: s/stealed/stolen/\n  memcg: fix performance of mem_cgroup_begin_update_page_stat()\n  memcg: remove PCG_FILE_MAPPED\n  memcg: use new logic for page stat accounting\n  memcg: remove PCG_MOVE_LOCK flag from page_cgroup\n  memcg: simplify move_account() check\n  memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)\n  memcg: kill dead prev_priority stubs\n  memcg: remove PCG_CACHE page_cgroup flag\n  memcg: let css_get_next() rely upon rcu_read_lock()\n  cgroup: revert ss_id_lock to spinlock\n  idr: make idr_get_next() good for rcu_read_lock()\n  memcg: remove unnecessary thp check in page stat accounting\n  memcg: remove redundant returns\n  memcg: enum lru_list lru\n  ...\n"
    },
    {
      "commit": "ea48cf7863c789579b170ef28e7fc62728365d6e",
      "tree": "3602d07d69e1b1b6d8a26f2b221705bb2862ee3c",
      "parents": [
        "05af2e104a0c282dcd9303431e1360750ba76de6"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Wed Mar 21 16:34:13 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 21 17:54:59 2012 -0700"
      },
      "message": "mm, counters: fold __sync_task_rss_stat() into sync_mm_rss()\n\nThere\u0027s no difference between sync_mm_rss() and __sync_task_rss_stat(),\nso fold the latter into the former.\n\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "05af2e104a0c282dcd9303431e1360750ba76de6",
      "tree": "cdd5876f2d17b26cc3ded7ef85d04d0e853e9b7e",
      "parents": [
        "90481622d75715bfcb68501280a917dbfe516029"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Wed Mar 21 16:34:13 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 21 17:54:59 2012 -0700"
      },
      "message": "mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat()\n\nsync_mm_rss() can only be used for current to avoid race conditions in\niterating and clearing its per-task counters.  Remove the task argument\nfor it and its helper function, __sync_task_rss_stat(), to avoid thinking\nit can be used safely for anything other than current.\n\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "69c978232aaa99476f9bd002c2a29a84fa3779b5",
      "tree": "7edb0da034b8824040f4f7327dd31ad260532167",
      "parents": [
        "6131728914810a6c02e08750e13e45870101e862"
      ],
      "author": {
        "name": "Konstantin Khlebnikov",
        "email": "khlebnikov@openvz.org",
        "time": "Wed Mar 21 16:33:49 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 21 17:54:55 2012 -0700"
      },
      "message": "mm: make get_mm_counter static-inline\n\nMake get_mm_counter() always static inline, it is simple enough for that.\nAnd remove unused set_mm_counter()\n\nbloat-o-meter:\n\nadd/remove: 0/1 grow/shrink: 4/12 up/down: 99/-341 (-242)\nfunction                                     old     new   delta\ntry_to_unmap_one                             886     952     +66\nsys_remap_file_pages                        1214    1230     +16\ndup_mm                                      1684    1700     +16\ndo_exit                                     2277    2278      +1\nzap_page_range                               208     205      -3\nunmap_region                                 304     296      -8\nstatic.oom_kill_process                      554     546      -8\ntry_to_unmap_file                           1716    1700     -16\ngetrusage                                    925     909     -16\nflush_old_exec                              1704    1688     -16\nstatic.dump_header                           416     390     -26\nacct_update_integrals                        218     187     -31\ndo_task_stat                                2986    2954     -32\nget_mm_counter                                34       -     -34\nxacct_add_tsk                                371     334     -37\ntask_statm                                   172     118     -54\ntask_mem                                     383     323     -60\n\ntry_to_unmap_one() grows because update_hiwater_rss() now completely inline.\n\nSigned-off-by: Konstantin Khlebnikov \u003ckhlebnikov@openvz.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Kirill A. Shutemov \u003ckirill@shutemov.name\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1a5a9906d4e8d1976b701f889d8f35d54b928f25",
      "tree": "e51912e725f224663a738045a4d0528d08da4572",
      "parents": [
        "31f6765266417c0d99f0e922fe82848a7c9c2ae9"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Wed Mar 21 16:33:42 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 21 17:54:54 2012 -0700"
      },
      "message": "mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode\n\nIn some cases it may happen that pmd_none_or_clear_bad() is called with\nthe mmap_sem hold in read mode.  In those cases the huge page faults can\nallocate hugepmds under pmd_none_or_clear_bad() and that can trigger a\nfalse positive from pmd_bad() that will not like to see a pmd\nmaterializing as trans huge.\n\nIt\u0027s not khugepaged causing the problem, khugepaged holds the mmap_sem\nin write mode (and all those sites must hold the mmap_sem in read mode\nto prevent pagetables to go away from under them, during code review it\nseems vm86 mode on 32bit kernels requires that too unless it\u0027s\nrestricted to 1 thread per process or UP builds).  The race is only with\nthe huge pagefaults that can convert a pmd_none() into a\npmd_trans_huge().\n\nEffectively all these pmd_none_or_clear_bad() sites running with\nmmap_sem in read mode are somewhat speculative with the page faults, and\nthe result is always undefined when they run simultaneously.  This is\nprobably why it wasn\u0027t common to run into this.  For example if the\nmadvise(MADV_DONTNEED) runs zap_page_range() shortly before the page\nfault, the hugepage will not be zapped, if the page fault runs first it\nwill be zapped.\n\nAltering pmd_bad() not to error out if it finds hugepmds won\u0027t be enough\nto fix this, because zap_pmd_range would then proceed to call\nzap_pte_range (which would be incorrect if the pmd become a\npmd_trans_huge()).\n\nThe simplest way to fix this is to read the pmd in the local stack\n(regardless of what we read, no need of actual CPU barriers, only\ncompiler barrier needed), and be sure it is not changing under the code\nthat computes its value.  Even if the real pmd is changing under the\nvalue we hold on the stack, we don\u0027t care.  
If we actually end up in\nzap_pte_range it means the pmd was not none already and it was not huge,\nand it can\u0027t become huge from under us (khugepaged locking explained\nabove).\n\nAll we need is to enforce that there is no way anymore that in a code\npath like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad\ncan run into a hugepmd.  The overhead of a barrier() is just a compiler\ntweak and should not be measurable (I only added it for THP builds).  I\ndon\u0027t exclude different compiler versions may have prevented the race\ntoo by caching the value of *pmd on the stack (that hasn\u0027t been\nverified, but it wouldn\u0027t be impossible considering\npmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines\nand there\u0027s no external function called in between pmd_trans_huge and\npmd_none_or_clear_bad).\n\n\t\tif (pmd_trans_huge(*pmd)) {\n\t\t\tif (next-addr !\u003d HPAGE_PMD_SIZE) {\n\t\t\t\tVM_BUG_ON(!rwsem_is_locked(\u0026tlb-\u003emm-\u003emmap_sem));\n\t\t\t\tsplit_huge_page_pmd(vma-\u003evm_mm, pmd);\n\t\t\t} else if (zap_huge_pmd(tlb, vma, pmd, addr))\n\t\t\t\tcontinue;\n\t\t\t/* fall through */\n\t\t}\n\t\tif (pmd_none_or_clear_bad(pmd))\n\nBecause this race condition could be exercised without special\nprivileges this was reported in CVE-2012-1179.\n\nThe race was identified and fully explained by Ulrich who debugged it.\nI\u0027m quoting his accurate explanation below, for reference.\n\n\u003d\u003d\u003d\u003d\u003d\u003d start quote \u003d\u003d\u003d\u003d\u003d\u003d\u003d\n      mapcount 0 page_mapcount 1\n      kernel BUG at mm/huge_memory.c:1384!\n\n    At some point prior to the panic, a \"bad pmd ...\" message similar to the\n    following is logged on the console:\n\n      mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).\n\n    The \"bad pmd ...\" message is logged by pmd_clear_bad() before it clears\n    the page\u0027s PMD table entry.\n\n        143 void pmd_clear_bad(pmd_t *pmd)\n        144 
{\n    -\u003e  145         pmd_ERROR(*pmd);\n        146         pmd_clear(pmd);\n        147 }\n\n    After the PMD table entry has been cleared, there is an inconsistency\n    between the actual number of PMD table entries that are mapping the page\n    and the page\u0027s map count (_mapcount field in struct page). When the page\n    is subsequently reclaimed, __split_huge_page() detects this inconsistency.\n\n       1381         if (mapcount !\u003d page_mapcount(page))\n       1382                 printk(KERN_ERR \"mapcount %d page_mapcount %d\\n\",\n       1383                        mapcount, page_mapcount(page));\n    -\u003e 1384         BUG_ON(mapcount !\u003d page_mapcount(page));\n\n    The root cause of the problem is a race of two threads in a multithreaded\n    process. Thread B incurs a page fault on a virtual address that has never\n    been accessed (PMD entry is zero) while Thread A is executing an madvise()\n    system call on a virtual address within the same 2 MB (huge page) range.\n\n               virtual address space\n              .---------------------.\n              |                     |\n              |                     |\n            .-|---------------------|\n            | |                     |\n            | |                     |\u003c-- B(fault)\n            | |                     |\n      2 MB  | |/////////////////////|-.\n      huge \u003c  |/////////////////////|  \u003e A(range)\n      page  | |/////////////////////|-\u0027\n            | |                     |\n            | |                     |\n            \u0027-|---------------------|\n              |                     |\n              |                     |\n              \u0027---------------------\u0027\n\n    - Thread A is executing an madvise(..., MADV_DONTNEED) system call\n      on the virtual address range \"A(range)\" shown in the picture.\n\n    sys_madvise\n      // Acquire the semaphore in shared mode.\n      
down_read(\u0026current-\u003emm-\u003emmap_sem)\n      ...\n      madvise_vma\n        switch (behavior)\n        case MADV_DONTNEED:\n             madvise_dontneed\n               zap_page_range\n                 unmap_vmas\n                   unmap_page_range\n                     zap_pud_range\n                       zap_pmd_range\n                         //\n                         // Assume that this huge page has never been accessed.\n                         // I.e. content of the PMD entry is zero (not mapped).\n                         //\n                         if (pmd_trans_huge(*pmd)) {\n                             // We don\u0027t get here due to the above assumption.\n                         }\n                         //\n                         // Assume that Thread B incurred a page fault and\n             .---------\u003e // sneaks in here as shown below.\n             |           //\n             |           if (pmd_none_or_clear_bad(pmd))\n             |               {\n             |                 if (unlikely(pmd_bad(*pmd)))\n             |                     pmd_clear_bad\n             |                     {\n             |                       pmd_ERROR\n             |                         // Log \"bad pmd ...\" message here.\n             |                       pmd_clear\n             |                         // Clear the page\u0027s PMD entry.\n             |                         // Thread B incremented the map count\n             |                         // in page_add_new_anon_rmap(), but\n             |                         // now the page is no longer mapped\n             |                         // by a PMD entry (-\u003e inconsistency).\n             |                     }\n             |               }\n             |\n             v\n    - Thread B is handling a page fault on virtual address \"B(fault)\" shown\n      in the picture.\n\n    ...\n    do_page_fault\n      __do_page_fault\n        // 
Acquire the semaphore in shared mode.\n        down_read_trylock(\u0026mm-\u003emmap_sem)\n        ...\n        handle_mm_fault\n          if (pmd_none(*pmd) \u0026\u0026 transparent_hugepage_enabled(vma))\n              // We get here due to the above assumption (PMD entry is zero).\n              do_huge_pmd_anonymous_page\n                alloc_hugepage_vma\n                  // Allocate a new transparent huge page here.\n                ...\n                __do_huge_pmd_anonymous_page\n                  ...\n                  spin_lock(\u0026mm-\u003epage_table_lock)\n                  ...\n                  page_add_new_anon_rmap\n                    // Here we increment the page\u0027s map count (starts at -1).\n                    atomic_set(\u0026page-\u003e_mapcount, 0)\n                  set_pmd_at\n                    // Here we set the page\u0027s PMD entry which will be cleared\n                    // when Thread A calls pmd_clear_bad().\n                  ...\n                  spin_unlock(\u0026mm-\u003epage_table_lock)\n\n    The mmap_sem does not prevent the race because both threads are acquiring\n    it in shared mode (down_read).  Thread B holds the page_table_lock while\n    the page\u0027s map count and PMD table entry are updated.  
However, Thread A\n    does not synchronize on that lock.\n\n\u003d\u003d\u003d\u003d\u003d\u003d end quote \u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\n[akpm@linux-foundation.org: checkpatch fixes]\nReported-by: Ulrich Obergfell \u003cuobergfe@redhat.com\u003e\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Dave Jones \u003cdavej@redhat.com\u003e\nAcked-by: Larry Woodman \u003clwoodman@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: \u003cstable@vger.kernel.org\u003e\t\t[2.6.38+]\nCc: Mark Salter \u003cmsalter@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3a990a52f9f25f45469e272017a31e7a3fda60ed",
      "tree": "366f639d9ce1e907b65caa72bc098df6c4b5a240",
      "parents": [
        "3556485f1595e3964ba539e39ea682acbb835cee",
        "f5cc4eef9987d0b517364d01e290d6438e47ee5d"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 21 13:32:19 2012 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 21 13:32:19 2012 -0700"
      },
      "message": "Merge branch \u0027vm\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs\n\nPull munmap/truncate race fixes from Al Viro:\n \"Fixes for racy use of unmap_vmas() on truncate-related codepaths\"\n\n* \u0027vm\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:\n  VM: make zap_page_range() callers that act on a single VMA use separate helper\n  VM: make unmap_vmas() return void\n  VM: don\u0027t bother with feeding upper limit to tlb_finish_mmu() in exit_mmap()\n  VM: make zap_page_range() return void\n  VM: can\u0027t go through the inner loop in unmap_vmas() more than once...\n  VM: unmap_page_range() can return void\n"
    },
    {
      "commit": "f5cc4eef9987d0b517364d01e290d6438e47ee5d",
      "tree": "1c6a5ec2abf40450b89134564c35c0beafded436",
      "parents": [
        "6e8bb0193af3f308ef22817a5560422d33e58b90"
      ],
      "author": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Mon Mar 05 14:14:20 2012 -0500"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Tue Mar 20 21:39:51 2012 -0400"
      },
      "message": "VM: make zap_page_range() callers that act on a single VMA use separate helper\n\n... and not rely on -\u003evm_next being there for them...\n\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "6e8bb0193af3f308ef22817a5560422d33e58b90",
      "tree": "6001421c8d389bd00b18e0510e3f6c9130f9f80b",
      "parents": [
        "853f5e264018113b1f96f05551b07a74b836c7fc"
      ],
      "author": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Mon Mar 05 13:41:15 2012 -0500"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Tue Mar 20 21:39:51 2012 -0400"
      },
      "message": "VM: make unmap_vmas() return void\n\nsame story - nobody uses it and it\u0027s been pointless since\n\"mm: Remove i_mmap_lock lockbreak\" went in.\n\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "14f5ff5df37a8fabe2d25b1e64df7e010cc87db9",
      "tree": "10f46ad8429790de35ebad33631d435f74aaff0e",
      "parents": [
        "8b2a12382ccc9df31b27dac37fe04dffe088b57c"
      ],
      "author": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Mon Mar 05 13:38:09 2012 -0500"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Tue Mar 20 21:39:50 2012 -0400"
      },
      "message": "VM: make zap_page_range() return void\n\n... since all callers ignore its return value and it\u0027s been\nuseless since commit 97a894136f29802da19a15541de3c019e1ca147e\n(mm: Remove i_mmap_lock lockbreak) anyway.\n\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "8b2a12382ccc9df31b27dac37fe04dffe088b57c",
      "tree": "77e79b540a288b3c2bce78bfd7aff4c58511ecd1",
      "parents": [
        "038c7aa16a38059ac23dfe9caa6954226ea20728"
      ],
      "author": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Mon Mar 05 13:35:49 2012 -0500"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Tue Mar 20 21:39:50 2012 -0400"
      },
      "message": "VM: can\u0027t go through the inner loop in unmap_vmas() more than once...\n\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "038c7aa16a38059ac23dfe9caa6954226ea20728",
      "tree": "b851af73694ff7e0cd69ce90c7506c82122215c1",
      "parents": [
        "c16fa4f2ad19908a47c63d8fa436a1178438c7e7"
      ],
      "author": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Mon Mar 05 13:25:09 2012 -0500"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Tue Mar 20 21:39:50 2012 -0400"
      },
      "message": "VM: unmap_page_range() can return void\n\nreturn value is always the 4th (\u0027end\u0027) argument.\n\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "9b04c5fec43c0da610a2c37f70c5b013101a6ad7",
      "tree": "f04767281b7067fba91cf0d37440bf454c492e38",
      "parents": [
        "c3eede8e0a1292d95c051cf947738687b9c42322"
      ],
      "author": {
        "name": "Cong Wang",
        "email": "amwang@redhat.com",
        "time": "Fri Nov 25 23:14:39 2011 +0800"
      },
      "committer": {
        "name": "Cong Wang",
        "email": "xiyou.wangcong@gmail.com",
        "time": "Tue Mar 20 21:48:27 2012 +0800"
      },
      "message": "mm: remove the second argument of k[un]map_atomic()\n\nSigned-off-by: Cong Wang \u003camwang@redhat.com\u003e\n"
    },
    {
      "commit": "9f9f1acd713d69fae2af286fbeedc6c8963411c6",
      "tree": "cfa1485d01cb36c720f1e2b96b21748acc10ec06",
      "parents": [
        "245132643e1cfcd145bbc86a716c1818371fcb93"
      ],
      "author": {
        "name": "Konstantin Khlebnikov",
        "email": "khlebnikov@openvz.org",
        "time": "Fri Jan 20 14:34:24 2012 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jan 23 08:38:49 2012 -0800"
      },
      "message": "mm: fix rss count leakage during migration\n\nMemory migration fills a pte with a migration entry and it doesn\u0027t\nupdate the rss counters.  Then it replaces the migration entry with the\nnew page (or the old one if migration failed).  But between these two\npasses this pte can be unmaped, or a task can fork a child and it will\nget a copy of this migration entry.  Nobody accounts for this in the rss\ncounters.\n\nThis patch properly adjust rss counters for migration entries in\nzap_pte_range() and copy_one_pte().  Thus we avoid extra atomic\noperations on the migration fast-path.\n\nSigned-off-by: Konstantin Khlebnikov \u003ckhlebnikov@openvz.org\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f21760b15dcd091e5afd38d0b97197b45f7ef2ea",
      "tree": "84dd0f9016b46630d6b67e48ff0382b78a1bc519",
      "parents": [
        "e5591307f0c1eb733d280a0b72473e01d7f88530"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Thu Jan 12 17:19:16 2012 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 12 20:13:08 2012 -0800"
      },
      "message": "thp: add tlb_remove_pmd_tlb_entry\n\nWe have tlb_remove_tlb_entry to indicate a pte tlb flush entry should be\nflushed, but not a corresponding API for pmd entry.  This isn\u0027t a\nproblem so far because THP is only for x86 currently and tlb_flush()\nunder x86 will flush entire TLB.  But this is confusion and could be\nmissed if thp is ported to other arch.\n\nAlso convert tlb-\u003eneed_flush \u003d 1 to a VM_BUG_ON(!tlb-\u003eneed_flush) in\n__tlb_remove_page() as suggested by Andrea Arcangeli.  The\n__tlb_remove_page() function is supposed to be called after\ntlb_remove_xxx_tlb_entry() and we can catch any misuse.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nReviewed-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "32aaeffbd4a7457bf2f7448b33b5946ff2a960eb",
      "tree": "faf7ad871d87176423ff9ed1d1ba4d9c688fc23f",
      "parents": [
        "208bca0860406d16398145ddd950036a737c3c9d",
        "67b84999b1a8b1af5625b1eabe92146c5eb42932"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Nov 06 19:44:47 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Nov 06 19:44:47 2011 -0800"
      },
      "message": "Merge branch \u0027modsplit-Oct31_2011\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux\n\n* \u0027modsplit-Oct31_2011\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)\n  Revert \"tracing: Include module.h in define_trace.h\"\n  irq: don\u0027t put module.h into irq.h for tracking irqgen modules.\n  bluetooth: macroize two small inlines to avoid module.h\n  ip_vs.h: fix implicit use of module_get/module_put from module.h\n  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence\n  include: replace linux/module.h with \"struct module\" wherever possible\n  include: convert various register fcns to macros to avoid include chaining\n  crypto.h: remove unused crypto_tfm_alg_modname() inline\n  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE\n  pm_runtime.h: explicitly requires notifier.h\n  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h\n  miscdevice.h: fix up implicit use of lists and types\n  stop_machine.h: fix implicit use of smp.h for smp_processor_id\n  of: fix implicit use of errno.h in include/linux/of.h\n  of_platform.h: delete needless include \u003clinux/module.h\u003e\n  acpi: remove module.h include from platform/aclinux.h\n  miscdevice.h: delete unnecessary inclusion of module.h\n  device_cgroup.h: delete needless include \u003clinux/module.h\u003e\n  net: sch_generic remove redundant use of \u003clinux/module.h\u003e\n  net: inet_timewait_sock doesnt need \u003clinux/module.h\u003e\n  ...\n\nFix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in\n - drivers/media/dvb/frontends/dibx000_common.c\n - drivers/media/video/{mt9m111.c,ov6650.c}\n - drivers/mfd/ab3550-core.c\n - include/linux/dmaengine.h\n"
    },
    {
      "commit": "70b50f94f1644e2aa7cb374819cfd93f3c28d725",
      "tree": "79198cd9a92600140827a670d1ed5eefdcd23d79",
      "parents": [
        "994c0e992522c123298b4a91b72f5e67ba2d1123"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Wed Nov 02 13:36:59 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Nov 02 16:06:57 2011 -0700"
      },
      "message": "mm: thp: tail page refcounting fix\n\nMichel while working on the working set estimation code, noticed that\ncalling get_page_unless_zero() on a random pfn_to_page(random_pfn)\nwasn\u0027t safe, if the pfn ended up being a tail page of a transparent\nhugepage under splitting by __split_huge_page_refcount().\n\nHe then found the problem could also theoretically materialize with\npage_cache_get_speculative() during the speculative radix tree lookups\nthat uses get_page_unless_zero() in SMP if the radix tree page is freed\nand reallocated and get_user_pages is called on it before\npage_cache_get_speculative has a chance to call get_page_unless_zero().\n\nSo the best way to fix the problem is to keep page_tail-\u003e_count zero at\nall times.  This will guarantee that get_page_unless_zero() can never\nsucceed on any tail page.  page_tail-\u003e_mapcount is guaranteed zero and\nis unused for all tail pages of a compound page, so we can simply\naccount the tail page references there and transfer them to\ntail_page-\u003e_count in __split_huge_page_refcount() (in addition to the\nhead_page-\u003e_mapcount).\n\nWhile debugging this s/_count/_mapcount/ change I also noticed get_page is\ncalled by direct-io.c on pages returned by get_user_pages.  That wasn\u0027t\nentirely safe because the two atomic_inc in get_page weren\u0027t atomic.  As\nopposed to other get_user_page users like secondary-MMU page fault to\nestablish the shadow pagetables would never call any superflous get_page\nafter get_user_page returns.  It\u0027s safer to make get_page universally safe\nfor tail pages and to use get_page_foll() within follow_page (inside\nget_user_pages()).  
get_page_foll() is safe to do the refcounting for tail\npages without taking any locks because it is run within PT lock protected\ncritical sections (PT lock for pte and page_table_lock for\npmd_trans_huge).\n\nThe standard get_page() as invoked by direct-io instead will now take\nthe compound_lock but still only for tail pages.  The direct-io paths\nare usually I/O bound and the compound_lock is per THP so very\nfinegrined, so there\u0027s no risk of scalability issues with it.  A simple\ndirect-io benchmarks with all lockdep prove locking and spinlock\ndebugging infrastructure enabled shows identical performance and no\noverhead.  So it\u0027s worth it.  Ideally direct-io should stop calling\nget_page() on pages returned by get_user_pages().  The spinlock in\nget_page() is already optimized away for no-THP builds but doing\nget_page() on tail pages returned by GUP is generally a rare operation\nand usually only run in I/O paths.\n\nThis new refcounting on page_tail-\u003e_mapcount in addition to avoiding new\nRCU critical sections will also allow the working set estimation code to\nwork without any further complexity associated to the tail page\nrefcounting with THP.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nReported-by: Michel Lespinasse \u003cwalken@google.com\u003e\nReviewed-by: Michel Lespinasse \u003cwalken@google.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: David Gibson \u003cdavid@gibson.dropbear.id.au\u003e\nCc: \u003cstable@kernel.org\u003e\nCc: \u003cstable@vger.kernel.org\u003e\nSigned-off-by: Andrew Morton 
\u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b95f1b31b75588306e32b2afd32166cad48f670b",
      "tree": "b5496144e41b117cfe5ae70b145b5351709ec4d0",
      "parents": [
        "b9e15bafdf1aa20791cdefdcbf1ccf7d7aa03aaa"
      ],
      "author": {
        "name": "Paul Gortmaker",
        "email": "paul.gortmaker@windriver.com",
        "time": "Sun Oct 16 02:01:52 2011 -0400"
      },
      "committer": {
        "name": "Paul Gortmaker",
        "email": "paul.gortmaker@windriver.com",
        "time": "Mon Oct 31 09:20:12 2011 -0400"
      },
      "message": "mm: Map most files to use export.h instead of module.h\n\nThe files changed within are only using the EXPORT_SYMBOL\nmacro variants.  They are not using core modular infrastructure\nand hence don\u0027t need module.h but only the export.h header.\n\nSigned-off-by: Paul Gortmaker \u003cpaul.gortmaker@windriver.com\u003e\n"
    },
    {
      "commit": "2efaca927f5cd7ecd0f1554b8f9b6a9a2c329c03",
      "tree": "1bea042a7c712e861d7734db59b3311375c439c3",
      "parents": [
        "72c4783210f77fd743f0a316858d33f27db51e7c"
      ],
      "author": {
        "name": "Benjamin Herrenschmidt",
        "email": "benh@kernel.crashing.org",
        "time": "Mon Jul 25 17:12:32 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jul 25 20:57:11 2011 -0700"
      },
      "message": "mm/futex: fix futex writes on archs with SW tracking of dirty \u0026 young\n\nI haven\u0027t reproduced it myself but the fail scenario is that on such\nmachines (notably ARM and some embedded powerpc), if you manage to hit\nthat futex path on a writable page whose dirty bit has gone from the PTE,\nyou\u0027ll livelock inside the kernel from what I can tell.\n\nIt will go in a loop of trying the atomic access, failing, trying gup to\n\"fix it up\", getting succcess from gup, go back to the atomic access,\nfailing again because dirty wasn\u0027t fixed etc...\n\nSo I think you essentially hang in the kernel.\n\nThe scenario is probably rare\u0027ish because affected architecture are\nembedded and tend to not swap much (if at all) so we probably rarely hit\nthe case where dirty is missing or young is missing, but I think Shan has\na piece of SW that can reliably reproduce it using a shared writable\nmapping \u0026 fork or something like that.\n\nOn archs who use SW tracking of dirty \u0026 young, a page without dirty is\neffectively mapped read-only and a page without young unaccessible in the\nPTE.\n\nAdditionally, some architectures might lazily flush the TLB when relaxing\nwrite protection (by doing only a local flush), and expect a fault to\ninvalidate the stale entry if it\u0027s still present on another processor.\n\nThe futex code assumes that if the \"in_atomic()\" access -EFAULT\u0027s, it can\n\"fix it up\" by causing get_user_pages() which would then be equivalent to\ntaking the fault.\n\nHowever that isn\u0027t the case.  get_user_pages() will not call\nhandle_mm_fault() in the case where the PTE seems to have the right\npermissions, regardless of the dirty and young state.  It will eventually\nupdate those bits ...  in the struct page, but not in the PTE.\n\nAdditionally, it will not handle the lazy TLB flushing that can be\nrequired by some architectures in the fault case.\n\nBasically, gup is the wrong interface for the job.  
The patch provides a\nmore appropriate one which boils down to just calling handle_mm_fault()\nsince what we are trying to do is simulate a real page fault.\n\nThe futex code currently attempts to write to user memory within a\npagefault disabled section, and if that fails, tries to fix it up using\nget_user_pages().\n\nThis doesn\u0027t work on archs where the dirty and young bits are maintained\nby software, since they will gate access permission in the TLB, and will\nnot be updated by gup().\n\nIn addition, there\u0027s an expectation on some archs that a spurious write\nfault triggers a local TLB flush, and that is missing from the picture as\nwell.\n\nI decided that adding those \"features\" to gup() would be too much for this\nalready too complex function, and instead added a new simpler\nfixup_user_fault() which is essentially a wrapper around handle_mm_fault()\nwhich the futex code can call.\n\n[akpm@linux-foundation.org: coding-style fixes]\n[akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]\nSigned-off-by: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nReported-by: Shan Hai \u003chaishan.bai@gmail.com\u003e\nTested-by: Shan Hai \u003chaishan.bai@gmail.com\u003e\nCc: David Laight \u003cDavid.Laight@ACULAB.COM\u003e\nAcked-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Darren Hart \u003cdarren.hart@intel.com\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1d65f86db14806cf7b1218c7b4ecb8b4db5af27d",
      "tree": "01a2c4e3feb48327220b1fd8d09cf805c20eee7f",
      "parents": [
        "d515afe88a32e567c550e3db914f3e378f86453a"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Mon Jul 25 17:12:27 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jul 25 20:57:10 2011 -0700"
      },
      "message": "mm: preallocate page before lock_page() at filemap COW\n\nCurrently we are keeping faulted page locked throughout whole __do_fault\ncall (except for page_mkwrite code path) after calling file system\u0027s fault\ncode.  If we do early COW, we allocate a new page which has to be charged\nfor a memcg (mem_cgroup_newpage_charge).\n\nThis function, however, might block for unbounded amount of time if memcg\noom killer is disabled or fork-bomb is running because the only way out of\nthe OOM situation is either an external event or OOM-situation fix.\n\nIn the end we are keeping the faulted page locked and blocking other\nprocesses from faulting it in which is not good at all because we are\nbasically punishing potentially an unrelated process for OOM condition in\na different group (I have seen stuck system because of ld-2.11.1.so being\nlocked).\n\nWe can do test easily.\n\n % cgcreate -g memory:A\n % cgset -r memory.limit_in_bytes\u003d64M A\n % cgset -r memory.memsw.limit_in_bytes\u003d64M A\n % cd kernel_dir; cgexec -g memory:A make -j\n\nThen, the whole system will live-locked until you kill \u0027make -j\u0027\nby hands (or push reboot...) This is because some important page in a\na shared library are locked.\n\nConsidering again, the new page is not necessary to be allocated\nwith lock_page() held. And usual page allocation may dive into\nlong memory reclaim loop with holding lock_page() and can cause\nvery long latency.\n\nThere are 3 ways.\n  1. do allocation/charge before lock_page()\n     Pros. - simple and can handle page allocation in the same manner.\n             This will reduce holding time of lock_page() in general.\n     Cons. - we do page allocation even if -\u003efault() returns error.\n\n  2. do charge after unlock_page(). Even if charge fails, it\u0027s just OOM.\n     Pros. - no impact to non-memcg path.\n     Cons. 
- implemenation requires special cares of LRU and we need to modify\n             page_add_new_anon_rmap()...\n\n  3. do unlock-\u003echarge-\u003elock again method.\n     Pros. - no impact to non-memcg path.\n     Cons. - This may kill LOCK_PAGE_RETRY optimization. We need to release\n             lock and get it again...\n\nThis patch moves \"charge\" and memory allocation for COW page\nbefore lock_page(). Then, we can avoid scanning LRU with holding\na lock on a page and latency under lock_page() will be reduced.\n\nThen, above livelock disappears.\n\n[akpm@linux-foundation.org: fix code layout]\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReported-by: Lutz Vieweg \u003clvml@5t9.de\u003e\nOriginal-idea-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "6ac47520063b230641a64062b8a229201cd0a3a8",
      "tree": "e26a25226f980a50468f001bbd3243d74d0d9768",
      "parents": [
        "32f84528fbb5177275193a3311be8756f0cbd62c"
      ],
      "author": {
        "name": "Andrew Morton",
        "email": "akpm@linux-foundation.org",
        "time": "Mon Jul 25 17:12:16 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jul 25 20:57:09 2011 -0700"
      },
      "message": "mm/memory.c: remove ZAP_BLOCK_SIZE\n\nZAP_BLOCK_SIZE became unused in the preemptible-mmu_gather work (\"mm:\nRemove i_mmap_lock lockbreak\").  So zap it.\n\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0b43c3aab0137595335b08b340a3f3e5af9818a6",
      "tree": "dad2556800b89d42875470744a25533d1d983989",
      "parents": [
        "215ddd6664ced067afca7eebd2d1eb83f064ff5a"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Fri Jul 08 15:39:41 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Jul 08 21:14:43 2011 -0700"
      },
      "message": "mm: __tlb_remove_page() check the correct batch\n\n__tlb_remove_page() switches to a new batch page, but still checks space\nin the old batch.  This check always fails, and causes a forced tlb flush.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nAcked-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5b8ba10198a109f8a02380648c5d29000caa9c55",
      "tree": "1e4328d86395baa3d429c0d9911b7d7e1272629d",
      "parents": [
        "4d258b25d947521c8b913154db61ec55198243f8"
      ],
      "author": {
        "name": "Hugh Dickins",
        "email": "hughd@google.com",
        "time": "Mon Jun 27 16:18:01 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jun 27 18:00:12 2011 -0700"
      },
      "message": "mm: move vmtruncate_range to truncate.c\n\nYou would expect to find vmtruncate_range() next to vmtruncate() in\nmm/truncate.c: move it there.\n\nSigned-off-by: Hugh Dickins \u003chughd@google.com\u003e\nAcked-by: Christoph Hellwig \u003chch@infradead.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5f1a19070b16c20cdc71ed0e981bfa19f8f6a4ee",
      "tree": "f3eaeb7a040e2484d71485118d58e34eb0760bf3",
      "parents": [
        "4bbd61fb9726808e72ab2aa440401f6e5e1aa8f7"
      ],
      "author": {
        "name": "Steven Rostedt",
        "email": "rostedt@goodmis.org",
        "time": "Wed Jun 15 15:08:23 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Jun 15 20:04:00 2011 -0700"
      },
      "message": "mm: fix wrong kunmap_atomic() pointer\n\nRunning a ktest.pl test, I hit the following bug on x86_32:\n\n  ------------[ cut here ]------------\n  WARNING: at arch/x86/mm/highmem_32.c:81 __kunmap_atomic+0x64/0xc1()\n   Hardware name:\n  Modules linked in:\n  Pid: 93, comm: sh Not tainted 2.6.39-test+ #1\n  Call Trace:\n   [\u003cc04450da\u003e] warn_slowpath_common+0x7c/0x91\n   [\u003cc042f5df\u003e] ? __kunmap_atomic+0x64/0xc1\n   [\u003cc042f5df\u003e] ? __kunmap_atomic+0x64/0xc1^M\n   [\u003cc0445111\u003e] warn_slowpath_null+0x22/0x24\n   [\u003cc042f5df\u003e] __kunmap_atomic+0x64/0xc1\n   [\u003cc04d4a22\u003e] unmap_vmas+0x43a/0x4e0\n   [\u003cc04d9065\u003e] exit_mmap+0x91/0xd2\n   [\u003cc0443057\u003e] mmput+0x43/0xad\n   [\u003cc0448358\u003e] exit_mm+0x111/0x119\n   [\u003cc044855f\u003e] do_exit+0x1ff/0x5fa\n   [\u003cc0454ea2\u003e] ? set_current_blocked+0x3c/0x40\n   [\u003cc0454f24\u003e] ? sigprocmask+0x7e/0x8e\n   [\u003cc0448b55\u003e] do_group_exit+0x65/0x88\n   [\u003cc0448b90\u003e] sys_exit_group+0x18/0x1c\n   [\u003cc0c3915f\u003e] sysenter_do_call+0x12/0x38\n  ---[ end trace 8055f74ea3c0eb62 ]---\n\nRunning a ktest.pl git bisect, found the culprit: commit e303297e6c3a\n(\"mm: extended batches for generic mmu_gather\")\n\nBut although this was the commit triggering the bug, it was not the one\noriginally responsible for the bug.  That was commit d16dfc550f53 (\"mm:\nmmu_gather rework\").\n\nThe code in zap_pte_range() has something that looks like the following:\n\n\tpte \u003d  pte_offset_map_lock(mm, pmd, addr, \u0026ptl);\n\tdo {\n\t\t[...]\n\t} while (pte++, addr +\u003d PAGE_SIZE, addr !\u003d end);\n\tpte_unmap_unlock(pte - 1, ptl);\n\nThe pte starts off pointing at the first element in the page table\ndirectory that was returned by the pte_offset_map_lock().  When it\u0027s done\nwith the page, pte will be pointing to anything between the next entry and\nthe first entry of the next page inclusive.  
By doing a pte - 1, this puts\nthe pte back onto the original page, which is all that pte_unmap_unlock()\nneeds.\n\nIn most archs (64 bit), this is not an issue as the pte is ignored in the\npte_unmap_unlock().  But on 32 bit archs, where things may be kmapped, it\nis essential that the pte passed to pte_unmap_unlock() resides on the same\npage that was given by pte_offest_map_lock().\n\nThe problem came in d16dfc55 (\"mm: mmu_gather rework\") where it introduced\na \"break;\" from the while loop.  This alone did not seem to easily trigger\nthe bug.  But the modifications made by e303297e6 caused that \"break;\" to\nbe hit on the first iteration, before the pte++.\n\nThe pte not being incremented will now cause pte_unmap_unlock(pte - 1) to\nbe pointing to the previous page.  This will cause the wrong page to be\nunmapped, and also trigger the warning above.\n\nThe simple solution is to just save the pointer given by\npte_offset_map_lock() and use it in the unlock.\n\nSigned-off-by: Steven Rostedt \u003crostedt@goodmis.org\u003e\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0164f69d0cf1a6abbc936851f5b72ece92187cda",
      "tree": "000bb234b98d76ce0b5195a3ee53a505aa0d3d86",
      "parents": [
        "f300ea499721ca208fc4714b9105bfd7e9f75be0"
      ],
      "author": {
        "name": "Randy Dunlap",
        "email": "randy.dunlap@oracle.com",
        "time": "Wed Jun 15 15:08:09 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Jun 15 20:03:59 2011 -0700"
      },
      "message": "mm/memory.c: fix kernel-doc notation\n\nFix new kernel-doc warnings in mm/memory.c:\n\n  Warning(mm/memory.c:1327): No description found for parameter \u0027tlb\u0027\n  Warning(mm/memory.c:1327): Excess function parameter \u0027tlbp\u0027 description in \u0027unmap_vmas\u0027\n\nSigned-off-by: Randy Dunlap \u003crandy.dunlap@oracle.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "456f998ec817ebfa254464be4f089542fa390645",
      "tree": "5976aa500638f0bbade1a672233cad71765b89b8",
      "parents": [
        "406eb0c9ba765eb066406fd5ce9d5e2b169a4d5a"
      ],
      "author": {
        "name": "Ying Han",
        "email": "yinghan@google.com",
        "time": "Thu May 26 16:25:38 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 17:12:36 2011 -0700"
      },
      "message": "memcg: add the pagefault count into memcg stats\n\nTwo new stats in per-memcg memory.stat which tracks the number of page\nfaults and number of major page faults.\n\n  \"pgfault\"\n  \"pgmajfault\"\n\nThey are different from \"pgpgin\"/\"pgpgout\" stat which count number of\npages charged/discharged to the cgroup and have no meaning of reading/\nwriting page to disk.\n\nIt is valuable to track the two stats for both measuring application\u0027s\nperformance as well as the efficiency of the kernel page reclaim path.\nCounting pagefaults per process is useful, but we also need the aggregated\nvalue since processes are monitored and controlled in cgroup basis in\nmemcg.\n\nFunctional test: check the total number of pgfault/pgmajfault of all\nmemcgs and compare with global vmstat value:\n\n  $ cat /proc/vmstat | grep fault\n  pgfault 1070751\n  pgmajfault 553\n\n  $ cat /dev/cgroup/memory.stat | grep fault\n  pgfault 1071138\n  pgmajfault 553\n  total_pgfault 1071142\n  total_pgmajfault 553\n\n  $ cat /dev/cgroup/A/memory.stat | grep fault\n  pgfault 199\n  pgmajfault 0\n  total_pgfault 199\n  total_pgmajfault 0\n\nPerformance test: run page fault test(pft) wit 16 thread on faulting in\n15G anon pages in 16G container.  
There is no regression noticed on the\n\"flt/cpu/s\"\n\nSample output from pft:\n\n  TAG pft:anon-sys-default:\n    Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec\n    15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260\n\n  +-------------------------------------------------------------------------+\n      N           Min           Max        Median           Avg        Stddev\n  x  10     16682.962     17344.027     16913.524     16928.812      166.5362\n  +  10     16695.568     16923.896     16820.604     16824.652     84.816568\n  No difference proven at 95.0% confidence\n\n[akpm@linux-foundation.org: fix build]\n[hughd@google.com: shmem fix]\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nAcked-by: Balbir Singh \u003cbalbir@linux.vnet.ibm.com\u003e\nSigned-off-by: Hugh Dickins \u003chughd@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ca16d140af91febe25daeb9e032bf8bd46b8c31f",
      "tree": "a093c3f244a1bdfc2a50e271a7e6df3324df0f05",
      "parents": [
        "4db70f73e56961b9bcdfd0c36c62847a18b7dbb5"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Thu May 26 19:16:19 2011 +0900"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 09:20:31 2011 -0700"
      },
      "message": "mm: don\u0027t access vm_flags as \u0027int\u0027\n\nThe type of vma-\u003evm_flags is \u0027unsigned long\u0027. Neither \u0027int\u0027 nor\n\u0027unsigned int\u0027. This patch fixes such misuse.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\n[ Changed to use a typedef - we\u0027ll extend it to cover more cases\n  later, since there has been discussion about making it a 64-bit\n  type..                      - Linus ]\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "9547d01bfb9c351dc19067f8a4cea9d3955f4125",
      "tree": "3c32521dbbf380471e1eef3e11ae656b24164255",
      "parents": [
        "88c22088bf235f50b09a10bd9f022b0472bcb6b5"
      ],
      "author": {
        "name": "Peter Zijlstra",
        "email": "a.p.zijlstra@chello.nl",
        "time": "Tue May 24 17:12:14 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:20 2011 -0700"
      },
      "message": "mm: uninline large generic tlb.h functions\n\nSome of these functions have grown beyond inline sanity, move them\nout-of-line.\n\nSigned-off-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nRequested-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nRequested-by: Hugh Dickins \u003chughd@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3d48ae45e72390ddf8cc5256ac32ed6f7a19cbea",
      "tree": "1f46db3a8424090dd8e0b58991fa5acc1a73e680",
      "parents": [
        "97a894136f29802da19a15541de3c019e1ca147e"
      ],
      "author": {
        "name": "Peter Zijlstra",
        "email": "a.p.zijlstra@chello.nl",
        "time": "Tue May 24 17:12:06 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:18 2011 -0700"
      },
      "message": "mm: Convert i_mmap_lock to a mutex\n\nStraightforward conversion of i_mmap_lock to a mutex.\n\nSigned-off-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: David Miller \u003cdavem@davemloft.net\u003e\nCc: Martin Schwidefsky \u003cschwidefsky@de.ibm.com\u003e\nCc: Russell King \u003crmk@arm.linux.org.uk\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nCc: Jeff Dike \u003cjdike@addtoit.com\u003e\nCc: Richard Weinberger \u003crichard@nod.at\u003e\nCc: Tony Luck \u003ctony.luck@intel.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "97a894136f29802da19a15541de3c019e1ca147e",
      "tree": "1fd3f92ba92a37d5d8527a1f41458091d0a944dc",
      "parents": [
        "e4c70a6629f9c74c4b0de258a3951890e9047c82"
      ],
      "author": {
        "name": "Peter Zijlstra",
        "email": "a.p.zijlstra@chello.nl",
        "time": "Tue May 24 17:12:04 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:17 2011 -0700"
      },
      "message": "mm: Remove i_mmap_lock lockbreak\n\nHugh says:\n \"The only significant loser, I think, would be page reclaim (when\n  concurrent with truncation): could spin for a long time waiting for\n  the i_mmap_mutex it expects would soon be dropped? \"\n\nCounter points:\n - cpu contention makes the spin stop (need_resched())\n - zap pages should be freeing pages at a higher rate than reclaim\n   ever can\n\nI think the simplification of the truncate code is definitely worth it.\n\nEffectively reverts: 2aa15890f3c (\"mm: prevent concurrent\nunmap_mapping_range() on the same inode\") and takes out the code that\ncaused its problem.\n\nSigned-off-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: David Miller \u003cdavem@davemloft.net\u003e\nCc: Martin Schwidefsky \u003cschwidefsky@de.ibm.com\u003e\nCc: Russell King \u003crmk@arm.linux.org.uk\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nCc: Jeff Dike \u003cjdike@addtoit.com\u003e\nCc: Richard Weinberger \u003crichard@nod.at\u003e\nCc: Tony Luck \u003ctony.luck@intel.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e303297e6c3a7b847c4731eb14006ca6b435ecca",
      "tree": "c2bbec8fb0cad1405f4a3ff908cd1d22abcd3e77",
      "parents": [
        "267239116987d64850ad2037d8e0f3071dc3b5ce"
      ],
      "author": {
        "name": "Peter Zijlstra",
        "email": "a.p.zijlstra@chello.nl",
        "time": "Tue May 24 17:12:01 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:16 2011 -0700"
      },
      "message": "mm: extended batches for generic mmu_gather\n\nInstead of using a single batch (the small on-stack, or an allocated\npage), try and extend the batch every time it runs out and only flush once\neither the extend fails or we\u0027re done.\n\nSigned-off-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nRequested-by: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: David Miller \u003cdavem@davemloft.net\u003e\nCc: Martin Schwidefsky \u003cschwidefsky@de.ibm.com\u003e\nCc: Russell King \u003crmk@arm.linux.org.uk\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nCc: Jeff Dike \u003cjdike@addtoit.com\u003e\nCc: Richard Weinberger \u003crichard@nod.at\u003e\nCc: Tony Luck \u003ctony.luck@intel.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "267239116987d64850ad2037d8e0f3071dc3b5ce",
      "tree": "142595897f7fc7bb673b791891dcc2fab31f6e91",
      "parents": [
        "1c395176962176660bb108f90e97e1686cfe0d85"
      ],
      "author": {
        "name": "Peter Zijlstra",
        "email": "a.p.zijlstra@chello.nl",
        "time": "Tue May 24 17:12:00 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:16 2011 -0700"
      },
      "message": "mm, powerpc: move the RCU page-table freeing into generic code\n\nIn case other architectures require RCU freed page-tables to implement\ngup_fast() and software filled hashes and similar things, provide the\nmeans to do so by moving the logic into generic code.\n\nSigned-off-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nRequested-by: David Miller \u003cdavem@davemloft.net\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: Martin Schwidefsky \u003cschwidefsky@de.ibm.com\u003e\nCc: Russell King \u003crmk@arm.linux.org.uk\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nCc: Jeff Dike \u003cjdike@addtoit.com\u003e\nCc: Richard Weinberger \u003crichard@nod.at\u003e\nCc: Tony Luck \u003ctony.luck@intel.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d16dfc550f5326a4000f3322582a7c05dec91d7a",
      "tree": "8ee963542705cbf2187777f1d3f2b209cbda827a",
      "parents": [
        "d05f3169c0fbca16132ec7c2be71685c6de638b5"
      ],
      "author": {
        "name": "Peter Zijlstra",
        "email": "a.p.zijlstra@chello.nl",
        "time": "Tue May 24 17:11:45 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:12 2011 -0700"
      },
      "message": "mm: mmu_gather rework\n\nRework the existing mmu_gather infrastructure.\n\nThe direct purpose of these patches was to allow preemptible mmu_gather,\nbut even without that I think these patches provide an improvement to the\nstatus quo.\n\nThe first 9 patches rework the mmu_gather infrastructure.  For review\npurpose I\u0027ve split them into generic and per-arch patches with the last of\nthose a generic cleanup.\n\nThe next patch provides generic RCU page-table freeing, and the followup\nis a patch converting s390 to use this.  I\u0027ve also got 4 patches from\nDaveM lined up (not included in this series) that uses this to implement\ngup_fast() for sparc64.\n\nThen there is one patch that extends the generic mmu_gather batching.\n\nAfter that follow the mm preemptibility patches, these make part of the mm\na lot more preemptible.  It converts i_mmap_lock and anon_vma-\u003elock to\nmutexes which together with the mmu_gather rework makes mmu_gather\npreemptible as well.\n\nMaking i_mmap_lock a mutex also enables a clean-up of the truncate code.\n\nThis also allows for preemptible mmu_notifiers, something that XPMEM I\nthink wants.\n\nFurthermore, it removes the new and universially detested unmap_mutex.\n\nThis patch:\n\nRemove the first obstacle towards a fully preemptible mmu_gather.\n\nThe current scheme assumes mmu_gather is always done with preemption\ndisabled and uses per-cpu storage for the page batches.  Change this to\ntry and allocate a page for batching and in case of failure, use a small\non-stack array to make some progress.\n\nPreemptible mmu_gather is desired in general and usable once i_mmap_lock\nbecomes a mutex.  
Doing it before the mutex conversion saves us from\nhaving to rework the code by moving the mmu_gather bits inside the\npte_lock.\n\nAlso avoid flushing the tlb batches from under the pte lock, this is\nuseful even without the i_mmap_lock conversion as it significantly reduces\npte lock hold times.\n\n[akpm@linux-foundation.org: fix comment tpyo]\nSigned-off-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: David Miller \u003cdavem@davemloft.net\u003e\nCc: Martin Schwidefsky \u003cschwidefsky@de.ibm.com\u003e\nCc: Russell King \u003crmk@arm.linux.org.uk\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nCc: Jeff Dike \u003cjdike@addtoit.com\u003e\nCc: Richard Weinberger \u003crichard@nod.at\u003e\nCc: Tony Luck \u003ctony.luck@intel.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d05f3169c0fbca16132ec7c2be71685c6de638b5",
      "tree": "37d82004869fa4e530617883f12cab7538dbd4a6",
      "parents": [
        "248ac0e1943ad1796393d281b096184719eb3f97"
      ],
      "author": {
        "name": "Michal Hocko",
        "email": "mhocko@suse.cz",
        "time": "Tue May 24 17:11:44 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:12 2011 -0700"
      },
      "message": "mm: make expand_downwards() symmetrical with expand_upwards()\n\nCurrently we have expand_upwards exported while expand_downwards is\naccessible only via expand_stack or expand_stack_downwards.\n\ncheck_stack_guard_page is a nice example of the asymmetry.  It uses\nexpand_stack for VM_GROWSDOWN while expand_upwards is called for\nVM_GROWSUP case.\n\nLet\u0027s clean this up by exporting both functions and make those names\nconsistent.  Let\u0027s use expand_{upwards,downwards} because expanding\ndoesn\u0027t always involve stack manipulation (an example is\nia64_do_page_fault which uses expand_upwards for registers backing store\nexpansion).  expand_downwards has to be defined for both\nCONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards\nversion in the early process initialization phase for growsup\nconfiguration.\n\nSigned-off-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: James Bottomley \u003cJames.Bottomley@HansenPartnership.com\u003e\nCc: \"Luck, Tony\" \u003ctony.luck@intel.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a09a79f66874c905af35d5bb5e5f2fdc7b6b894d",
      "tree": "9cb2ae1fef7083af91a49c19411e9871e0e59a37",
      "parents": [
        "26822eebb25500fb0776c7c256a6af041e9f538b"
      ],
      "author": {
        "name": "Mikulas Patocka",
        "email": "mikulas@artax.karlin.mff.cuni.cz",
        "time": "Mon May 09 13:01:09 2011 +0200"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon May 09 16:22:07 2011 -0700"
      },
      "message": "Don\u0027t lock guardpage if the stack is growing up\n\nLinux kernel excludes guard page when performing mlock on a VMA with\ndown-growing stack. However, some architectures have up-growing stack\nand locking the guard page should be excluded in this case too.\n\nThis patch fixes lvm2 on PA-RISC (and possibly other architectures with\nup-growing stack). lvm2 calculates number of used pages when locking and\nwhen unlocking and reports an internal error if the numbers mismatch.\n\n[ Patch changed fairly extensively to also fix /proc/\u003cpid\u003e/maps for the\n  grows-up case, and to move things around a bit to clean it all up and\n  share the infrstructure with the /proc bits.\n\n  Tested on ia64 that has both grow-up and grow-down segments  - Linus ]\n\nSigned-off-by: Mikulas Patocka \u003cmikulas@artax.karlin.mff.cuni.cz\u003e\nTested-by: Tony Luck \u003ctony.luck@gmail.com\u003e\nCc: stable@kernel.org\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a1fde08c74e90accd62d4cfdbf580d2ede938fe7",
      "tree": "bdf58078fd37484729e350acb066dc1b1fa890ee",
      "parents": [
        "5895198c56d131cc696556a45f7ff0ea99ac297b"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 04 21:30:28 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 04 21:30:28 2011 -0700"
      },
      "message": "VM: skip the stack guard page lookup in get_user_pages only for mlock\n\nThe logic in __get_user_pages() used to skip the stack guard page lookup\nwhenever the caller wasn\u0027t interested in seeing what the actual page\nwas.  But Michel Lespinasse points out that there are cases where we\ndon\u0027t care about the physical page itself (so \u0027pages\u0027 may be NULL), but\ndo want to make sure a page is mapped into the virtual address space.\n\nSo using the existence of the \"pages\" array as an indication of whether\nto look up the guard page or not isn\u0027t actually so great, and we really\nshould just use the FOLL_MLOCK bit.  But because that bit was only set\nfor the VM_LOCKED case (and not all vma\u0027s necessarily have it, even for\nmlock()), we couldn\u0027t do that originally.\n\nFix that by moving the VM_LOCKED check deeper into the call-chain, which\nactually simplifies many things.  Now mlock() gets simpler, and we can\nalso check for FOLL_MLOCK in __get_user_pages() and the code ends up\nmuch more straightforward.\n\nReported-and-reviewed-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: stable@kernel.org\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "cc03638df20acbec5d0d0d9e07234aadde9e698d",
      "tree": "462baff7982f3b58a647fc895ec3a62402e3d0b3",
      "parents": [
        "1409f141ac719b994d2832911b1e9ec928943fc2"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Wed Apr 27 15:26:56 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Apr 28 11:28:21 2011 -0700"
      },
      "message": "mm: check if PTE is already allocated during page fault\n\nWith transparent hugepage support, handle_mm_fault() has to be careful\nthat a normal PMD has been established before handling a PTE fault.  To\nachieve this, it used __pte_alloc() directly instead of pte_alloc_map as\npte_alloc_map is unsafe to run against a huge PMD.  pte_offset_map() is\ncalled once it is known the PMD is safe.\n\npte_alloc_map() is smart enough to check if a PTE is already present\nbefore calling __pte_alloc but this check was lost.  As a consequence,\nPTEs may be allocated unnecessarily and the page table lock taken.  Thi\nuseless PTE does get cleaned up but it\u0027s a performance hit which is\nvisible in page_test from aim9.\n\nThis patch simply re-adds the check normally done by pte_alloc_map to\ncheck if the PTE needs to be allocated before taking the page table lock.\nThe effect is noticable in page_test from aim9.\n\n  AIM9\n                  2.6.38-vanilla 2.6.38-checkptenone\n  creat-clo      446.10 ( 0.00%)   424.47 (-5.10%)\n  page_test       38.10 ( 0.00%)    42.04 ( 9.37%)\n  brk_test        52.45 ( 0.00%)    51.57 (-1.71%)\n  exec_test      382.00 ( 0.00%)   456.90 (16.39%)\n  fork_test       60.11 ( 0.00%)    67.79 (11.34%)\n  MMTests Statistics: duration\n  Total Elapsed Time (seconds)                611.90    612.22\n\n(While this affects 2.6.38, it is a performance rather than a functional\nbug and normally outside the rules -stable.  
While the big performance\ndifferences are to a microbench, the difference in fork and exec\nperformance may be significant enough that -stable wants to consider the\npatch)\n\nReported-by: Raz Ben Yehuda \u003craziebe@gmail.com\u003e\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.38.x]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "fe936dfc23fed3475b11067e8d9b70553eafcd9e",
      "tree": "b45ad916853194b26bfe4504879e0bff64a43bf7",
      "parents": [
        "4471a675dfc7ca676c165079e91c712b09dc9ce4"
      ],
      "author": {
        "name": "Michael Ellerman",
        "email": "michael@ellerman.id.au",
        "time": "Thu Apr 14 15:22:10 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Apr 14 16:06:55 2011 -0700"
      },
      "message": "mm: check that we have the right vma in __access_remote_vm()\n\nIn __access_remote_vm() we need to check that we have found the right\nvma, not the following vma before we try to access it.  Otherwise we\nmight call the vma\u0027s access routine with an address which does not fall\ninside the vma.\n\nIt was discovered on a current kernel but with an unreleased driver,\nfrom memory it was strace leading to a kernel bad access, but it\nobviously depends on what the access implementation does.\n\nLooking at other access implementations I only see:\n\n  $ git grep -A 5 vm_operations|grep access\n  arch/powerpc/platforms/cell/spufs/file.c-\t.access \u003d spufs_mem_mmap_access,\n  arch/x86/pci/i386.c-\t.access \u003d generic_access_phys,\n  drivers/char/mem.c-\t.access \u003d generic_access_phys\n  fs/sysfs/bin.c-\t.access\t\t\u003d bin_access,\n\nThe spufs one looks like it might behave badly given the wrong vma, it\nassumes vma-\u003evm_file-\u003eprivate_data is a spu_context, and looks like it\nwould probably blow up pretty quickly if it wasn\u0027t.\n\ngeneric_access_phys() only uses the vma to check vm_flags and get the\nmm, and then walks page tables using the address.  So it should bail on\nthe vm_flags check, or at worst let you access some other VM_IO mapping.\n\nAnd bin_access() just proxies to another access implementation.\n\nSigned-off-by: Michael Ellerman \u003cmichael@ellerman.id.au\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "95042f9eb78a8d9a17455e2ef263f2f310ecef15",
      "tree": "ac9fe0a5e17c4b94b18b84338ffbeca2cee140cb",
      "parents": [
        "be85bccaa5aa5a11dcaf85f9e945ffefd253f631"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Apr 12 14:15:51 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Apr 12 14:15:51 2011 -0700"
      },
      "message": "vm: fix mlock() on stack guard page\n\nCommit 53a7706d5ed8 (\"mlock: do not hold mmap_sem for extended periods\nof time\") changed mlock() to care about the exact number of pages that\n__get_user_pages() had brought it.  Before, it would only care about\nerrors.\n\nAnd that doesn\u0027t work, because we also handled one page specially in\n__mlock_vma_pages_range(), namely the stack guard page.  So when that\ncase was handled, the number of pages that the function returned was off\nby one.  In particular, it could be zero, and then the caller would end\nup not making any progress at all.\n\nRather than try to fix up that off-by-one error for the mlock case\nspecially, this just moves the logic to handle the stack guard page\ninto__get_user_pages() itself, thus making all the counts come out\nright automatically.\n\nReported-by: Robert Święcki \u003crobert@swiecki.net\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Oleg Nesterov \u003coleg@redhat.com\u003e\nCc: stable@kernel.org\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ae91dbfc9949cf042c45798557b48d3b83bc3635",
      "tree": "6af0edfd904b957a2f6ca65ae4a5fdebb78ca5b8",
      "parents": [
        "d7c3f8cee81f4548de0513403b74131aee655576"
      ],
      "author": {
        "name": "Randy Dunlap",
        "email": "randy.dunlap@oracle.com",
        "time": "Sat Mar 26 13:27:01 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Mar 27 19:30:18 2011 -0700"
      },
      "message": "mm: fix memory.c incorrect kernel-doc\n\nFix mm/memory.c incorrect kernel-doc function notation:\n\n  Warning(mm/memory.c:3718): Cannot understand  * @access_remote_vm - access another process\u0027 address space\n   on line 3718 - I thought it was a doc line\n\nSigned-off-by: Randy Dunlap \u003crandy.dunlap@oracle.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b81a618dcd3ea99de292dbe624f41ca68f464376",
      "tree": "c5fbe44f944da9d7dc0c224116be77094d379c8a",
      "parents": [
        "2f284c846331fa44be1300a3c2c3e85800268a00",
        "a9712bc12c40c172e393f85a9b2ba8db4bf59509"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 23 20:51:42 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 23 20:51:42 2011 -0700"
      },
      "message": "Merge branch \u0027for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6\n\n* \u0027for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:\n  deal with races in /proc/*/{syscall,stack,personality}\n  proc: enable writing to /proc/pid/mem\n  proc: make check_mem_permission() return an mm_struct on success\n  proc: hold cred_guard_mutex in check_mem_permission()\n  proc: disable mem_write after exec\n  mm: implement access_remote_vm\n  mm: factor out main logic of access_process_vm\n  mm: use mm_struct to resolve gate vma\u0027s in __get_user_pages\n  mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm\n  mm: arch: make in_gate_area take an mm_struct instead of a task_struct\n  mm: arch: make get_gate_vma take an mm_struct instead of a task_struct\n  x86: mark associated mm when running a task in 32 bit compatibility mode\n  x86: add context tag to mark mm when running a task in 32-bit compatibility mode\n  auxv: require the target to be tracable (or yourself)\n  close race in /proc/*/environ\n  report errors in /proc/*/*map* sanely\n  pagemap: close races with suid execve\n  make sessionid permissions in /proc/*/task/* match those in /proc/*\n  fix leaks in path_lookupat()\n\nFix up trivial conflicts in fs/proc/base.c\n"
    },
    {
      "commit": "56039efa18f2530fc23e8ef19e716b65ee2a1d1e",
      "tree": "a61cbd2f760e93363657622de2cd1591db028458",
      "parents": [
        "6c191cd01a935e5b53ef43c9403c771bb7a32b60"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Wed Mar 23 16:42:19 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 23 19:46:22 2011 -0700"
      },
      "message": "memcg: fix ugly initialization of return value is in caller\n\nRemove initialization of vaiable in caller of memory cgroup function.\nActually, it\u0027s return value of memcg function but it\u0027s initialized in\ncaller.\n\nSome memory cgroup uses following style to bring the result of start\nfunction to the end function for avoiding races.\n\n   mem_cgroup_start_A(\u0026(*ptr))\n   /* Something very complicated can happen here. */\n   mem_cgroup_end_A(*ptr)\n\nIn some calls, *ptr should be initialized to NULL be caller.  But it\u0027s\nugly.  This patch fixes that *ptr is initialized by _start function.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nAcked-by: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Balbir Singh \u003cbalbir@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5ddd36b9c59887c6416e21daf984fbdd9b1818df",
      "tree": "1cc7ce9a671f4c49dc594e1f5d1fc8b596e77b5f",
      "parents": [
        "206cb636576b969e9b471cdedeaea7752e6acb33"
      ],
      "author": {
        "name": "Stephen Wilson",
        "email": "wilsons@start.ca",
        "time": "Sun Mar 13 15:49:20 2011 -0400"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Mar 23 16:36:57 2011 -0400"
      },
      "message": "mm: implement access_remote_vm\n\nProvide an alternative to access_process_vm that allows the caller to obtain a\nreference to the supplied mm_struct.\n\nSigned-off-by: Stephen Wilson \u003cwilsons@start.ca\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "206cb636576b969e9b471cdedeaea7752e6acb33",
      "tree": "252a1b5e9ce41521fb93b519265d4a1dbd18cfe9",
      "parents": [
        "e7f22e207bacdba5b73f2893a3abe935a5373e2e"
      ],
      "author": {
        "name": "Stephen Wilson",
        "email": "wilsons@start.ca",
        "time": "Sun Mar 13 15:49:19 2011 -0400"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Mar 23 16:36:56 2011 -0400"
      },
      "message": "mm: factor out main logic of access_process_vm\n\nIntroduce an internal helper __access_remote_vm and base access_process_vm on\ntop of it.  This new method may be called with a NULL task_struct if page fault\naccounting is not desired.  This code will be shared with a new address space\naccessor that is independent of task_struct.\n\nSigned-off-by: Stephen Wilson \u003cwilsons@start.ca\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "e7f22e207bacdba5b73f2893a3abe935a5373e2e",
      "tree": "02e9f01788742db409587475a0aa10f3a0347e38",
      "parents": [
        "cae5d39032acf26c265f6b1dc73d7ce6ff4bc387"
      ],
      "author": {
        "name": "Stephen Wilson",
        "email": "wilsons@start.ca",
        "time": "Sun Mar 13 15:49:18 2011 -0400"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Mar 23 16:36:56 2011 -0400"
      },
      "message": "mm: use mm_struct to resolve gate vma\u0027s in __get_user_pages\n\nWe now check if a requested user page overlaps a gate vma using the supplied mm\ninstead of the supplied task.  The given task is now used solely for accounting\npurposes and may be NULL.\n\nSigned-off-by: Stephen Wilson \u003cwilsons@start.ca\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "cae5d39032acf26c265f6b1dc73d7ce6ff4bc387",
      "tree": "9c89bcab3f4c17fb34eb44342d1f67bb4230d632",
      "parents": [
        "83b964bbf82eb13a8f31bb49ca420787fe01f7a6"
      ],
      "author": {
        "name": "Stephen Wilson",
        "email": "wilsons@start.ca",
        "time": "Sun Mar 13 15:49:17 2011 -0400"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Mar 23 16:36:55 2011 -0400"
      },
      "message": "mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm\n\nNow that gate vma\u0027s are referenced with respect to a particular mm and not a\nparticular task it only makes sense to propagate the change to this predicate as\nwell.\n\nSigned-off-by: Stephen Wilson \u003cwilsons@start.ca\u003e\nReviewed-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nCc: Ingo Molnar \u003cmingo@redhat.com\u003e\nCc: \"H. Peter Anvin\" \u003chpa@zytor.com\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "83b964bbf82eb13a8f31bb49ca420787fe01f7a6",
      "tree": "c94dcf5f4116ca351570fb9d2b7e37834e93f430",
      "parents": [
        "31db58b3ab432f72ea76be58b12e6ffaf627d5db"
      ],
      "author": {
        "name": "Stephen Wilson",
        "email": "wilsons@start.ca",
        "time": "Sun Mar 13 15:49:16 2011 -0400"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Mar 23 16:36:54 2011 -0400"
      },
      "message": "mm: arch: make in_gate_area take an mm_struct instead of a task_struct\n\nMorally, the question of whether an address lies in a gate vma should be asked\nwith respect to an mm, not a particular task.  Moreover, dropping the dependency\non task_struct will help make existing and future operations on mm\u0027s more\nflexible and convenient.\n\nSigned-off-by: Stephen Wilson \u003cwilsons@start.ca\u003e\nReviewed-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nCc: Ingo Molnar \u003cmingo@redhat.com\u003e\nCc: \"H. Peter Anvin\" \u003chpa@zytor.com\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "31db58b3ab432f72ea76be58b12e6ffaf627d5db",
      "tree": "c88b742e1f2c52045d5abc6d35d7492ebdf64541",
      "parents": [
        "375906f8765e131a4a159b1ffebf78c15db7b3bf"
      ],
      "author": {
        "name": "Stephen Wilson",
        "email": "wilsons@start.ca",
        "time": "Sun Mar 13 15:49:15 2011 -0400"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Mar 23 16:36:54 2011 -0400"
      },
      "message": "mm: arch: make get_gate_vma take an mm_struct instead of a task_struct\n\nMorally, the presence of a gate vma is more an attribute of a particular mm than\na particular task.  Moreover, dropping the dependency on task_struct will help\nmake both existing and future operations on mm\u0027s more flexible and convenient.\n\nSigned-off-by: Stephen Wilson \u003cwilsons@start.ca\u003e\nReviewed-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nCc: Ingo Molnar \u003cmingo@redhat.com\u003e\nCc: \"H. Peter Anvin\" \u003chpa@zytor.com\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "318b275fbca1ab9ec0862de71420e0e92c3d1aa7",
      "tree": "aa4984469443ed53b4e7fa23d3f91966e536a803",
      "parents": [
        "5fda1bd5b8869574dad8e1f9f71e23bf0c186274"
      ],
      "author": {
        "name": "Gleb Natapov",
        "email": "gleb@redhat.com",
        "time": "Tue Mar 22 16:30:51 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Mar 22 17:44:02 2011 -0700"
      },
      "message": "mm: allow GUP to fail instead of waiting on a page\n\nGUP user may want to try to acquire a reference to a page if it is already\nin memory, but not if IO, to bring it in, is needed.  For example KVM may\ntell vcpu to schedule another guest process if current one is trying to\naccess swapped out page.  Meanwhile, the page will be swapped in and the\nguest process, that depends on it, will be able to run again.\n\nThis patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and\nFOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in\nconjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that\nit shouldn\u0027t drop mmap_sem and wait on a page, but return VM_FAULT_RETRY\ninstead.\n\n[akpm@linux-foundation.org: improve FOLL_NOWAIT comment]\nSigned-off-by: Gleb Natapov \u003cgleb@redhat.com\u003e\nCc: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Avi Kivity \u003cavi@redhat.com\u003e\nCc: Marcelo Tosatti \u003cmtosatti@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e16b396ce314b2bcdfe6c173fe075bf8e3432368",
      "tree": "640f0f56f2ea676647af4eb42d32fa56be2ee549",
      "parents": [
        "7fd23a24717a327a66f3c32d11a20a2f169c824f",
        "e6e8dd5055a974935af1398c8648d4a9359b0ecb"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Mar 18 10:37:40 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Mar 18 10:37:40 2011 -0700"
      },
      "message": "Merge branch \u0027for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial\n\n* \u0027for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)\n  doc: CONFIG_UNEVICTABLE_LRU doesn\u0027t exist anymore\n  Update cpuset info \u0026 webiste for cgroups\n  dcdbas: force SMI to happen when expected\n  arch/arm/Kconfig: remove one to many l\u0027s in the word.\n  asm-generic/user.h: Fix spelling in comment\n  drm: fix printk typo \u0027sracth\u0027\n  Remove one to many n\u0027s in a word\n  Documentation/filesystems/romfs.txt: fixing link to genromfs\n  drivers:scsi Change printk typo initate -\u003e initiate\n  serial, pch uart: Remove duplicate inclusion of linux/pci.h header\n  fs/eventpoll.c: fix spelling\n  mm: Fix out-of-date comments which refers non-existent functions\n  drm: Fix printk typo \u0027failled\u0027\n  coh901318.c: Change initate to initiate.\n  mbox-db5500.c Change initate to initiate.\n  edac: correct i82975x error-info reported\n  edac: correct i82975x mci initialisation\n  edac: correct commented info\n  fs: update comments to point correct document\n  target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c\n  ...\n\nTrivial conflict in fs/eventpoll.c (spelling vs addition)\n"
    },
    {
      "commit": "69ebb83e13e514222b0ae4f8bd813a17679ed876",
      "tree": "62ccc7ee1e840d0a6cc01a9fc1c44a5f4e6f1edd",
      "parents": [
        "0014bd990e69063b0fb78940b35439d7980ce3ee"
      ],
      "author": {
        "name": "Huang Ying",
        "email": "ying.huang@intel.com",
        "time": "Sun Jan 30 11:15:48 2011 +0800"
      },
      "committer": {
        "name": "Marcelo Tosatti",
        "email": "mtosatti@redhat.com",
        "time": "Thu Mar 17 13:08:27 2011 -0300"
      },
      "message": "mm: make __get_user_pages return -EHWPOISON for HWPOISON page optionally\n\nMake __get_user_pages return -EHWPOISON for HWPOISON page only if\nFOLL_HWPOISON is specified.  With this patch, the interested callers\ncan distinguish HWPOISON pages from general FAULT pages, while other\ncallers will still get -EFAULT for all these pages, so the user space\ninterface need not to be changed.\n\nThis feature is needed by KVM, where UCR MCE should be relayed to\nguest for HWPOISON page, while instruction emulation and MMIO will be\ntried for general FAULT page.\n\nThe idea comes from Andrew Morton.\n\nSigned-off-by: Huang Ying \u003cying.huang@intel.com\u003e\nCc: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Marcelo Tosatti \u003cmtosatti@redhat.com\u003e\nSigned-off-by: Avi Kivity \u003cavi@redhat.com\u003e\n"
    },
    {
      "commit": "0014bd990e69063b0fb78940b35439d7980ce3ee",
      "tree": "56d4576cc07954eb304abaf602aba44a6aa2a4f1",
      "parents": [
        "91c9c3eda4f3066980d13a6907ef84f3a99364bd"
      ],
      "author": {
        "name": "Huang Ying",
        "email": "ying.huang@intel.com",
        "time": "Sun Jan 30 11:15:47 2011 +0800"
      },
      "committer": {
        "name": "Marcelo Tosatti",
        "email": "mtosatti@redhat.com",
        "time": "Thu Mar 17 13:08:27 2011 -0300"
      },
      "message": "mm: export __get_user_pages\n\nIn most cases, get_user_pages and get_user_pages_fast should be used\nto pin user pages in memory.  But sometimes, some special flags except\nFOLL_GET, FOLL_WRITE and FOLL_FORCE are needed, for example in\nfollowing patch, KVM needs FOLL_HWPOISON.  To support these users,\n__get_user_pages is exported directly.\n\nThere are some symbol name conflicts in infiniband driver, fixed them too.\n\nSigned-off-by: Huang Ying \u003cying.huang@intel.com\u003e\nCC: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nCC: Michel Lespinasse \u003cwalken@google.com\u003e\nCC: Roland Dreier \u003croland@kernel.org\u003e\nCC: Ralph Campbell \u003cinfinipath@qlogic.com\u003e\nSigned-off-by: Marcelo Tosatti \u003cmtosatti@redhat.com\u003e\n"
    },
    {
      "commit": "2aa15890f3c191326678f1bd68af61ec6b8753ec",
      "tree": "347f5fdcd0678b12be92f266cd2a5e7a74749403",
      "parents": [
        "78794b2cdeac37ac1fd950fc9c4454b56d88ac03"
      ],
      "author": {
        "name": "Miklos Szeredi",
        "email": "mszeredi@suse.cz",
        "time": "Wed Feb 23 13:49:47 2011 +0100"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Feb 23 19:52:52 2011 -0800"
      },
      "message": "mm: prevent concurrent unmap_mapping_range() on the same inode\n\nMichael Leun reported that running parallel opens on a fuse filesystem\ncan trigger a \"kernel BUG at mm/truncate.c:475\"\n\nGurudas Pai reported the same bug on NFS.\n\nThe reason is, unmap_mapping_range() is not prepared for more than\none concurrent invocation per inode.  For example:\n\n  thread1: going through a big range, stops in the middle of a vma and\n     stores the restart address in vm_truncate_count.\n\n  thread2: comes in with a small (e.g. single page) unmap request on\n     the same vma, somewhere before restart_address, finds that the\n     vma was already unmapped up to the restart address and happily\n     returns without doing anything.\n\nAnother scenario would be two big unmap requests, both having to\nrestart the unmapping and each one setting vm_truncate_count to its\nown value.  This could go on forever without any of them being able to\nfinish.\n\nTruncate and hole punching already serialize with i_mutex.  Other\ncallers of unmap_mapping_range() do not, and it\u0027s difficult to get\ni_mutex protection for all callers.  In particular -\u003ed_revalidate(),\nwhich calls invalidate_inode_pages2_range() in fuse, may be called\nwith or without i_mutex.\n\nThis patch adds a new mutex to \u0027struct address_space\u0027 to prevent\nrunning multiple concurrent unmap_mapping_range() on the same mapping.\n\n[ We\u0027ll hopefully get rid of all this with the upcoming mm\n  preemptibility series by Peter Zijlstra, the \"mm: Remove i_mmap_mutex\n  lockbreak\" patch in particular.  
But that is for 2.6.39 ]\n\nSigned-off-by: Miklos Szeredi \u003cmszeredi@suse.cz\u003e\nReported-by: Michael Leun \u003clkml20101129@newton.leun.net\u003e\nReported-by: Gurudas Pai \u003cgurudas.pai@oracle.com\u003e\nTested-by: Gurudas Pai \u003cgurudas.pai@oracle.com\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: stable@kernel.org\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a335b2e17301afae9e794f21071a2fcdd5879c1e",
      "tree": "d6e3e3a5fad04c241d3c18ade63b8d239b30b6f9",
      "parents": [
        "ec4f2ac471e25d3e0cea05abb8da34c05a0868f9"
      ],
      "author": {
        "name": "Ryota Ozaki",
        "email": "ozaki.ryota@gmail.com",
        "time": "Thu Feb 10 13:56:28 2011 +0900"
      },
      "committer": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Thu Feb 17 16:54:39 2011 +0100"
      },
      "message": "mm: Fix out-of-date comments which refers non-existent functions\n\ndo_file_page and do_no_page don\u0027t exist anymore, but some comments\nstill refers them. The patch fixes them by replacing them with\nexisting ones.\n\nSigned-off-by: Ryota Ozaki \u003cozaki.ryota@gmail.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Jiri Kosina \u003cjkosina@suse.cz\u003e\n"
    },
    {
      "commit": "419d8c96dbfa558f00e623023917d0a5afc46129",
      "tree": "74882b1ed7340d3d0e448b343c52fd12969ea518",
      "parents": [
        "e15f8c01af924e611bc7be1e45449c4a74e5dfdd"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Thu Feb 10 15:01:33 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Feb 11 16:12:20 2011 -0800"
      },
      "message": "mlock: do not munlock pages in __do_fault()\n\nIf the page is going to be written to, __do_page needs to break COW.\n\nHowever, the old page (before breaking COW) was never mapped mapped into\nthe current pte (__do_fault is only called when the pte is not present),\nso vmscan can\u0027t have marked the old page as PageMlocked due to being\nmapped in __do_fault\u0027s VMA.  Therefore, __do_fault() does not need to\nworry about clearing PageMlocked() on the old page.\n\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e15f8c01af924e611bc7be1e45449c4a74e5dfdd",
      "tree": "7319b3d6834707996b16fd8d13ab745ad9b13a91",
      "parents": [
        "e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Thu Feb 10 15:01:32 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Feb 11 16:12:20 2011 -0800"
      },
      "message": "mlock: fix race when munlocking pages in do_wp_page()\n\nvmscan can lazily find pages that are mapped within VM_LOCKED vmas, and\nset the PageMlocked bit on these pages, transfering them onto the\nunevictable list.  When do_wp_page() breaks COW within a VM_LOCKED vma,\nit may need to clear PageMlocked on the old page and set it on the new\npage instead.\n\nThis change fixes an issue where do_wp_page() was clearing PageMlocked\non the old page while the pte was still pointing to it (as well as\nrmap).  Therefore, we were not protected against vmscan immediately\ntransfering the old page back onto the unevictable list.  This could\ncause pages to get stranded there forever.\n\nI propose to move the corresponding code to the end of do_wp_page(),\nafter the pte (and rmap) have been pointed to the new page.\nAdditionally, we can use munlock_vma_page() instead of\nclear_page_mlock(), so that the old page stays mlocked if there are\nstill other VM_LOCKED vmas mapping it.\n\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "14d1a55cd26f1860f837f37ae42520c7c13b1347",
      "tree": "b80634a6a2a5f306fd1c3fc408993dd9fc98202b",
      "parents": [
        "05b258e99725112c4febeab4fad23ea2c8908a3a"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:47:15 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:47 2011 -0800"
      },
      "message": "thp: add debug checks for mapcount related invariants\n\nAdd debug checks for invariants that if broken could lead to mapcount vs\npage_mapcount debug checks to trigger later in split_huge_page.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "500d65d471018d9a13b0d51b7e141ed2a3555c1d",
      "tree": "046dc2337f87a1a365fde126fab7f4ac9ae82793",
      "parents": [
        "0af4e98b6b095c74588af04872f83d333c958c32"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:46:55 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:42 2011 -0800"
      },
      "message": "thp: pmd_trans_huge migrate bugcheck\n\nNo pmd_trans_huge should ever materialize in migration ptes areas, because\nwe split the hugepage before migration ptes are instantiated.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f66055ab6fb9731dbfce320c5202ef4441b5d77f",
      "tree": "de347e42d1e5cf481344a153d272e86a95b774f4",
      "parents": [
        "05759d380a9d7f131a475186c07fce58ceaa8902"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:46:54 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:42 2011 -0800"
      },
      "message": "thp: verify pmd_trans_huge isn\u0027t leaking\n\npte_trans_huge must not leak in certain vmas like the mmio special pfn or\nfilebacked mappings.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "8a07651ee8cdaa9e27cb4ae372aed347533770f5",
      "tree": "07a442e66c3f608e174edd3b8a2fd154f4219380",
      "parents": [
        "71e3aac0724ffe8918992d76acfe3aad7d8724a5"
      ],
      "author": {
        "name": "Hugh Dickins",
        "email": "hughd@google.com",
        "time": "Thu Jan 13 15:46:52 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:42 2011 -0800"
      },
      "message": "thp: transparent hugepage core fixlet\n\nIf you configure THP in addition to HUGETLB_PAGE on x86_32 without PAE,\nthe p?d-folding works out that munlock_vma_pages_range() can crash to\nfollow_page()\u0027s pud_huge() BUG_ON(flags \u0026 FOLL_GET): it needs the same\nVM_HUGETLB check already there on the pmd_huge() line.  Conveniently,\nopenSUSE provides a \"blogd\" which tests this out at startup!\n\nSigned-off-by: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "71e3aac0724ffe8918992d76acfe3aad7d8724a5",
      "tree": "4ff96e1fc3e53bc9d25b859bf7e5bdbab8f1b25a",
      "parents": [
        "5c3240d92e29ae7bfb9cb58a9b37e80ab40894ff"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:46:52 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:42 2011 -0800"
      },
      "message": "thp: transparent hugepage core\n\nLately I\u0027ve been working to make KVM use hugepages transparently without\nthe usual restrictions of hugetlbfs.  Some of the restrictions I\u0027d like to\nsee removed:\n\n1) hugepages have to be swappable or the guest physical memory remains\n   locked in RAM and can\u0027t be paged out to swap\n\n2) if a hugepage allocation fails, regular pages should be allocated\n   instead and mixed in the same vma without any failure and without\n   userland noticing\n\n3) if some task quits and more hugepages become available in the\n   buddy, guest physical memory backed by regular pages should be\n   relocated on hugepages automatically in regions under\n   madvise(MADV_HUGEPAGE) (ideally event driven by waking up the\n   kernel deamon if the order\u003dHPAGE_PMD_SHIFT-PAGE_SHIFT list becomes\n   not null)\n\n4) avoidance of reservation and maximization of use of hugepages whenever\n   possible. Reservation (needed to avoid runtime fatal faliures) may be ok for\n   1 machine with 1 database with 1 database cache with 1 database cache size\n   known at boot time. It\u0027s definitely not feasible with a virtualization\n   hypervisor usage like RHEV-H that runs an unknown number of virtual machines\n   with an unknown size of each virtual machine with an unknown amount of\n   pagecache that could be potentially useful in the host for guest not using\n   O_DIRECT (aka cache\u003doff).\n\nhugepages in the virtualization hypervisor (and also in the guest!) are\nmuch more important than in a regular host not using virtualization,\nbecasue with NPT/EPT they decrease the tlb-miss cacheline accesses from 24\nto 19 in case only the hypervisor uses transparent hugepages, and they\ndecrease the tlb-miss cacheline accesses from 19 to 15 in case both the\nlinux hypervisor and the linux guest both uses this patch (though the\nguest will limit the addition speedup to anonymous regions only for\nnow...).  
Even more important is that the tlb miss handler is much slower\non a NPT/EPT guest than for a regular shadow paging or no-virtualization\nscenario.  So maximizing the amount of virtual memory cached by the TLB\npays off significantly more with NPT/EPT than without (even if there would\nbe no significant speedup in the tlb-miss runtime).\n\nThe first (and more tedious) part of this work requires allowing the VM to\nhandle anonymous hugepages mixed with regular pages transparently on\nregular anonymous vmas.  This is what this patch tries to achieve in the\nleast intrusive possible way.  We want hugepages and hugetlb to be used in\na way so that all applications can benefit without changes (as usual we\nleverage the KVM virtualization design: by improving the Linux VM at\nlarge, KVM gets the performance boost too).\n\nThe most important design choice is: always fallback to 4k allocation if\nthe hugepage allocation fails!  This is the _very_ opposite of some large\npagecache patches that failed with -EIO back then if a 64k (or similar)\nallocation failed...\n\nSecond important decision (to reduce the impact of the feature on the\nexisting pagetable handling code) is that at any time we can split an\nhugepage into 512 regular pages and it has to be done with an operation\nthat can\u0027t fail.  This way the reliability of the swapping isn\u0027t decreased\n(no need to allocate memory when we are short on memory to swap) and it\u0027s\ntrivial to plug a split_huge_page* one-liner where needed without\npolluting the VM.  Over time we can teach mprotect, mremap and friends to\nhandle pmd_trans_huge natively without calling split_huge_page*.  
The fact\nit can\u0027t fail isn\u0027t just for swap: if split_huge_page would return -ENOMEM\n(instead of the current void) we\u0027d need to rollback the mprotect from the\nmiddle of it (ideally including undoing the split_vma) which would be a\nbig change and in the very wrong direction (it\u0027d likely be simpler not to\ncall split_huge_page at all and to teach mprotect and friends to handle\nhugepages instead of rolling them back from the middle).  In short the\nvery value of split_huge_page is that it can\u0027t fail.\n\nThe collapsing and madvise(MADV_HUGEPAGE) part will remain separated and\nincremental and it\u0027ll just be an \"harmless\" addition later if this initial\npart is agreed upon.  It also should be noted that locking-wise replacing\nregular pages with hugepages is going to be very easy if compared to what\nI\u0027m doing below in split_huge_page, as it will only happen when\npage_count(page) matches page_mapcount(page) if we can take the PG_lock\nand mmap_sem in write mode.  collapse_huge_page will be a \"best effort\"\nthat (unlike split_huge_page) can fail at the minimal sign of trouble and\nwe can try again later.  collapse_huge_page will be similar to how KSM\nworks and the madvise(MADV_HUGEPAGE) will work similar to\nmadvise(MADV_MERGEABLE).\n\nThe default I like is that transparent hugepages are used at page fault\ntime.  This can be changed with\n/sys/kernel/mm/transparent_hugepage/enabled.  The control knob can be set\nto three values \"always\", \"madvise\", \"never\" which mean respectively that\nhugepages are always used, or only inside madvise(MADV_HUGEPAGE) regions,\nor never used.  /sys/kernel/mm/transparent_hugepage/defrag instead\ncontrols if the hugepage allocation should defrag memory aggressively\n\"always\", only inside \"madvise\" regions, or \"never\".\n\nThe pmd_trans_splitting/pmd_trans_huge locking is very solid.  
The\nput_page (from get_user_page users that can\u0027t use mmu notifier like\nO_DIRECT) that runs against a __split_huge_page_refcount instead was a\npain to serialize in a way that would result always in a coherent page\ncount for both tail and head.  I think my locking solution with a\ncompound_lock taken only after the page_first is valid and is still a\nPageHead should be safe but it surely needs review from SMP race point of\nview.  In short there is no current existing way to serialize the O_DIRECT\nfinal put_page against split_huge_page_refcount so I had to invent a new\none (O_DIRECT loses knowledge on the mapping status by the time gup_fast\nreturns so...).  And I didn\u0027t want to impact all gup/gup_fast users for\nnow, maybe if we change the gup interface substantially we can avoid this\nlocking, I admit I didn\u0027t think too much about it because changing the gup\nunpinning interface would be invasive.\n\nIf we ignored O_DIRECT we could stick to the existing compound refcounting\ncode, by simply adding a get_user_pages_fast_flags(foll_flags) where KVM\n(and any other mmu notifier user) would call it without FOLL_GET (and if\nFOLL_GET isn\u0027t set we\u0027d just BUG_ON if nobody registered itself in the\ncurrent task mmu notifier list yet).  But O_DIRECT is fundamental for\ndecent performance of virtualized I/O on fast storage so we can\u0027t avoid it\nto solve the race of put_page against split_huge_page_refcount to achieve\na complete hugepage feature for KVM.\n\nSwap and oom works fine (well just like with regular pages ;).  
MMU\nnotifier is handled transparently too, with the exception of the young bit\non the pmd, that didn\u0027t have a range check but I think KVM will be fine\nbecause the whole point of hugepages is that EPT/NPT will also use a huge\npmd when they notice gup returns pages with PageCompound set, so they\nwon\u0027t care of a range and there\u0027s just the pmd young bit to check in that\ncase.\n\nNOTE: in some cases if the L2 cache is small, this may slowdown and waste\nmemory during COWs because 4M of memory are accessed in a single fault\ninstead of 8k (the payoff is that after COW the program can run faster).\nSo we might want to switch the copy_huge_page (and clear_huge_page too) to\nnot temporal stores.  I also extensively researched ways to avoid this\ncache trashing with a full prefault logic that would cow in 8k/16k/32k/64k\nup to 1M (I can send those patches that fully implemented prefault) but I\nconcluded they\u0027re not worth it and they add an huge additional complexity\nand they remove all tlb benefits until the full hugepage has been faulted\nin, to save a little bit of memory and some cache during app startup, but\nthey still don\u0027t improve substantially the cache-trashing during startup\nif the prefault happens in \u003e4k chunks.  One reason is that those 4k pte\nentries copied are still mapped on a perfectly cache-colored hugepage, so\nthe trashing is the worst one can generate in those copies (cow of 4k page\ncopies aren\u0027t so well colored so they trashes less, but again this results\nin software running faster after the page fault).  Those prefault patches\nallowed things like a pte where post-cow pages were local 4k regular anon\npages and the not-yet-cowed pte entries were pointing in the middle of\nsome hugepage mapped read-only.  If it doesn\u0027t payoff substantially with\ntodays hardware it will payoff even less in the future with larger l2\ncaches, and the prefault logic would blot the VM a lot.  
If one is\nembedded, transparent_hugepage can be disabled during boot with sysfs or\nwith the boot commandline parameter transparent_hugepage\u003d0 (or\ntransparent_hugepage\u003d2 to restrict hugepages inside madvise regions) that\nwill ensure not a single hugepage is allocated at boot time.  It is simple\nenough to just disable transparent hugepage globally and let transparent\nhugepages be allocated selectively by applications in the MADV_HUGEPAGE\nregion (both at page fault time, and if enabled with the\ncollapse_huge_page too through the kernel daemon).\n\nThis patch supports only hugepages mapped in the pmd, archs that have\nsmaller hugepages will not fit in this patch alone.  Also some archs like\npower have certain tlb limits that prevent mixing different page sizes in\nthe same regions so they will not fit in this framework that requires\n\"graceful fallback\" to basic PAGE_SIZE in case of physical memory\nfragmentation.  hugetlbfs remains a perfect fit for those because its\nsoftware limits happen to match the hardware limits.  
hugetlbfs also\nremains a perfect fit for hugepage sizes like 1GByte that cannot be hoped\nto be found not fragmented after a certain system uptime and that would be\nvery expensive to defragment with relocation, so requiring reservation.\nhugetlbfs is the \"reservation way\", the point of transparent hugepages is\nnot to have any reservation at all and maximizing the use of cache and\nhugepages at all times automatically.\n\nSome performance result:\n\nvmx andrea # LD_PRELOAD\u003d/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE\u003dyes HUGETLB_PATH\u003d/mnt/huge/ ./largep\nages3\nmemset page fault 1566023\nmemset tlb miss 453854\nmemset second tlb miss 453321\nrandom access tlb miss 41635\nrandom access second tlb miss 41658\nvmx andrea # LD_PRELOAD\u003d/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE\u003dyes HUGETLB_PATH\u003d/mnt/huge/ ./largepages3\nmemset page fault 1566471\nmemset tlb miss 453375\nmemset second tlb miss 453320\nrandom access tlb miss 41636\nrandom access second tlb miss 41637\nvmx andrea # ./largepages3\nmemset page fault 1566642\nmemset tlb miss 453417\nmemset second tlb miss 453313\nrandom access tlb miss 41630\nrandom access second tlb miss 41647\nvmx andrea # ./largepages3\nmemset page fault 1566872\nmemset tlb miss 453418\nmemset second tlb miss 453315\nrandom access tlb miss 41618\nrandom access second tlb miss 41659\nvmx andrea # echo 0 \u003e /proc/sys/vm/transparent_hugepage\nvmx andrea # ./largepages3\nmemset page fault 2182476\nmemset tlb miss 460305\nmemset second tlb miss 460179\nrandom access tlb miss 44483\nrandom access second tlb miss 44186\nvmx andrea # ./largepages3\nmemset page fault 2182791\nmemset tlb miss 460742\nmemset second tlb miss 459962\nrandom access tlb miss 43981\nrandom access second tlb miss 43988\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n#include \u003cstdio.h\u003e\n#include \u003cstdlib.h\u003e\n#include \u003cstring.h\u003e\n#include \u003csys/time.h\u003e\n\n#define SIZE 
(3UL*1024*1024*1024)\n\nint main()\n{\n\tchar *p \u003d malloc(SIZE), *p2;\n\tstruct timeval before, after;\n\n\tgettimeofday(\u0026before, NULL);\n\tmemset(p, 0, SIZE);\n\tgettimeofday(\u0026after, NULL);\n\tprintf(\"memset page fault %lu\\n\",\n\t       (after.tv_sec-before.tv_sec)*1000000UL +\n\t       after.tv_usec-before.tv_usec);\n\n\tgettimeofday(\u0026before, NULL);\n\tmemset(p, 0, SIZE);\n\tgettimeofday(\u0026after, NULL);\n\tprintf(\"memset tlb miss %lu\\n\",\n\t       (after.tv_sec-before.tv_sec)*1000000UL +\n\t       after.tv_usec-before.tv_usec);\n\n\tgettimeofday(\u0026before, NULL);\n\tmemset(p, 0, SIZE);\n\tgettimeofday(\u0026after, NULL);\n\tprintf(\"memset second tlb miss %lu\\n\",\n\t       (after.tv_sec-before.tv_sec)*1000000UL +\n\t       after.tv_usec-before.tv_usec);\n\n\tgettimeofday(\u0026before, NULL);\n\tfor (p2 \u003d p; p2 \u003c p+SIZE; p2 +\u003d 4096)\n\t\t*p2 \u003d 0;\n\tgettimeofday(\u0026after, NULL);\n\tprintf(\"random access tlb miss %lu\\n\",\n\t       (after.tv_sec-before.tv_sec)*1000000UL +\n\t       after.tv_usec-before.tv_usec);\n\n\tgettimeofday(\u0026before, NULL);\n\tfor (p2 \u003d p; p2 \u003c p+SIZE; p2 +\u003d 4096)\n\t\t*p2 \u003d 0;\n\tgettimeofday(\u0026after, NULL);\n\tprintf(\"random access second tlb miss %lu\\n\",\n\t       (after.tv_sec-before.tv_sec)*1000000UL +\n\t       after.tv_usec-before.tv_usec);\n\n\treturn 0;\n}\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "47ad8475c000141eacb3ecda5e5ce4b43a9cd04d",
      "tree": "78c29aaf2ae9340e314a25ea08e9724471cf4414",
      "parents": [
        "3f04f62f90d46a82dd73027c5fd7a15daed5c33d"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:46:47 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:41 2011 -0800"
      },
      "message": "thp: clear_copy_huge_page\n\nMove the copy/clear_huge_page functions to common code to share between\nhugetlb.c and huge_memory.c.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "8ac1f8320a0073f28cf9e0491af4cd98f504f92a",
      "tree": "4dad891c302587fdc7b099b18e05d7dbc5526c64",
      "parents": [
        "64cc6ae001d70bc59e5f854e6b5678f59110df16"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:46:43 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:40 2011 -0800"
      },
      "message": "thp: pte alloc trans splitting\n\npte alloc routines must wait for split_huge_page if the pmd is not present\nand not null (i.e.  pmd_trans_splitting).  The additional branches are\noptimized away at compile time by pmd_trans_splitting if the config option\nis off.  However we must pass the vma down in order to know the anon_vma\nlock to wait for.\n\n[akpm@linux-foundation.org: coding-style fixes]\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "14fd403f2146f740942d78af4e0ee59396ad8eab",
      "tree": "c87734f6c6639684208d36548aa3687c6f460e23",
      "parents": [
        "2609ae6d10af0531e826335bd1445d1ace17c847"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:46:37 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:39 2011 -0800"
      },
      "message": "thp: export maybe_mkwrite\n\nhuge_memory.c needs it too when it falls back to copying hugepages into\nregular fragmented pages if hugepage allocation fails during COW.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "53a7706d5ed8f1a53ba062b318773160cc476dde",
      "tree": "a1990d90d5af3686b7a83b2bbc2ae6463971efc5",
      "parents": [
        "5fdb2002131cd4e210b9638a4fc932ec7be491d1"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Thu Jan 13 15:46:14 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:36 2011 -0800"
      },
      "message": "mlock: do not hold mmap_sem for extended periods of time\n\n__get_user_pages gets a new \u0027nonblocking\u0027 parameter to signal that the\ncaller is prepared to re-acquire mmap_sem and retry the operation if\nneeded.  This is used to split off long operations if they are going to\nblock on a disk transfer, or when we detect contention on the mmap_sem.\n\n[akpm@linux-foundation.org: remove ref to rwsem_is_contended()]\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Peter Zijlstra \u003cpeterz@infradead.org\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Ingo Molnar \u003cmingo@elte.hu\u003e\nCc: \"H. Peter Anvin\" \u003chpa@zytor.com\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nCc: David Howells \u003cdhowells@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "110d74a921f4d272b47ef6104fcf937df808f4c8",
      "tree": "a2f1705e049f06e1cf8cbaf7d6b3261f0b46b6ab",
      "parents": [
        "fed067da46ad3b9acedaf794a5f05d0bc153280b"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Thu Jan 13 15:46:11 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:36 2011 -0800"
      },
      "message": "mm: add FOLL_MLOCK follow_page flag.\n\nMove the code to mlock pages from __mlock_vma_pages_range() to\nfollow_page().\n\nThis allows __mlock_vma_pages_range() to not have to break down work into\n16-page batches.\n\nAn additional motivation for doing this within the present patch series is\nthat it\u0027ll make it easier for a later change to drop mmap_sem when\nblocking on disk (we\u0027d like to be able to resume at the page that was read\nfrom disk instead of at the start of a 16-page batch).\n\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Peter Zijlstra \u003cpeterz@infradead.org\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Ingo Molnar \u003cmingo@elte.hu\u003e\nCc: \"H. Peter Anvin\" \u003chpa@zytor.com\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nCc: David Howells \u003cdhowells@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272",
      "tree": "e6c3e7dac64a5e45b48ab7836318752202579a17",
      "parents": [
        "72ddc8f72270758951ccefb7d190f364d20215ab"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Thu Jan 13 15:46:09 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:35 2011 -0800"
      },
      "message": "mlock: avoid dirtying pages and triggering writeback\n\nWhen faulting in pages for mlock(), we want to break COW for anonymous or\nfile pages within VM_WRITABLE, non-VM_SHARED vmas.  However, there is no\nneed to write-fault into VM_SHARED vmas since shared file pages can be\nmlocked first and dirtied later, when/if they actually get written to.\nSkipping the write fault is desirable, as we don\u0027t want to unnecessarily\ncause these pages to be dirtied and queued for writeback.\n\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Kosaki Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Peter Zijlstra \u003cpeterz@infradead.org\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Theodore Tso \u003ctytso@google.com\u003e\nCc: Michael Rubin \u003cmrubin@google.com\u003e\nCc: Suleiman Souhlal \u003csuleiman@google.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "72ddc8f72270758951ccefb7d190f364d20215ab",
      "tree": "11772272825f72aa3f32c0f9be5cf35155cf1441",
      "parents": [
        "b009c024ff0059e293c1937516f2defe56263650"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Thu Jan 13 15:46:08 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:35 2011 -0800"
      },
      "message": "do_wp_page: clarify dirty_page handling\n\nReorganize the code so that dirty pages are handled closer to the place\nthat makes them dirty (handling write fault into shared, writable VMAs).\nNo behavior changes.\n\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Kosaki Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Peter Zijlstra \u003cpeterz@infradead.org\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Theodore Tso \u003ctytso@google.com\u003e\nCc: Michael Rubin \u003cmrubin@google.com\u003e\nCc: Suleiman Souhlal \u003csuleiman@google.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b009c024ff0059e293c1937516f2defe56263650",
      "tree": "35d71c837b954e884c429c9c36a85aaf7b033c49",
      "parents": [
        "212260aa07135b327752dc02625c68cf4ce04caf"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Thu Jan 13 15:46:07 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:35 2011 -0800"
      },
      "message": "do_wp_page: remove the \u0027reuse\u0027 flag\n\nmlocking a shared, writable vma currently causes the corresponding pages\nto be marked as dirty and queued for writeback.  This seems rather\nunnecessary given that the pages are not being actually modified during\nmlock.  It is understood that for non-shared mappings (file or anon) we\nwant to use a write fault in order to break COW, but there is just no such\nneed for shared mappings.\n\nThe first two patches in this series do not introduce any behavior change.\n The intent there is to make it obvious that dirtying file pages is only\ndone in the (writable, shared) case.  I think this clarifies the code, but\nI wouldn\u0027t mind dropping these two patches if there is no consensus about\nthem.\n\nThe last patch is where we actually avoid dirtying shared mappings during\nmlock.  Note that as a side effect of this, we won\u0027t call page_mkwrite()\nfor the mappings that define it, and won\u0027t be pre-allocating data blocks\nat the FS level if the mapped file was sparsely allocated.  My\nunderstanding is that mlock does not need to provide such guarantee, as\nevidenced by the fact that it never did for the filesystems that don\u0027t\ndefine page_mkwrite() - including some common ones like ext3.  However, I\nwould like to gather feedback on this from filesystem people as a\nprecaution.  If this turns out to be a showstopper, maybe block\npreallocation can be added back on using a different interface.\n\nLarge shared mlocks are getting significantly (\u003e2x) faster in my tests, as\nthe disk can be fully used for reading the file instead of having to share\nbetween this and writeback.\n\nThis patch:\n\nReorganize the code to remove the \u0027reuse\u0027 flag.  
No behavior changes.\n\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Kosaki Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Peter Zijlstra \u003cpeterz@infradead.org\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nCc: Theodore Tso \u003ctytso@google.com\u003e\nCc: Michael Rubin \u003cmrubin@google.com\u003e\nCc: Suleiman Souhlal \u003csuleiman@google.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3ecb01df3261d3b1f02ccfcf8384e2a255d2a1d0",
      "tree": "1fe91114d8829a511db48d757c787cfede3b929c",
      "parents": [
        "b6472776816af1ed52848c93d26e3edb3b17adab"
      ],
      "author": {
        "name": "Jan Beulich",
        "email": "JBeulich@novell.com",
        "time": "Tue Oct 26 14:22:27 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:13 2010 -0700"
      },
      "message": "use clear_page()/copy_page() in favor of memset()/memcpy() on whole pages\n\nAfter all that\u0027s what they are intended for.\n\nSigned-off-by: Jan Beulich \u003cjbeulich@novell.com\u003e\nCc: Miklos Szeredi \u003cmiklos@szeredi.hu\u003e\nCc: \"Eric W. Biederman\" \u003cebiederm@xmission.com\u003e\nCc: \"Rafael J. Wysocki\" \u003crjw@sisk.pl\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1b36ba815bd91f17e31277a44dd5c6b6a5a8d97e",
      "tree": "9d68d66e780c619b01c5d8ddc93e19547b448142",
      "parents": [
        "e6219ec8195efd5640765e657810f262ad9d1a92"
      ],
      "author": {
        "name": "Namhyung Kim",
        "email": "namhyung@gmail.com",
        "time": "Tue Oct 26 14:22:00 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:09 2010 -0700"
      },
      "message": "mm: wrap follow_pte() using __cond_lock()\n\nfollow_pte() conditionally grabs *@ptlp when it returns 0.\nRenaming it and wrapping it with __cond_lock() removes the following warnings:\n\n mm/memory.c:2337:9: warning: context imbalance in \u0027do_wp_page\u0027 - unexpected unlock\n mm/memory.c:3142:19: warning: context imbalance in \u0027handle_mm_fault\u0027 - different lock contexts for basic block\n\nSigned-off-by: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e6219ec8195efd5640765e657810f262ad9d1a92",
      "tree": "36c718adce5018fe87398fc7d8ebb7c1dfb14646",
      "parents": [
        "25ca1d6c02fe1c6d90d918867ef670d323725458"
      ],
      "author": {
        "name": "Namhyung Kim",
        "email": "namhyung@gmail.com",
        "time": "Tue Oct 26 14:22:00 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:09 2010 -0700"
      },
      "message": "mm: add lock release annotation on do_wp_page()\n\nThe do_wp_page() releases @ptl but was missing proper annotation.  Add it.\n This removes following warnings from sparse:\n\n mm/memory.c:2337:9: warning: context imbalance in \u0027do_wp_page\u0027 - unexpected unlock\n mm/memory.c:3142:19: warning: context imbalance in \u0027handle_mm_fault\u0027 - different lock contexts for basic block\n\nSigned-off-by: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "25ca1d6c02fe1c6d90d918867ef670d323725458",
      "tree": "de1709dd1dc7e0b9e9bd91840beb02f12e56b7e0",
      "parents": [
        "e6223a3b19421e3a8df1352d21fd0d71093f44ae"
      ],
      "author": {
        "name": "Namhyung Kim",
        "email": "namhyung@gmail.com",
        "time": "Tue Oct 26 14:21:59 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:09 2010 -0700"
      },
      "message": "mm: wrap get_locked_pte() using __cond_lock()\n\nThe get_locked_pte() conditionally grabs \u0027ptl\u0027 in case of returning\nnon-NULL.  This leads sparse to complain about context imbalance.  Rename\nand wrap it using __cond_lock() to make sparse happy.\n\nSigned-off-by: Namhyung Kim \u003cnamhyung@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d065bd810b6deb67d4897a14bfe21f8eb526ba99",
      "tree": "f58c59075732ec4ccba336278c9bdc7ff61bef94",
      "parents": [
        "b522c94da5d9cbc73f708be5e530ebc3bbd4a031"
      ],
      "author": {
        "name": "Michel Lespinasse",
        "email": "walken@google.com",
        "time": "Tue Oct 26 14:21:57 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:09 2010 -0700"
      },
      "message": "mm: retry page fault when blocking on disk transfer\n\nThis change reduces mmap_sem hold times that are caused by waiting for\ndisk transfers when accessing file mapped VMAs.\n\nIt introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call\nsite wants mmap_sem to be released if blocking on a pending disk transfer.\nIn that case, filemap_fault() returns the VM_FAULT_RETRY status bit and\ndo_page_fault() will then re-acquire mmap_sem and retry the page fault.\n\nIt is expected that the retry will hit the same page which will now be\ncached, and thus it will complete with a low mmap_sem hold time.\n\nTests:\n\n- microbenchmark: thread A mmaps a large file and does random read accesses\n  to the mmaped area - achieves about 55 iterations/s. Thread B does\n  mmap/munmap in a loop at a separate location - achieves 55 iterations/s\n  before, 15000 iterations/s after.\n\n- We are seeing related effects in some applications in house, which show\n  significant performance regressions when running without this change.\n\n[akpm@linux-foundation.org: fix warning \u0026 crash]\nSigned-off-by: Michel Lespinasse \u003cwalken@google.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nReviewed-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Ingo Molnar \u003cmingo@elte.hu\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nAcked-by: \"H. Peter Anvin\" \u003chpa@zytor.com\u003e\nCc: \u003clinux-arch@vger.kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ece0e2b6406a995c371e0311190631ea34ad851a",
      "tree": "726a516a91f5f7efe9dbb247ba28d019981d456e",
      "parents": [
        "3e4d3af501cccdc8a8cca41bdbe57d54ad7e7e73"
      ],
      "author": {
        "name": "Peter Zijlstra",
        "email": "a.p.zijlstra@chello.nl",
        "time": "Tue Oct 26 14:21:52 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:08 2010 -0700"
      },
      "message": "mm: remove pte_*map_nested()\n\nSince we no longer need to provide KM_type, the whole pte_*map_nested()\nAPI is now redundant, remove it.\n\nSigned-off-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nAcked-by: Chris Metcalf \u003ccmetcalf@tilera.com\u003e\nCc: David Howells \u003cdhowells@redhat.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Ingo Molnar \u003cmingo@elte.hu\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nCc: \"H. Peter Anvin\" \u003chpa@zytor.com\u003e\nCc: Steven Rostedt \u003crostedt@goodmis.org\u003e\nCc: Russell King \u003crmk@arm.linux.org.uk\u003e\nCc: Ralf Baechle \u003cralf@linux-mips.org\u003e\nCc: David Miller \u003cdavem@davemloft.net\u003e\nCc: Paul Mackerras \u003cpaulus@samba.org\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "46e387bbd82d438b9131e237e6e2cb55a825da49",
      "tree": "414948afd6b4d63c6ea8cc79ce022128bc1bf2eb",
      "parents": [
        "e9d08567ef72a2d0fb9b14dded386352d3136442",
        "3ef8fd7f720fc4f462fcdcae2fcde6f1c0536bfe"
      ],
      "author": {
        "name": "Andi Kleen",
        "email": "ak@linux.intel.com",
        "time": "Fri Oct 22 17:40:48 2010 +0200"
      },
      "committer": {
        "name": "Andi Kleen",
        "email": "ak@linux.intel.com",
        "time": "Fri Oct 22 17:40:48 2010 +0200"
      },
      "message": "Merge branch \u0027hwpoison-hugepages\u0027 into hwpoison\n\nConflicts:\n\tmm/memory-failure.c\n"
    },
    {
      "commit": "c3b86a29429dac1033e3f602f51fa8d00006a8eb",
      "tree": "bcedd0a553ca2396eeb58318ef6ee6b426e83652",
      "parents": [
        "8d8d2e9ccd331a1345c88b292ebee9d256fd8749",
        "2aeb66d3036dbafc297ac553a257a40283dadb3e"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Oct 21 13:47:29 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Oct 21 13:47:29 2010 -0700"
      },
      "message": "Merge branch \u0027x86-mm-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip\n\n* \u0027x86-mm-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:\n  x86-32, percpu: Correct the ordering of the percpu readmostly section\n  x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G\n  x86: Spread tlb flush vector between nodes\n  percpu: Introduce a read-mostly percpu API\n  x86, mm: Fix incorrect data type in vmalloc_sync_all()\n  x86, mm: Hold mm-\u003epage_table_lock while doing vmalloc_sync\n  x86, mm: Fix bogus whitespace in sync_global_pgds()\n  x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation\n  x86, mm: Add RESERVE_BRK_ARRAY() helper\n  mm, x86: Saving vmcore with non-lazy freeing of vmas\n  x86, kdump: Change copy_oldmem_page() to use cached addressing\n  x86, mm: fix uninitialized addr in kernel_physical_mapping_init()\n  x86, kmemcheck: Remove double test\n  x86, mm: Make spurious_fault check explicitly check the PRESENT bit\n  x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes\n  x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions\n  x86, mm: Avoid unnecessary TLB flush\n"
    },
    {
      "commit": "aa50d3a7aa8147b9e14dc9d5972a5d2359db4ef8",
      "tree": "68fae5060333dcc24c17e9dd00a87bd760d883e9",
      "parents": [
        "6f39ce056ab2ab2d29b2fae4aed61ed0b485972f"
      ],
      "author": {
        "name": "Andi Kleen",
        "email": "ak@linux.intel.com",
        "time": "Wed Oct 06 21:45:00 2010 +0200"
      },
      "committer": {
        "name": "Andi Kleen",
        "email": "ak@linux.intel.com",
        "time": "Fri Oct 08 09:32:46 2010 +0200"
      },
      "message": "Encode huge page size for VM_FAULT_HWPOISON errors\n\nThis fixes a problem introduced with the hugetlb hwpoison handling.\n\nThe user space SIGBUS signalling wants to know the size of the hugepage\nthat caused a HWPOISON fault.\n\nUnfortunately the architecture page fault handlers do not have easy\naccess to the struct page.\n\nPass the information out in the fault error code instead.\n\nI added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encode\nthe hpage index in some free upper bits of the fault code. The small\npage hwpoison stays with the VM_FAULT_HWPOISON name to minimize\nchanges.\n\nAlso add code to hugetlb.h to convert that index into a page shift.\n\nWill be used in a further patch.\n\nCc: Naoya Horiguchi \u003cn-horiguchi@ah.jp.nec.com\u003e\nCc: fengguang.wu@intel.com\nSigned-off-by: Andi Kleen \u003cak@linux.intel.com\u003e\n"
    },
    {
      "commit": "31c4a3d3a0f84a5847665f8aa0552d188389f791",
      "tree": "6dbc630213c899c82030e38c9fa1125c060ef2fe",
      "parents": [
        "2422084a94fcd5038406261b331672a13c92c050"
      ],
      "author": {
        "name": "Hugh Dickins",
        "email": "hughd@google.com",
        "time": "Sun Sep 19 19:40:22 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Sep 20 10:44:37 2010 -0700"
      },
      "message": "mm: further fix swapin race condition\n\nCommit 4969c1192d15 (\"mm: fix swapin race condition\") is now agreed to\nbe incomplete.  There\u0027s a race, not very much less likely than the\noriginal race envisaged, in which it is further necessary to check that\nthe swapcache page\u0027s swap has not changed.\n\nHere\u0027s the reasoning: cast in terms of reuse_swap_page(), but probably\ncould be reformulated to rely on try_to_free_swap() instead, or on\nswapoff+swapon.\n\nA, faults into do_swap_page(): does page1 \u003d lookup_swap_cache(swap1) and\ncomes through the lock_page(page1).\n\nB, a racing thread of the same process, faults on the same address: does\npage1 \u003d lookup_swap_cache(swap1) and now waits in lock_page(page1), but\nfor whatever reason is unlucky not to get the lock any time soon.\n\nA carries on through do_swap_page(), a write fault, but cannot reuse the\nswap page1 (another reference to swap1).  Unlocks the page1 (but B\ndoesn\u0027t get it yet), does COW in do_wp_page(), page2 now in that pte.\n\nC, perhaps the parent of A+B, comes in and write faults the same swap\npage1 into its mm, reuse_swap_page() succeeds this time, swap1 is freed.\n\nkswapd comes in after some time (B still unlucky) and swaps out some\npages from A+B and C: it allocates the original swap1 to page2 in A+B,\nand some other swap2 to the original page1 now in C.  But does not\nimmediately free page1 (actually it couldn\u0027t: B holds a reference),\nleaving it in swap cache for now.\n\nB at last gets the lock on page1, hooray! Is PageSwapCache(page1)? Yes.\nIs pte_same(*page_table, orig_pte)? Yes, because page2 has now been\ngiven the swap1 which page1 used to have.  
So B proceeds to insert page1\ninto A+B\u0027s page_table, though its content now belongs to C, quite\ndifferent from what A wrote there.\n\nB ought to have checked that page1\u0027s swap was still swap1.\n\nSigned-off-by: Hugh Dickins \u003chughd@google.com\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: stable@kernel.org\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4969c1192d15afa3389e7ae3302096ff684ba655",
      "tree": "abe560c8f293191be65488c49f4db3f3a626e63c",
      "parents": [
        "7c5367f205f7d53659fb19b9fdf65b7bc1a592c6"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Sep 09 16:37:52 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Sep 09 18:57:24 2010 -0700"
      },
      "message": "mm: fix swapin race condition\n\nThe pte_same check is reliable only if the swap entry remains pinned (by\nthe page lock on swapcache).  We\u0027ve also to ensure the swapcache isn\u0027t\nremoved before we take the lock as try_to_free_swap won\u0027t care about the\npage pin.\n\nOne of the possible impacts of this patch is that a KSM-shared page can\npoint to the anon_vma of another process, which could exit before the page\nis freed.\n\nThis can leave a page with a pointer to a recycled anon_vma object, or\nworse, a pointer to something that is no longer an anon_vma.\n\n[riel@redhat.com: changelog help]\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "8ca3eb08097f6839b2206e2242db4179aee3cfb3",
      "tree": "32b9f033230d615d248fa0bbfa1a0c644a422ed8",
      "parents": [
        "9559fcdbff4f93d29af04478bbc48294519424f5"
      ],
      "author": {
        "name": "Luck, Tony",
        "email": "tony.luck@intel.com",
        "time": "Tue Aug 24 11:44:18 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Aug 24 12:13:20 2010 -0700"
      },
      "message": "guard page for stacks that grow upwards\n\npa-risc and ia64 have stacks that grow upwards. Check that\nthey do not run into other mappings. By making VM_GROWSUP\n0x0 on architectures that do not ever use it, we can avoid\nsome unpleasant #ifdefs in check_stack_guard_page().\n\nSigned-off-by: Tony Luck \u003ctony.luck@intel.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "61c77326d1df079f202fa79403c3ccd8c5966a81",
      "tree": "57780e6b94f24f402d1c9036d6e7cf37a359c22f",
      "parents": [
        "76be97c1fc945db08aae1f1b746012662d643e97"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Mon Aug 16 09:16:55 2010 +0800"
      },
      "committer": {
        "name": "H. Peter Anvin",
        "email": "hpa@zytor.com",
        "time": "Mon Aug 23 10:04:57 2010 -0700"
      },
      "message": "x86, mm: Avoid unnecessary TLB flush\n\nIn x86, access and dirty bits are set automatically by CPU when CPU accesses\nmemory. When we go into the code path of below flush_tlb_fix_spurious_fault(),\nwe already set dirty bit for pte and don\u0027t need flush tlb. This might mean\ntlb entry in some CPUs hasn\u0027t dirty bit set, but this doesn\u0027t matter. When\nthe CPUs do page write, they will automatically check the bit and no software\ninvolved.\n\nOn the other hand, flush tlb in below position is harmful. Test creates CPU\nnumber of threads, each thread writes to a same but random address in same vma\nrange and we measure the total time. Under a 4 socket system, original time is\n1.96s, while with the patch, the time is 0.8s. Under a 2 socket system, there is\n20% time cut too. perf shows a lot of time are taking to send ipi/handle ipi for\ntlb flush.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nLKML-Reference: \u003c20100816011655.GA362@sli10-desk.sh.intel.com\u003e\nAcked-by: Suresh Siddha \u003csuresh.b.siddha@intel.com\u003e\nCc: Andrea Archangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: H. Peter Anvin \u003chpa@zytor.com\u003e\n"
    },
    {
      "commit": "0e8e50e20c837eeec8323bba7dcd25fe5479194c",
      "tree": "12c7ec767a4a8508be33442c6fb55c28a26c94cd",
      "parents": [
        "7798330ac8114c731cfab83e634c6ecedaa233d7"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Aug 20 16:49:40 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sat Aug 21 08:50:00 2010 -0700"
      },
      "message": "mm: make stack guard page logic use vm_prev pointer\n\nLike the mlock() change previously, this makes the stack guard check\ncode use vma-\u003evm_prev to see what the mapping below the current stack\nis, rather than have to look it up with find_vma().\n\nAlso, accept an abutting stack segment, since that happens naturally if\nyou split the stack with mlock or mprotect.\n\nTested-by: Ian Campbell \u003cijc@hellion.org.uk\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "11ac552477e32835cb6970bf0a70c210807f5673",
      "tree": "959521ee3e217da81b08209df0f0db760e1efdb8",
      "parents": [
        "92fa5bd9a946b6e7aab6764e7312e4e3d9bed295"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sat Aug 14 11:44:56 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sat Aug 14 11:44:56 2010 -0700"
      },
      "message": "mm: fix page table unmap for stack guard page properly\n\nWe do in fact need to unmap the page table _before_ doing the whole\nstack guard page logic, because if it is needed (mainly 32-bit x86 with\nPAE and CONFIG_HIGHPTE, but other architectures may use it too) then it\nwill do a kmap_atomic/kunmap_atomic.\n\nAnd those kmaps will create an atomic region that we cannot do\nallocations in.  However, the whole stack expand code will need to do\nanon_vma_prepare() and vma_lock_anon_vma() and they cannot do that in an\natomic region.\n\nNow, a better model might actually be to do the anon_vma_prepare() when\n_creating_ a VM_GROWSDOWN segment, and not have to worry about any of\nthis at page fault time.  But in the meantime, this is the\nstraightforward fix for the issue.\n\nSee https://bugzilla.kernel.org/show_bug.cgi?id\u003d16588 for details.\n\nReported-by: Wylda \u003cwylda@volny.cz\u003e\nReported-by: Sedat Dilek \u003csedat.dilek@gmail.com\u003e\nReported-by: Mike Pagano \u003cmpagano@gentoo.org\u003e\nReported-by: François Valenduc \u003cfrancois.valenduc@tvcablenet.be\u003e\nTested-by: Ed Tomlinson \u003cedt@aei.ca\u003e\nCc: Pekka Enberg \u003cpenberg@kernel.org\u003e\nCc: Greg KH \u003cgregkh@suse.de\u003e\nCc: stable@kernel.org\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5528f9132cf65d4d892bcbc5684c61e7822b21e9",
      "tree": "46ad9b7a106a42579b869b42bf237a663370a613",
      "parents": [
        "320b2b8de12698082609ebbc1a17165727f4c893"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Aug 13 09:24:04 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Aug 13 09:24:04 2010 -0700"
      },
      "message": "mm: fix missing page table unmap for stack guard page failure case\n\n.. which didn\u0027t show up in my tests because it\u0027s a no-op on x86-64 and\nmost other architectures.  But we enter the function with the last-level\npage table mapped, and should unmap it at exit.\n\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "320b2b8de12698082609ebbc1a17165727f4c893",
      "tree": "bb62fe1ba3bb8bf68ff1fd44e613ece9c9581c36",
      "parents": [
        "2069601b3f0ea38170d4b509b89f3ca0a373bdc1"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Aug 12 17:54:33 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Aug 12 17:54:33 2010 -0700"
      },
      "message": "mm: keep a guard page below a grow-down stack segment\n\nThis is a rather minimally invasive patch to solve the problem of the\nuser stack growing into a memory mapped area below it.  Whenever we fill\nthe first page of the stack segment, expand the segment down by one\npage.\n\nNow, admittedly some odd application might _want_ the stack to grow down\ninto the preceding memory mapping, and so we may at some point need to\nmake this a process tunable (some people might also want to have more\nthan a single page of guarding), but let\u0027s try the minimal approach\nfirst.\n\nTested with trivial application that maps a single page just below the\nstack, and then starts recursing.  Without this, we will get a SIGSEGV\n_after_ the stack has smashed the mapping.  With this patch, we\u0027ll get a\nnice SIGBUS just as the stack touches the page just above the mapping.\n\nRequested-by: Keith Packard \u003ckeithp@keithp.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "57250a5bf0f6ff68dc339572adbd881a11f366fa",
      "tree": "ef11c141a9f89403bcd4b1fc705d672c0ff41818",
      "parents": [
        "58c37f6e0dfaaab85a3c11fcbf24451dfe70c721"
      ],
      "author": {
        "name": "Jeremy Fitzhardinge",
        "email": "jeremy@goop.org",
        "time": "Mon Aug 09 17:19:52 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Aug 09 20:45:03 2010 -0700"
      },
      "message": "mmu-notifiers: remove mmu notifier calls in apply_to_page_range()\n\nIt is not appropriate for apply_to_page_range() to directly call any mmu\nnotifiers, because it is a general purpose function whose effect depends\non what context it is called in and what the callback function does.\n\nIn particular, if it is being used as part of an mmu notifier\nimplementation, the recursive calls can be particularly problematic.\n\nIt is up to apply_to_page_range\u0027s caller to do any notifier calls if\nnecessary.  It does not affect any in-tree users because they all operate\non init_mm, and mmu notifiers only pertain to usermode mappings.\n\n[stefano.stabellini@eu.citrix.com: remove unused local `start\u0027]\nSigned-off-by: Jeremy Fitzhardinge \u003cjeremy.fitzhardinge@citrix.com\u003e\nSigned-off-by: Stefano Stabellini \u003cstefano.stabellini@eu.citrix.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: Stefano Stabellini \u003cstefano.stabellini@eu.citrix.com\u003e\nCc: Avi Kivity \u003cavi@qumranet.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "9a5b489b870def9a93f5e89dac03ebe136f901db",
      "tree": "df7f0acfdb81ce0d77b78ff4d131c40472731994",
      "parents": [
        "ad8c2ee801ad7a52d919b478d9b2c7b39a72d295"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Mon Aug 09 17:19:49 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Aug 09 20:45:02 2010 -0700"
      },
      "message": "mm: set VM_FAULT_WRITE in do_swap_page()\n\nSet the flag if do_swap_page is decowing the page the same way do_wp_page\nwould too.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ad8c2ee801ad7a52d919b478d9b2c7b39a72d295",
      "tree": "bc56cc023da3467447b0aecd30c0516881d53992",
      "parents": [
        "51b1bd2ace1595b72956224deda349efa880b693"
      ],
      "author": {
        "name": "Rik van Riel",
        "email": "riel@redhat.com",
        "time": "Mon Aug 09 17:19:48 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Aug 09 20:45:02 2010 -0700"
      },
      "message": "rmap: add exclusive page to private anon_vma on swapin\n\nOn swapin it is fairly common for a page to be owned exclusively by one\nprocess.  In that case we want to add the page to the anon_vma of that\nprocess\u0027s VMA, instead of to the root anon_vma.\n\nThis will reduce the amount of rmap searching that the swapout code needs\nto do.\n\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4e60c86bd9e5a7110ed28874d0b6592186550ae8",
      "tree": "9fb60e9f49b44b293a0c0c7d9f40e1a354a22b5a",
      "parents": [
        "627295e492638936e76f3d8fcb1e0a3367b88341"
      ],
      "author": {
        "name": "Andi Kleen",
        "email": "andi@firstfloor.org",
        "time": "Mon Aug 09 17:19:03 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Aug 09 20:44:58 2010 -0700"
      },
      "message": "gcc-4.6: mm: fix unused but set warnings\n\nNo real bugs, just some dead code and some fixups.\n\nSigned-off-by: Andi Kleen \u003cak@linux.intel.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "de51257aa301652876ab6e8f13ea4eadbe4a3846",
      "tree": "388ee39bed1d7e362438d047b57399a28e2617f8",
      "parents": [
        "51c20fcced5badee0e2021c6c89f44aa3cbd72aa"
      ],
      "author": {
        "name": "Hugh Dickins",
        "email": "hughd@google.com",
        "time": "Fri Jul 30 10:58:26 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Jul 30 18:56:09 2010 -0700"
      },
      "message": "mm: fix ia64 crash when gcore reads gate area\n\nDebian\u0027s ia64 autobuilders have been seeing kernel freeze or reboot\nwhen running the gdb testsuite (Debian bug 588574): dannf bisected to\n2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 \"mm: ZERO_PAGE without\nPTE_SPECIAL\"; and reproduced it with gdb\u0027s gcore on a simple target.\n\nI\u0027d missed updating the gate_vma handling in __get_user_pages(): that\nhappens to use vm_normal_page() (nowadays failing on the zero page),\nyet reported success even when it failed to get a page - boom when\naccess_process_vm() tried to copy that to its intermediate buffer.\n\nFix this, resisting cleanups: in particular, leave it for now reporting\nsuccess when not asked to get any pages - very probably safe to change,\nbut let\u0027s not risk it without testing exposure.\n\nWhy did ia64 crash with 16kB pages, but succeed with 64kB pages?\nBecause setup_gate() pads each 64kB of its gate area with zero pages.\n\nReported-by: Andreas Barth \u003caba@not.so.argh.org\u003e\nBisected-by: dann frazier \u003cdannf@debian.org\u003e\nSigned-off-by: Hugh Dickins \u003chughd@google.com\u003e\nTested-by: dann frazier \u003cdannf@dannf.org\u003e\nCc: stable@kernel.org\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "142762bd8d8c46345e79f0f68d3374564306972f",
      "tree": "c33360b872883d24b068ba7b8f01466fccb9dfc9",
      "parents": [
        "58a9d3d8db06ca2ec31f64ec49ab0aeb89971b85"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "hannes@cmpxchg.org",
        "time": "Mon May 24 14:32:39 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue May 25 08:07:00 2010 -0700"
      },
      "message": "mm: document follow_page()\n\nSigned-off-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Dan Carpenter \u003cerror27@gmail.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Izik Eidus \u003cieidus@redhat.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a3a2e76c77fa22b114e421ac11dec0c56c3503fb",
      "tree": "cc67bbd8d5d364e55ea7a00d0b5ad68d5eac08ac",
      "parents": [
        "b01d0942c2b7a3026d2b7d38b5773d3d00420e06"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Apr 06 14:34:42 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Apr 07 08:38:02 2010 -0700"
      },
      "message": "mm: avoid null-pointer deref in sync_mm_rss()\n\n- We weren\u0027t zeroing p-\u003erss_stat[] at fork()\n\n- Consequently sync_mm_rss() was dereferencing tsk-\u003emm for kernel\n  threads and was oopsing.\n\n- Make __sync_task_rss_stat() static, too.\n\nAddresses https://bugzilla.kernel.org/show_bug.cgi?id\u003d15648\n\n[akpm@linux-foundation.org: remove the BUG_ON(!mm-\u003erss)]\nReported-by: Troels Liebe Bentsen \u003ctlb@rapanden.dk\u003e\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\n\"Michael S. Tsirkin\" \u003cmst@redhat.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5a0e3ad6af8660be21ca98a971cd00f331318c05",
      "tree": "5bfb7be11a03176a87296a43ac6647975c00a1d1",
      "parents": [
        "ed391f4ebf8f701d3566423ce8f17e614cde9806"
      ],
      "author": {
        "name": "Tejun Heo",
        "email": "tj@kernel.org",
        "time": "Wed Mar 24 17:04:11 2010 +0900"
      },
      "committer": {
        "name": "Tejun Heo",
        "email": "tj@kernel.org",
        "time": "Tue Mar 30 22:02:32 2010 +0900"
      },
      "message": "include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h\n\npercpu.h is included by sched.h and module.h and thus ends up being\nincluded when building most .c files.  percpu.h includes slab.h which\nin turn includes gfp.h making everything defined by the two files\nuniversally available and complicating inclusion dependencies.\n\npercpu.h -\u003e slab.h dependency is about to be removed.  Prepare for\nthis change by updating users of gfp and slab facilities include those\nheaders directly instead of assuming availability.  As this conversion\nneeds to touch large number of source files, the following script is\nused as the basis of conversion.\n\n  http://userweb.kernel.org/~tj/misc/slabh-sweep.py\n\nThe script does the followings.\n\n* Scan files for gfp and slab usages and update includes such that\n  only the necessary includes are there.  ie. if only gfp is used,\n  gfp.h, if slab is used, slab.h.\n\n* When the script inserts a new include, it looks at the include\n  blocks and try to put the new include such that its order conforms\n  to its surrounding.  It\u0027s put in the include block which contains\n  core kernel includes, in the same order that the rest are ordered -\n  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there\n  doesn\u0027t seem to be any matching order.\n\n* If the script can\u0027t find a place to put a new include (mostly\n  because the file doesn\u0027t have fitting include block), it prints out\n  an error message indicating which .h file needs to be added to the\n  file.\n\nThe conversion was done in the following steps.\n\n1. The initial automatic conversion of all .c files updated slightly\n   over 4000 files, deleting around 700 includes and adding ~480 gfp.h\n   and ~3000 slab.h inclusions.  The script emitted errors for ~400\n   files.\n\n2. Each error was manually checked.  
Some didn\u0027t need the inclusion,\n   some needed manual addition while adding it to implementation .h or\n   embedding .c file was more appropriate for others.  This step added\n   inclusions to around 150 files.\n\n3. The script was run again and the output was compared to the edits\n   from #2 to make sure no file was left behind.\n\n4. Several build tests were done and a couple of problems were fixed.\n   e.g. lib/decompress_*.c used malloc/free() wrappers around slab\n   APIs requiring slab.h to be added manually.\n\n5. The script was run on all .h files but without automatically\n   editing them as sprinkling gfp.h and slab.h inclusions around .h\n   files could easily lead to inclusion dependency hell.  Most gfp.h\n   inclusion directives were ignored as stuff from gfp.h was usually\n   wildly available and often used in preprocessor macros.  Each\n   slab.h inclusion directive was examined and added manually as\n   necessary.\n\n6. percpu.h was updated not to include slab.h.\n\n7. Build test were done on the following configurations and failures\n   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my\n   distributed build env didn\u0027t work with gcov compiles) and a few\n   more options had to be turned off depending on archs to make things\n   build (like ipr on powerpc/64 which failed due to missing writeq).\n\n   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.\n   * powerpc and powerpc64 SMP allmodconfig\n   * sparc and sparc64 SMP allmodconfig\n   * ia64 SMP allmodconfig\n   * s390 SMP allmodconfig\n   * alpha SMP allmodconfig\n   * um on x86_64 SMP allmodconfig\n\n8. 
percpu.h modifications were reverted so that it could be applied as\n   a separate patch and serve as bisection point.\n\nGiven the fact that I had only a couple of failures from tests on step\n6, I\u0027m fairly confident about the coverage of this conversion patch.\nIf there is a breakage, it\u0027s likely to be something in one of the arch\nheaders which should be easily discoverable easily on most builds of\nthe specific arch.\n\nSigned-off-by: Tejun Heo \u003ctj@kernel.org\u003e\nGuess-its-ok-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: Ingo Molnar \u003cmingo@redhat.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\n"
    }
  ],
  "next": "298359c5bf06c04258d7cf552426e198c47e83c1"
}
