)]}'
{
  "log": [
    {
      "commit": "d6bf73e4340f52159c1d9f13836b62e20fcd12d3",
      "tree": "18e3b9e4f126958d7d55758d9a70375a847b5760",
      "parents": [
        "6a4ad39b3de60ad0e75a78098be0f0eb1722b753"
      ],
      "author": {
        "name": "MinChan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Tue Aug 12 15:08:52 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Aug 12 16:07:29 2008 -0700"
      },
      "message": "do_migrate_pages(): remove unused variable\n\nSigned-off-by: MinChan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a5516438959d90b071ff0a484ce4f3f523dc3152",
      "tree": "e356ba9364c76b93c176b4d4a262b7aca3ee8f91",
      "parents": [
        "b7ba30c679ed1eb7ed3ed8f281f6493282042bd4"
      ],
      "author": {
        "name": "Andi Kleen",
        "email": "ak@suse.de",
        "time": "Wed Jul 23 21:27:41 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jul 24 10:47:17 2008 -0700"
      },
      "message": "hugetlb: modular state for hugetlb page size\n\nThe goal of this patchset is to support multiple hugetlb page sizes.  This\nis achieved by introducing a new struct hstate structure, which\nencapsulates the important hugetlb state and constants (eg.  huge page\nsize, number of huge pages currently allocated, etc).\n\nThe hstate structure is then passed around the code which requires these\nfields, they will do the right thing regardless of the exact hstate they\nare operating on.\n\nThis patch adds the hstate structure, with a single global instance of it\n(default_hstate), and does the basic work of converting hugetlb to use the\nhstate.\n\nFuture patches will add more hstate structures to allow for different\nhugetlbfs mounts to have different page sizes.\n\n[akpm@linux-foundation.org: coding-style fixes]\nAcked-by: Adam Litke \u003cagl@us.ibm.com\u003e\nAcked-by: Nishanth Aravamudan \u003cnacc@us.ibm.com\u003e\nSigned-off-by: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Nick Piggin \u003cnpiggin@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d79df630f622806c4d0e116fbaf6ebf6baf53461",
      "tree": "d5a1c06209210c84ec60099687e4a87723ae1599",
      "parents": [
        "b8a0b6ccf2ba2519ace65d782b41ee91bf3c3778"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Fri Jul 04 12:24:13 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Jul 04 13:03:05 2008 -0700"
      },
      "message": "mempolicy: mask off internal flags for userspace API\n\nFlags considered internal to the mempolicy kernel code are stored as part\nof the \"flags\" member of struct mempolicy.\n\nBefore exposing a policy type to userspace via get_mempolicy(), these\ninternal flags must be masked.  Flags exposed to userspace, however,\nshould still be returned to the user.\n\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "71fe804b6d56d6a7aed680e096901434cef6a2c3",
      "tree": "3dd437e09fe6ee57644c72c79e08c562d4bb6389",
      "parents": [
        "3f226aa1cbc006f9d90f22084f519ad2a1286cd8"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:26 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:25 2008 -0700"
      },
      "message": "mempolicy: use struct mempolicy pointer in shmem_sb_info\n\nThis patch replaces the mempolicy mode, mode_flags, and nodemask in the\nshmem_sb_info struct with a struct mempolicy pointer, initialized to NULL.\nThis removes dependency on the details of mempolicy from shmem.c and hugetlbfs\ninode.c and simplifies the interfaces.\n\nmpol_parse_str() in mempolicy.c is changed to return, via a pointer to a\npointer arg, a struct mempolicy pointer on success.  For MPOL_DEFAULT, the\nreturned pointer is NULL.  Further, mpol_parse_str() now takes a \u0027no_context\u0027\nargument that causes the input nodemask to be stored in the w.user_nodemask of\nthe created mempolicy for use when the mempolicy is installed in a tmpfs inode\nshared policy tree.  At that time, any cpuset contextualization is applied to\nthe original input nodemask.  This preserves the previous behavior where the\ninput nodemask was stored in the superblock.  We can think of the returned\nmempolicy as \"context free\".\n\nBecause mpol_parse_str() is now calling mpol_new(), we can remove from\nmpol_to_str() the semantic checks that mpol_new() already performs.\n\nAdd \u0027no_context\u0027 parameter to mpol_to_str() to specify that it should format\nthe nodemask in w.user_nodemask for \u0027bind\u0027 and \u0027interleave\u0027 policies.\n\nChange mpol_shared_policy_init() to take a pointer to a \"context free\" struct\nmempolicy and to create a new, \"contextualized\" mempolicy using the mode,\nmode_flags and user_nodemask from the input mempolicy.\n\n  Note: we know that the mempolicy passed to mpol_to_str() or\n  mpol_shared_policy_init() from a tmpfs superblock is \"context free\".  This\n  is currently the only instance thereof.  However, if we found more uses for\n  this concept, and introduced any ambiguity as to whether a mempolicy was\n  context free or not, we could add another internal mode flag to identify\n  context free mempolicies.  Then, we could remove the \u0027no_context\u0027 argument\n  from mpol_to_str().\n\nAdded shmem_get_sbmpol() to return a reference counted superblock mempolicy,\nif one exists, to pass to mpol_shared_policy_init().  We must add the\nreference under the sb stat_lock to prevent races with replacement of the mpol\nby remount.  This reference is removed in mpol_shared_policy_init().\n\n[akpm@linux-foundation.org: build fix]\n[akpm@linux-foundation.org: another build fix]\n[akpm@linux-foundation.org: yet another build fix]\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3f226aa1cbc006f9d90f22084f519ad2a1286cd8",
      "tree": "e54a87abb19f9dea184f4c17c4c8532d3d715de8",
      "parents": [
        "095f1fc4ebf36c64fddf9b6db29b1ab5517378e6"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:24 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:25 2008 -0700"
      },
      "message": "mempolicy: support mpol\u003dlocal tmpfs mount option\n\nFor tmpfs/shmem shared policies, MPOL_DEFAULT is not necessarily equivalent to\n\"local allocation\".  Because shared policies are at the same \"scope\" level\n[see Documentation/vm/numa_memory_policy.txt], as vma policies MPOL_DEFAULT\nmeans \"fall back to current task policy\".\n\nThis patch extends the memory policy string parsing function to display\n\"local\" for MPOL_PREFERRED + MPOL_F_LOCAL.  This allows one to specify local\nallocation as the default policy for shared memory areas via the tmpfs mpol\nmount option, regardless of the current task\u0027s policy.\n\nAlso, \"local\" is now displayed for this policy.  This patch allows us to\naccept the same input format as the display.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "095f1fc4ebf36c64fddf9b6db29b1ab5517378e6",
      "tree": "39aae9d5b05d8501d1794e92c6115331c0a40848",
      "parents": [
        "2291990ab36b4b2d8a81b1f92e7a046e51632a60"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:23 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: rework shmem mpol parsing and display\n\nmm/shmem.c currently contains functions to parse and display memory policy\nstrings for the tmpfs \u0027mpol\u0027 mount option.  Move this to mm/mempolicy.c with\nthe rest of the mempolicy support.  With subsequent patches, we\u0027ll be able to\nremove knowledge of the details [mode, flags, policy, ...] completely from\nshmem.c\n\n1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in\n   mm/mempolicy.c.  Rework to use the policy_types[] array [used by\n   mpol_to_str()] to look up mode by name.\n\n2) use mpol_to_str() to format policy for shmem_show_mpol().  mpol_to_str()\n   expects a pointer to a struct mempolicy, so temporarily construct one.\n   This will be replaced with a reference to a struct mempolicy in the tmpfs\n   superblock in a subsequent patch.\n\n   NOTE 1: I changed mpol_to_str() to use a colon \u0027:\u0027 rather than an equal\n   sign \u0027\u003d\u0027 as the nodemask delimiter to match mpol_parse_str() and the\n   tmpfs/shmem mpol mount option formatting that now uses mpol_to_str().  This\n   is a user visible change to numa_maps, but then the addition of the mode\n   flags already changed the display.  It makes sense to me to have the mounts\n   and numa_maps display the policy in the same format.  However, if anyone\n   objects strongly, I can pass the desired nodemask delimeter as an arg to\n   mpol_to_str().\n\n   Note 2: Like show_numa_map(), I don\u0027t check the return code from\n   mpol_to_str().  I do use a longer buffer than the one provided by\n   show_numa_map(), which seems to have sufficed so far.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "2291990ab36b4b2d8a81b1f92e7a046e51632a60",
      "tree": "b5b6b8ca9c92bc3c84cd7e7aefb5010d3793db4d",
      "parents": [
        "fc36b8d3d819047eb4d23ca079fb4d3af20ff076"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:22 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: clean-up mpol-to-str() mempolicy formatting\n\nmpol-to-str() formats memory policies into printable strings.  Currently this\nis only used to display \"numa_maps\".  A subsequent patch will use\nmpol_to_str() for formatting tmpfs [shmem] mpol mount options, allowing us to\nremove essentially duplicate code in mm/shmem.c.  This patch cleans up\nmpol_to_str() generally and in preparation for that patch.\n\n1) show_numa_maps() is not checking the return code from mpol_to_str().\n   There\u0027s not a lot we can do in this context if mpol_to_str() did return the\n   error [insufficient space in buffer].  Proposed \"solution\": just check,\n   under DEBUG_VM, that callers are providing sufficient buffer space for the\n   policy, flags, and a few nodes.  This way, we\u0027ll get some display.\n   show_numa_maps() is providing a 50-byte buffer, so it won\u0027t trip this\n   check.  50-bytes should be sufficient unless one has a large number of\n   nodes in a very sparse nodemask.\n\n2) The display of the new mode flags [\"static\" \u0026 \"relative\"] was set up to\n   display multiple flags, separated by a \"bar\" \u0027|\u0027.  However, this support is\n   incomplete--e.g., need_bar was never incremented; and currently, these two\n   flags are mutually exclusive.  So remove the \"bar\" support, for now, and\n   only display one flag.\n\n3) Use snprint() to format flags, so as not to overflow the buffer.  Not\n   that it\u0027s ever happed, AFAIK.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "fc36b8d3d819047eb4d23ca079fb4d3af20ff076",
      "tree": "65ee215a6bdca1e8d4ac4b57525445d7d1829c1d",
      "parents": [
        "53f2556b6792ed99fde965f5e061749edd455623"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:21 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy\n\nNow that we\u0027re using \"preferred local\" policy for system default, we need to\nmake this as fast as possible.  Because of the variable size of the mempolicy\nstructure [based on size of nodemasks], the preferred_node may be in a\ndifferent cacheline from the mode.  This can result in accessing an extra\ncacheline in the normal case of system default policy.  Suspect this is the\ncause of an observed 2-3% slowdown in page fault testing relative to kernel\nwithout this patch series.\n\nTo alleviate this, use an internal mode flag, MPOL_F_LOCAL in the mempolicy\nflags member which is guaranteed [?] to be in the same cacheline as the mode\nitself.\n\nVerified that reworked mempolicy now performs slightly better on 25-rc8-mm1\nfor both anon and shmem segments with system default and vma [preferred local]\npolicy.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "53f2556b6792ed99fde965f5e061749edd455623",
      "tree": "82a679e33bc8c305297cbe4655be0dac24728907",
      "parents": [
        "bea904d54d6faa92400f10c8ea3d3828b8e1eb93"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:20 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: mPOL_PREFERRED cleanups for \"local allocation\"\n\nHere are a couple of \"cleanups\" for MPOL_PREFERRED behavior when\nv.preferred_node \u003c 0 -- i.e., \"local allocation\":\n\n1)  [do_]get_mempolicy() calls the now renamed get_policy_nodemask()\n    to fetch the nodemask associated with a policy.  Currently,\n    get_policy_nodemask() returns the set of nodes with memory, when\n    the policy \u0027mode\u0027 is \u0027PREFERRED, and the preferred_node is \u003c 0.\n    Change to return an empty nodemask, as this is what was specified\n    to achieve \"local allocation\".\n\n2)  When a task is moved into a [new] cpuset, mpol_rebind_policy() is\n    called to adjust any task and vma policy nodes to be valid in the\n    new cpuset.  However, when the policy is MPOL_PREFERRED, and the\n    preferred_node is \u003c0, no rebind is necessary.  The \"local allocation\"\n    indication is valid in any cpuset.  Existing code will \"do the right\n    thing\" because node_remap() will just return the argument node when\n    it is outside of the valid range of node ids.  However, I think it is\n    clearer and cleaner to skip the remap explicitly in this case.\n\n3)  mpol_to_str() produces a printable, \"human readable\" string from a\n    struct mempolicy.  For MPOL_PREFERRED with preferred_node \u003c0,  show\n    \"local\", as this indicates local allocation, as the task migrates\n    among nodes.  Note that this matches the usage of \"local allocation\"\n    in libnuma() and numactl.  Without this change, I believe that node_set()\n    [via set_bit()] will set bit 31, resulting in a misleading display.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "bea904d54d6faa92400f10c8ea3d3828b8e1eb93",
      "tree": "24966dd4dabadb4bb32aa1e00fae2c2168661229",
      "parents": [
        "52cd3b074050dd664380b5e8cfc85d4a6ed8ad48"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:18 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: use MPOL_PREFERRED for system-wide default policy\n\nCurrently, when one specifies MPOL_DEFAULT via a NUMA memory policy API\n[set_mempolicy(), mbind() and internal versions], the kernel simply installs a\nNULL struct mempolicy pointer in the appropriate context: task policy, vma\npolicy, or shared policy.  This causes any use of that policy to \"fall back\"\nto the next most specific policy scope.\n\nThe only use of MPOL_DEFAULT to mean \"local allocation\" is in the system\ndefault policy.  This requires extra checks/cases for MPOL_DEFAULT in many\nmempolicy.c functions.\n\nThere is another, \"preferred\" way to specify local allocation via the APIs.\nThat is using the MPOL_PREFERRED policy mode with an empty nodemask.\nInternally, the empty nodemask gets converted to a preferred_node id of \u0027-1\u0027.\nAll internal usage of MPOL_PREFERRED will convert the \u0027-1\u0027 to the id of the\nnode local to the cpu where the allocation occurs.\n\nSystem default policy, except during boot, is hard-coded to \"local\nallocation\".  By using the MPOL_PREFERRED mode with a negative value of\npreferred node for system default policy, MPOL_DEFAULT will never occur in the\n\u0027policy\u0027 member of a struct mempolicy.  Thus, we can remove all checks for\nMPOL_DEFAULT when converting policy to a node id/zonelist in the allocation\npaths.\n\nIn slab_node() return local node id when policy pointer is NULL.  No need to\nset a pol value to take the switch default.  Replace switch default with\nBUG()--i.e., shouldn\u0027t happen.\n\nWith this patch MPOL_DEFAULT is only used in the APIs, including internal\ncalls to do_set_mempolicy() and in the display of policy in\n/proc/\u003cpid\u003e/numa_maps.  It always means \"fall back\" to the the next most\nspecific policy scope.  This simplifies the description of memory policies\nquite a bit, with no visible change in behavior.\n\nget_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when\nthe requested policy [task or vma/shared] is NULL.  These are the values one\nwould supply via set_mempolicy() or mbind() to achieve that condition--default\nbehavior.\n\nThis patch updates Documentation to reflect this change.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "52cd3b074050dd664380b5e8cfc85d4a6ed8ad48",
      "tree": "fcfcf55c0e81376ea34919fab26e29bedd7f3b88",
      "parents": [
        "a6020ed759404372e8be2b276e85e51735472cc9"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:16 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: rework mempolicy Reference Counting [yet again]\n\nAfter further discussion with Christoph Lameter, it has become clear that my\nearlier attempts to clean up the mempolicy reference counting were a bit of\noverkill in some areas, resulting in superflous ref/unref in what are usually\nfast paths.  In other areas, further inspection reveals that I botched the\nunref for interleave policies.\n\nA separate patch, suitable for upstream/stable trees, fixes up the known\nerrors in the previous attempt to fix reference counting.\n\nThis patch reworks the memory policy referencing counting and, one hopes,\nsimplifies the code.  Maybe I\u0027ll get it right this time.\n\nSee the update to the numa_memory_policy.txt document for a discussion of\nmemory policy reference counting that motivates this patch.\n\nSummary:\n\nLookup of mempolicy, based on (vma, address) need only add a reference for\nshared policy, and we need only unref the policy when finished for shared\npolicies.  So, this patch backs out all of the unneeded extra reference\ncounting added by my previous attempt.  It then unrefs only shared policies\nwhen we\u0027re finished with them, using the mpol_cond_put() [conditional put]\nhelper function introduced by this patch.\n\nNote that shmem_swapin() calls read_swap_cache_async() with a dummy vma\ncontaining just the policy.  read_swap_cache_async() can call alloc_page_vma()\nmultiple times, so we can\u0027t let alloc_page_vma() unref the shared policy in\nthis case.  To avoid this, we make a copy of any non-null shared policy and\nremove the MPOL_F_SHARED flag from the copy.  This copy occurs before reading\na page [or multiple pages] from swap, so the overhead should not be an issue\nhere.\n\nI introduced a new static inline function \"mpol_cond_copy()\" to copy the\nshared policy to an on-stack policy and remove the flags that would require a\nconditional free.  The current implementation of mpol_cond_copy() assumes that\nthe struct mempolicy contains no pointers to dynamically allocated structures\nthat must be duplicated or reference counted during copy.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "aab0b1029f0843756b68e0ed3ca983685bf43ed6",
      "tree": "95a0546c908a1c7a3bf1353f433ff1464faf68db",
      "parents": [
        "45c4745af381851b0406d8e4db99e62e265691c2"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:13 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: mark shared policies for unref\n\nAs part of yet another rework of mempolicy reference counting, we want to be\nable to identify shared policies efficiently, because they have an extra ref\ntaken on lookup that needs to be removed when we\u0027re finished using the policy.\n\n  Note:  the extra ref is required because the policies are\n  shared between tasks/processes and can be changed/freed\n  by one task while another task is using them--e.g., for\n  page allocation.\n\nBuilding on David Rientjes mempolicy \"mode flags\" enhancement, this patch\nindicates a \"shared\" policy by setting a new MPOL_F_SHARED flag in the flags\nmember of the struct mempolicy added by David.  MPOL_F_SHARED, and any future\n\"internal mode flags\" are reserved from bit zero up, as they will never be\npassed in the upper bits of the mode argument of a mempolicy API.\n\nI set the MPOL_F_SHARED flag when the policy is installed in the shared policy\nrb-tree.  Don\u0027t need/want to clear the flag when removing from the tree as the\nmempolicy is freed [unref\u0027d] internally to the sp_delete() function.  However,\na task could hold another reference on this mempolicy from a prior lookup.  We\nneed the MPOL_F_SHARED flag to stay put so that any tasks holding a ref will\nunref, eventually freeing, the mempolicy.\n\nA later patch in this series will introduce a function to conditionally unref\n[mpol_free] a policy.  The MPOL_F_SHARED flag is one reason [currently the\nonly reason] to unref/free a policy via the conditional free.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "45c4745af381851b0406d8e4db99e62e265691c2",
      "tree": "d93f6f7b3d7eb3773aaa80444c56baff99e670d6",
      "parents": [
        "ae4d8c16aa22775f5731677abb8a82f03cec877e"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:12 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: rename struct mempolicy \u0027policy\u0027 member to \u0027mode\u0027\n\nThe terms \u0027policy\u0027 and \u0027mode\u0027 are both used in various places to describe the\nsemantics of the value stored in the \u0027policy\u0027 member of struct mempolicy.\nFurthermore, the term \u0027policy\u0027 is used to refer to that member, to the entire\nstruct mempolicy and to the more abstract concept of the tuple consisting of a\n\"mode\" and an optional node or set of nodes.  Recently, we have added \"mode\nflags\" that are passed in the upper bits of the \u0027mode\u0027 [or sometimes,\n\u0027policy\u0027] member of the numa APIs.\n\nI\u0027d like to resolve this confusion, which perhaps only exists in my mind, by\nrenaming the \u0027policy\u0027 member to \u0027mode\u0027 throughout, and fixing up the\nDocumentation.  Man pages will be updated separately.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ae4d8c16aa22775f5731677abb8a82f03cec877e",
      "tree": "2d34c6fd334b51cc5f3265cd4ec26db758639b5c",
      "parents": [
        "f4e53d910b7dde2685b177f1e7c3e3e0b4a42f7b"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:11 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:24 2008 -0700"
      },
      "message": "mempolicy: fixup Fallback for Default Shmem Policy\n\nget_vma_policy() is not handling fallback to task policy correctly when the\nget_policy() vm_op returns NULL.  The NULL overwrites the \u0027pol\u0027 variable that\nwas holding the fallback task mempolicy.  So, it was falling back directly to\nsystem default policy.\n\nFix get_vma_policy() to use only non-NULL policy returned from the vma\nget_policy op.\n\nshm_get_policy() was falling back to current task\u0027s mempolicy if the \"backing\nfile system\" [tmpfs vs hugetlbfs] does not support the get_policy vm_op and\nthe vma policy is null.  This is incorrect for show_numa_maps() which is\nlikely querying the numa_maps of some task other than current.  Remove this\nfallback.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f4e53d910b7dde2685b177f1e7c3e3e0b4a42f7b",
      "tree": "db4db3abbc0b63de88dabd13e9359ca86973935c",
      "parents": [
        "846a16bf0fc80dc95a414ffce465e3cbf9680247"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:10 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:23 2008 -0700"
      },
      "message": "mempolicy: write lock mmap_sem while changing task mempolicy\n\nA read of /proc/\u003cpid\u003e/numa_maps holds the target task\u0027s mmap_sem for read\nwhile examining each vma\u0027s mempolicy.  A vma\u0027s mempolicy can fall back to the\ntask\u0027s policy.  However, the task could be changing it\u0027s task policy and free\nthe one that the show_numa_maps() is examining.\n\nTo prevent this, grab the mmap_sem for write when updating task mempolicy.\nPointed out to me by Christoph Lameter and extracted and reworked from\nChristoph\u0027s alternative mempol reference counting patch.\n\nThis is analogous to the way that do_mbind() and do_get_mempolicy() prevent\nraces between task\u0027s sharing an mm_struct [a.k.a.  threads] setting and\nquerying a mempolicy for a particular address.\n\nNote: this is necessary, but not sufficient, to allow us to stop taking an\nextra reference on \"other task\u0027s mempolicy\" in get_vma_policy.  Subsequent\npatches will complete this update, allowing us to simplify the tests for\nwhether we need to unref a mempolicy at various points in the code.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "846a16bf0fc80dc95a414ffce465e3cbf9680247",
      "tree": "45e03061c5e3d8242bf470509771926f37177415",
      "parents": [
        "f0be3d32b05d3fea2fcdbbb81a39dac2a7163169"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:09 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:23 2008 -0700"
      },
      "message": "mempolicy: rename mpol_copy to mpol_dup\n\nThis patch renames mpol_copy() to mpol_dup() because, well, that\u0027s what it\ndoes.  Like, e.g., strdup() for strings, mpol_dup() takes a pointer to an\nexisting mempolicy, allocates a new one and copies the contents.\n\nIn a later patch, I want to use the name mpol_copy() to copy the contents from\none mempolicy to another like, e.g., strcpy() does for strings.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f0be3d32b05d3fea2fcdbbb81a39dac2a7163169",
      "tree": "5794ce6a8befbce82cd3e44ff15fbf3bb5f2f3bf",
      "parents": [
        "3b1163006332302117b1b2acf226d4014ff46525"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Mon Apr 28 02:13:08 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:23 2008 -0700"
      },
      "message": "mempolicy: rename mpol_free to mpol_put\n\nThis is a change that was requested some time ago by Mel Gorman.  Makes sense\nto me, so here it is.\n\nNote: I retain the name \"mpol_free_shared_policy()\" because it actually does\nfree the shared_policy, which is NOT a reference counted object.  However, ...\n\nThe mempolicy object[s] referenced by the shared_policy are reference counted,\nso mpol_put() is used to release the reference held by the shared_policy.  The\nmempolicy might not be freed at this time, because some task attached to the\nshared object associated with the shared policy may be in the process of\nallocating a page based on the mempolicy.  In that case, the task performing\nthe allocation will hold a reference on the mempolicy, obtained via\nmpol_shared_policy_lookup().  The mempolicy will be freed when all tasks\nholding such a reference have called mpol_put() for the mempolicy.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3e1f064562fcff7bf3856bc1d00dfa84d4f121cc",
      "tree": "9ebc17449238ab5284b72f634405044376dc816b",
      "parents": [
        "3842b46de626d1a3c44ad280d67ab0a4dc047d13"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Mon Apr 28 02:12:34 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:20 2008 -0700"
      },
      "message": "mempolicy: disallow static or relative flags for local preferred mode\n\nMPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don\u0027t mean anything for\nMPOL_PREFERRED policies that were created with an empty nodemask (for purely\nlocal allocations).  They\u0027ll never be invalidated because the allowed mems of\na task changes or need to be rebound relative to a cpuset\u0027s placement.\n\nAlso fixes a bug identified by Lee Schermerhorn that disallowed empty\nnodemasks to be passed to MPOL_PREFERRED to specify local allocations.  [A\ndifferent, somewhat incomplete, patch already existed in 25-rc5-mm1.]\n\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: Randy Dunlap \u003crandy.dunlap@oracle.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "37012946da940521fb997a758a219d2f1ab56e51",
      "tree": "16553f91325720a53d0314070793700b5926e052",
      "parents": [
        "1d0d2680a01c4f9e292ec6d4714884da939053a1"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Mon Apr 28 02:12:33 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:20 2008 -0700"
      },
      "message": "mempolicy: create mempolicy_operations structure\n\nCreate a mempolicy_operations structure that currently points to two\nfunctions[*] for the various modes:\n\n\tint (*create)(struct mempolicy *, const nodemask_t *);\n\tvoid (*rebind)(struct mempolicy *, const nodemask_t *);\n\nThis splits the implementation for the various modes out of two large\nfunctions, mpol_new() and mpol_rebind_policy().  Eventually it may be\nbeneficial to add additional functions to accomodate the existing switch()\nstatements in mm/mempolicy.c.\n\n [*] The -\u003ecreate() function for MPOL_DEFAULT is currently NULL since no\n     struct mempolicy is dynamically allocated.\n\n[Lee.Schermerhorn@hp.com: fix regression in the package mempolicy regression tests]\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Eric Whitney \u003ceric.whitney@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1d0d2680a01c4f9e292ec6d4714884da939053a1",
      "tree": "1377ed40ec15ffecc584b308a671be47b5145db3",
      "parents": [
        "65d66fc02ed9433b957588071b60425b12628e25"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Mon Apr 28 02:12:32 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:20 2008 -0700"
      },
      "message": "mempolicy: move rebind functions\n\nMove the mpol_rebind_{policy,task,mm}() functions after mpol_new() to avoid\nhaving to declare function prototypes.\n\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4c50bc0116cf3cc35e7152d6a8424b4db65f52d6",
      "tree": "99c617602cc1946fc86b48e9a35f14f9f354cf67",
      "parents": [
        "7ea931c9fc80c4d0a4306c30ec92eb0f1d922a0b"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Mon Apr 28 02:12:30 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:19 2008 -0700"
      },
      "message": "mempolicy: add MPOL_F_RELATIVE_NODES flag\n\nAdds another optional mode flag, MPOL_F_RELATIVE_NODES, that specifies\nnodemasks passed via set_mempolicy() or mbind() should be considered relative\nto the current task\u0027s mems_allowed.\n\nWhen the mempolicy is created, the passed nodemask is folded and mapped onto\nthe current task\u0027s mems_allowed.  For example, consider a task using\nset_mempolicy() to pass MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES with a\nnodemask of 1-3.  If current\u0027s mems_allowed is 4-7, the effected nodemask is\n5-7 (the second, third, and fourth node of mems_allowed).\n\nIf the same task is attached to a cpuset, the mempolicy nodemask is rebound\neach time the mems are changed.  Some possible rebinds and results are:\n\n\tmems\t\t\tresult\n\t1-3\t\t\t1-3\n\t1-7\t\t\t2-4\n\t1,5-6\t\t\t1,5-6\n\t1,5-7\t\t\t5-7\n\nLikewise, the zonelist built for MPOL_BIND acts on the set of zones assigned\nto the resultant nodemask from the relative remap.\n\nIn the MPOL_PREFERRED case, the preferred node is remapped from the currently\neffected nodemask to the relative nodemask.\n\nThis mempolicy mode flag was conceived of by Paul Jackson \u003cpj@sgi.com\u003e.\n\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f5b087b52f1710eb0bf15a2d2b030c51a6a1ca9e",
      "tree": "66336235822f59215707dfa501e1d2b66b38a015",
      "parents": [
        "028fec414d803117eb4b2ed12acb4dd5da65b32d"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Mon Apr 28 02:12:27 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:19 2008 -0700"
      },
      "message": "mempolicy: add MPOL_F_STATIC_NODES flag\n\nAdd an optional mempolicy mode flag, MPOL_F_STATIC_NODES, that suppresses the\nnode remap when the policy is rebound.\n\nAdds another member to struct mempolicy, nodemask_t user_nodemask, as part of\na union with cpuset_mems_allowed:\n\n\tstruct mempolicy {\n\t\t...\n\t\tunion {\n\t\t\tnodemask_t cpuset_mems_allowed;\n\t\t\tnodemask_t user_nodemask;\n\t\t} w;\n\t}\n\nthat stores the the nodemask that the user passed when he or she created the\nmempolicy via set_mempolicy() or mbind().  When using MPOL_F_STATIC_NODES,\nwhich is passed with any mempolicy mode, the user\u0027s passed nodemask\nintersected with the VMA or task\u0027s allowed nodes is always used when\ndetermining the preferred node, setting the MPOL_BIND zonelist, or creating\nthe interleave nodemask.  This happens whenever the policy is rebound,\nincluding when a task\u0027s cpuset assignment changes or the cpuset\u0027s mems are\nchanged.\n\nThis creates an interesting side-effect in that it allows the mempolicy\n\"intent\" to lie dormant and uneffected until it has access to the node(s) that\nit desires.  For example, if you currently ask for an interleaved policy over\na set of nodes that you do not have access to, the mempolicy is not created\nand the task continues to use the previous policy.  With this change, however,\nit is possible to create the same mempolicy; it is only effected when access\nto nodes in the nodemask is acquired.\n\nIt is also possible to mount tmpfs with the static nodemask behavior when\nspecifying a node or nodemask.  To do this, simply add \"\u003dstatic\" immediately\nfollowing the mempolicy mode at mount time:\n\n\tmount -o remount mpol\u003dinterleave\u003dstatic:1-3\n\nAlso removes mpol_check_policy() and folds its logic into mpol_new() since it\nis now obsoleted.  The unused vma_mpol_equal() is also removed.\n\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "028fec414d803117eb4b2ed12acb4dd5da65b32d",
      "tree": "427f37ea0331369c1babc55c424c4fd2ac3b39f5",
      "parents": [
        "a3b51e0142d1be156ac697eaadadd6cfbb7ba32b"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Mon Apr 28 02:12:25 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:19 2008 -0700"
      },
      "message": "mempolicy: support optional mode flags\n\nWith the evolution of mempolicies, it is necessary to support mempolicy mode\nflags that specify how the policy shall behave in certain circumstances.  The\nmost immediate need for mode flag support is to suppress remapping the\nnodemask of a policy at the time of rebind.\n\nBoth the mempolicy mode and flags are passed by the user in the \u0027int policy\u0027\nformal of either the set_mempolicy() or mbind() syscall.  A new constant,\nMPOL_MODE_FLAGS, represents the union of legal optional flags that may be\npassed as part of this int.  Mempolicies that include illegal flags as part of\ntheir policy are rejected as invalid.\n\nAn additional member to struct mempolicy is added to support the mode flags:\n\n\tstruct mempolicy {\n\t\t...\n\t\tunsigned short policy;\n\t\tunsigned short flags;\n\t}\n\nThe splitting of the \u0027int\u0027 actual passed by the user is done in\nsys_set_mempolicy() and sys_mbind() for their respective syscalls.  This is\ndone by intersecting the actual with MPOL_MODE_FLAGS, rejecting the syscall of\nthere are additional flags, and storing it in the new \u0027flags\u0027 member of struct\nmempolicy.  The intersection of the actual with ~MPOL_MODE_FLAGS is stored in\nthe \u0027policy\u0027 member of the struct and all current users of pol-\u003epolicy remain\nunchanged.\n\nThe union of the policy mode and optional mode flags is passed back to the\nuser in get_mempolicy().\n\nThis combination of mode and flags within the same actual does not break\nuserspace code that relies on get_mempolicy(\u0026policy, ...) and either\n\n\tswitch (policy) {\n\tcase MPOL_BIND:\n\t\t...\n\tcase MPOL_INTERLEAVE:\n\t\t...\n\t};\n\nstatements or\n\n\tif (policy \u003d\u003d MPOL_INTERLEAVE) {\n\t\t...\n\t}\n\nstatements.  Such applications would need to use optional mode flags when\ncalling set_mempolicy() or mbind() for these previously implemented statements\nto stop working.  If an application does start using optional mode flags, it\nwill need to mask the optional flags off the policy in switch and conditional\nstatements that only test mode.\n\nAn additional member is also added to struct shmem_sb_info to store the\noptional mode flags.\n\n[hugh@veritas.com: shmem mpol: fix build warning]\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Hugh Dickins \u003chugh@veritas.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a3b51e0142d1be156ac697eaadadd6cfbb7ba32b",
      "tree": "da9e527316f112c09f0bc16553c64e3c87125c82",
      "parents": [
        "1b27d05b6e21249d2338be26dfcbe8f8d8ff8a5b"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Mon Apr 28 02:12:23 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:19 2008 -0700"
      },
      "message": "mempolicy: convert MPOL constants to enum\n\nThe mempolicy mode constants, MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, and\nMPOL_INTERLEAVE, are better declared as part of an enum since they are\nsequentially numbered and cannot be combined.\n\nThe policy member of struct mempolicy is also converted from type short to\ntype unsigned short.  A negative policy does not have any legitimate meaning,\nso it is possible to change its type in preparation for adding optional mode\nflags later.\n\nThe equivalent member of struct shmem_sb_info is also changed from int to\nunsigned short.\n\nFor compatibility, the policy formal to get_mempolicy() remains as a pointer\nto an int:\n\n\tint get_mempolicy(int *policy, unsigned long *nmask,\n\t\t\t  unsigned long maxnode, unsigned long addr,\n\t\t\t  unsigned long flags);\n\nalthough the only possible values is the range of type unsigned short.\n\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "19770b32609b6bf97a3dece2529089494cbfc549",
      "tree": "3b5922d1b20aabdf929bde9309f323841717747a",
      "parents": [
        "dd1a239f6f2d4d3eedd318583ec319aa145b324c"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Apr 28 02:12:18 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:19 2008 -0700"
      },
      "message": "mm: filter based on a nodemask as well as a gfp_mask\n\nThe MPOL_BIND policy creates a zonelist that is used for allocations\ncontrolled by that mempolicy.  As the per-node zonelist is already being\nfiltered based on a zone id, this patch adds a version of __alloc_pages() that\ntakes a nodemask for further filtering.  This eliminates the need for\nMPOL_BIND to create a custom zonelist.\n\nA positive benefit of this is that allocations using MPOL_BIND now use the\nlocal node\u0027s distance-ordered zonelist instead of a custom node-id-ordered\nzonelist.  I.e., pages will be allocated from the closest allowed node with\navailable memory.\n\n[Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]\n[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]\n[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "dd1a239f6f2d4d3eedd318583ec319aa145b324c",
      "tree": "aff4224c96b5e2e67588c3946858a724863eeaf9",
      "parents": [
        "54a6eb5c4765aa573a030ceeba2c14e3d2ea5706"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Apr 28 02:12:17 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:18 2008 -0700"
      },
      "message": "mm: have zonelist contains structs with both a zone pointer and zone_idx\n\nFiltering zonelists requires very frequent use of zone_idx().  This is costly\nas it involves a lookup of another structure and a substraction operation.  As\nthe zone_idx is often required, it should be quickly accessible.  The node idx\ncould also be stored here if it was found that accessing zone-\u003enode is\nsignificant which may be the case on workloads where nodemasks are heavily\nused.\n\nThis patch introduces a struct zoneref to store a zone pointer and a zone\nindex.  The zonelist then consists of an array of these struct zonerefs which\nare looked up as necessary.  Helpers are given for accessing the zone index as\nwell as the node index.\n\n[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]\n[hugh@veritas.com: mm-have-zonelist: fix memcg ooms]\n[hugh@veritas.com: just return do_try_to_free_pages]\n[hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Hugh Dickins \u003chugh@veritas.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0e88460da6ab7bb6a7ef83675412ed5b6315d741",
      "tree": "1feb4de2362e4998a0deeab66af1efb9c7b8bb34",
      "parents": [
        "dac1d27bc8d5ca636d3014ecfdf94407031d1970"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Apr 28 02:12:14 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:18 2008 -0700"
      },
      "message": "mm: introduce node_zonelist() for accessing the zonelist for a GFP mask\n\nIntroduce a node_zonelist() helper function.  It is used to lookup the\nappropriate zonelist given a node and a GFP mask.  The patch on its own is a\ncleanup but it helps clarify parts of the two-zonelist-per-node patchset.  If\nnecessary, it can be merged with the next patch in this set without problems.\n\nReviewed-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "69682d852f5c94ee94e21174b3e8b719626c98db",
      "tree": "ed3c7c0681ef7658132ae32f1cd8a56bb6451621",
      "parents": [
        "9afa802ff568d935dab33dd207dc25d9849f35d4"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "Lee.Schermerhorn@hp.com",
        "time": "Mon Mar 10 11:43:45 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Mon Mar 10 18:01:19 2008 -0700"
      },
      "message": "mempolicy: fix reference counting bugs\n\nAddress 3 known bugs in the current memory policy reference counting method.\nI have a series of patches to rework the reference counting to reduce overhead\nin the allocation path.  However, that series will require testing in -mm once\nI repost it.\n\n1) alloc_page_vma() does not release the extra reference taken for\n   vma/shared mempolicy when the mode \u003d\u003d MPOL_INTERLEAVE.  This can result in\n   leaking mempolicy structures.  This is probably occurring, but not being\n   noticed.\n\n   Fix:  add the conditional release of the reference.\n\n2) hugezonelist unconditionally releases a reference on the mempolicy when\n   mode \u003d\u003d MPOL_INTERLEAVE.  This can result in decrementing the reference\n   count for system default policy [should have no ill effect] or premature\n   freeing of task policy.  If this occurred, the next allocation using task\n   mempolicy would use the freed structure and probably BUG out.\n\n   Fix:  add the necessary check to the release.\n\n3) The current reference counting method assumes that vma \u0027get_policy()\u0027\n   methods automatically add an extra reference a non-NULL returned mempolicy.\n    This is true for shmem_get_policy() used by tmpfs mappings, including\n   regular page shm segments.  However, SHM_HUGETLB shm\u0027s, backed by\n   hugetlbfs, just use the vma policy without the extra reference.  This\n   results in freeing of the vma policy on the first allocation, with reuse of\n   the freed mempolicy structure on subsequent allocations.\n\n   Fix: Rather than add another condition to the conditional reference\n   release, which occur in the allocation path, just add a reference when\n   returning the vma policy in shm_get_policy() to match the assumptions.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Greg KH \u003cgreg@kroah.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: \u003ceric.whitney@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "c32c2f63a9d6c953aaf168c0b2551da9734f76d2",
      "tree": "14eca3083f3de4a87a95359ab66109c10add1ae7",
      "parents": [
        "e83aece3afad4d56cc01abe069d3519e851cd2de"
      ],
      "author": {
        "name": "Jan Blunck",
        "email": "jblunck@suse.de",
        "time": "Thu Feb 14 19:38:43 2008 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Thu Feb 14 21:17:08 2008 -0800"
      },
      "message": "d_path: Make seq_path() use a struct path argument\n\nseq_path() is always called with a dentry and a vfsmount from a struct path.\nMake seq_path() take it directly as an argument.\n\nSigned-off-by: Jan Blunck \u003cjblunck@suse.de\u003e\nCc: Christoph Hellwig \u003chch@lst.de\u003e\nCc: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\nCc: \"J. Bruce Fields\" \u003cbfields@fieldses.org\u003e\nCc: Neil Brown \u003cneilb@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "31f1de46b90ad360a16e7af3e277d104961df923",
      "tree": "a54e8698d4e4d088c4008e0ae91b579b13d2c208",
      "parents": [
        "1a510089849ff9f70b654659bf976a6baf3a4833"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Feb 12 13:30:22 2008 +0900"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Mon Feb 11 20:48:29 2008 -0800"
      },
      "message": "mempolicy: silently restrict nodemask to allowed nodes\n\nKosaki Motohito noted that \"numactl --interleave\u003dall ...\" failed in the\npresence of memoryless nodes.  This patch attempts to fix that problem.\n\nSome background:\n\nnumactl --interleave\u003dall calls set_mempolicy(2) with a fully populated\n[out to MAXNUMNODES] nodemask.  set_mempolicy() [in do_set_mempolicy()]\ncalls contextualize_policy() which requires that the nodemask be a\nsubset of the current task\u0027s mems_allowed; else EINVAL will be returned.\n\nA task\u0027s mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]\ni.e., nodes with memory.  So, a fully populated nodemask will be\ndeclared invalid if it includes memoryless nodes.\n\n  NOTE:  the same thing will occur when running in a cpuset\n         with restricted mem_allowed--for the same reason:\n         node mask contains dis-allowed nodes.\n\nmbind(2), on the other hand, just masks off any nodes in the nodemask\nthat are not included in the caller\u0027s mems_allowed.\n\nIn each case [mbind() and set_mempolicy()], mpol_check_policy() will\ncomplain [again, resulting in EINVAL] if the nodemask contains any\nmemoryless nodes.  This is somewhat redundant as mpol_new() will remove\nmemoryless nodes for interleave policy, as will bind_zonelist()--called\nby mpol_new() for BIND policy.\n\nProposed fix:\n\n1) modify contextualize_policy logic to:\n   a) remember whether the incoming node mask is empty.\n   b) if not, restrict the nodemask to allowed nodes, as is\n      currently done in-line for mbind().  This guarantees\n      that the resulting mask includes only nodes with memory.\n\n      NOTE:  this is a [benign, IMO] change in behavior for\n             set_mempolicy().  Dis-allowed nodes will be\n             silently ignored, rather than returning an error.\n\n   c) fold this code into mpol_check_policy(), replace 2 calls to\n      contextualize_policy() to call mpol_check_policy() directly\n      and remove contextualize_policy().\n\n2) In existing mpol_check_policy() logic, after \"contextualization\":\n   a) MPOL_DEFAULT:  require that in coming mask \"was_empty\"\n   b) MPOL_{BIND|INTERLEAVE}:  require that contextualized nodemask\n      contains at least one node.\n   c) add a case for MPOL_PREFERRED:  if in coming was not empty\n      and resulting mask IS empty, user specified invalid nodes.\n      Return EINVAL.\n   c) remove the now redundant check for memoryless nodes\n\n3) remove the now redundant masking of policy nodes for interleave\n   policy from mpol_new().\n\n4) Now that mpol_check_policy() contextualizes the nodemask, remove\n   the in-line nodes_and() from sys_mbind().  I believe that this\n   restores mbind() to the behavior before the memoryless-nodes\n   patch series.  E.g., we\u0027ll no longer treat an invalid nodemask\n   with MPOL_PREFERRED as local allocation.\n\n[ Patch history:\n\n  v1 -\u003e v2:\n   - Communicate whether or not incoming node mask was empty to\n     mpol_check_policy() for better error checking.\n   - As suggested by David Rientjes, remove the now unused\n     cpuset_nodes_subset_current_mems_allowed() from cpuset.h\n\n  v2 -\u003e v3:\n   - As suggested by Kosaki Motohito, fold the \"contextualization\"\n     of policy nodemask into mpol_check_policy().  Looks a little\n     cleaner. ]\n\nSigned-off-by:  Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by:  KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nTested-by:      KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by:       David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3ad33b2436b545cbe8b28e53f3710432cad457ab",
      "tree": "581808f90a08838ee27d76cc24812c7093c216a9",
      "parents": [
        "e1a1c997afe907e6ec4799e4be0f38cffd8b418c"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "Lee.Schermerhorn@hp.com",
        "time": "Wed Nov 14 16:59:10 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed Nov 14 18:45:38 2007 -0800"
      },
      "message": "Migration: find correct vma in new_vma_page()\n\nWe hit the BUG_ON() in mm/rmap.c:vma_address() when trying to migrate via\nmbind(MPOL_MF_MOVE) a non-anon region that spans multiple vmas.  For\nanon-regions, we just fail to migrate any pages beyond the 1st vma in the\nrange.\n\nThis occurs because do_mbind() collects a list of pages to migrate by\ncalling check_range().  check_range() walks the task\u0027s mm, spanning vmas as\nnecessary, to collect the migratable pages into a list.  Then, do_mbind()\ncalls migrate_pages() passing the list of pages, a function to allocate new\npages based on vma policy [new_vma_page()], and a pointer to the first vma\nof the range.\n\nFor each page in the list, new_vma_page() calls page_address_in_vma()\npassing the page and the vma [first in range] to obtain the address to get\nfor alloc_page_vma().  The page address is needed to get interleaving\npolicy correct.  If the pages in the list come from multiple vmas,\neventually, new_page_address() will pass that page to page_address_in_vma()\nwith the incorrect vma.  For !PageAnon pages, this will result in a bug\ncheck in rmap.c:vma_address().  For anon pages, vma_address() will just\nreturn EFAULT and fail the migration.\n\nThis patch modifies new_vma_page() to check the return value from\npage_address_in_vma().  If the return value is EFAULT, new_vma_page()\nsearchs forward via vm_next for the vma that maps the page--i.e., that does\nnot return EFAULT.  This assumes that the pages in the list handed to\nmigrate_pages() is in address order.  This is currently case.  The patch\ndocuments this assumption in a new comment block for new_vma_page().\n\nIf new_vma_page() cannot locate the vma mapping the page in a forward\nsearch in the mm, it will pass a NULL vma to alloc_page_vma().  This will\nresult in the allocation using the task policy, if any, else system default\npolicy.  This situation is unlikely, but the patch documents this behavior\nwith a comment.\n\nNote, this patch results in restarting from the first vma in a multi-vma\nrange each time new_vma_page() is called.  If this is not acceptable, we\ncan make the vma argument a pointer, both in new_vma_page() and it\u0027s caller\nunmap_and_move() so that the value held by the loop in migrate_pages()\nalways passes down the last vma in which a page was found.  This will\nrequire changes to all new_page_t functions passed to migrate_pages().  Is\nthis necessary?\n\nFor this patch to work, we can\u0027t bug check in vma_address() for pages\noutside the argument vma.  This patch removes the BUG_ON().  All other\ncallers [besides new_vma_page()] already check the return status.\n\nTested on x86_64, 4 node NUMA platform.\n\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "228ebcbe634a30aec35132ea4375721bcc41bec0",
      "tree": "a875976fd5bde6e2f931aa235c34c88a2738493f",
      "parents": [
        "b488893a390edfe027bae7a46e9af8083e740668"
      ],
      "author": {
        "name": "Pavel Emelyanov",
        "email": "xemul@openvz.org",
        "time": "Thu Oct 18 23:40:16 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Fri Oct 19 11:53:40 2007 -0700"
      },
      "message": "Uninline find_task_by_xxx set of functions\n\nThe find_task_by_something is a set of macros are used to find task by pid\ndepending on what kind of pid is proposed - global or virtual one.  All of\nthem are wrappers above the most generic one - find_task_by_pid_type_ns() -\nand just substitute some args for it.\n\nIt turned out, that dereferencing the current-\u003ensproxy-\u003epid_ns construction\nand pushing one more argument on the stack inline cause kernel text size to\ngrow.\n\nThis patch moves all this stuff out-of-line into kernel/pid.c.  Together\nwith the next patch it saves a bit less than 400 bytes from the .text\nsection.\n\nSigned-off-by: Pavel Emelyanov \u003cxemul@openvz.org\u003e\nCc: Sukadev Bhattiprolu \u003csukadev@us.ibm.com\u003e\nCc: Oleg Nesterov \u003coleg@tv-sign.ru\u003e\nCc: Paul Menage \u003cmenage@google.com\u003e\nCc: \"Eric W. Biederman\" \u003cebiederm@xmission.com\u003e\nAcked-by: Ingo Molnar \u003cmingo@elte.hu\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b488893a390edfe027bae7a46e9af8083e740668",
      "tree": "c469a7f99ad01005a73011c029eb5e5d15454559",
      "parents": [
        "3eb07c8c8adb6f0572baba844ba2d9e501654316"
      ],
      "author": {
        "name": "Pavel Emelyanov",
        "email": "xemul@openvz.org",
        "time": "Thu Oct 18 23:40:14 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Fri Oct 19 11:53:40 2007 -0700"
      },
      "message": "pid namespaces: changes to show virtual ids to user\n\nThis is the largest patch in the set. Make all (I hope) the places where\nthe pid is shown to or get from user operate on the virtual pids.\n\nThe idea is:\n - all in-kernel data structures must store either struct pid itself\n   or the pid\u0027s global nr, obtained with pid_nr() call;\n - when seeking the task from kernel code with the stored id one\n   should use find_task_by_pid() call that works with global pids;\n - when showing pid\u0027s numerical value to the user the virtual one\n   should be used, but however when one shows task\u0027s pid outside this\n   task\u0027s namespace the global one is to be used;\n - when getting the pid from userspace one need to consider this as\n   the virtual one and use appropriate task/pid-searching functions.\n\n[akpm@linux-foundation.org: build fix]\n[akpm@linux-foundation.org: nuther build fix]\n[akpm@linux-foundation.org: yet nuther build fix]\n[akpm@linux-foundation.org: remove unneeded casts]\nSigned-off-by: Pavel Emelyanov \u003cxemul@openvz.org\u003e\nSigned-off-by: Alexey Dobriyan \u003cadobriyan@openvz.org\u003e\nCc: Sukadev Bhattiprolu \u003csukadev@us.ibm.com\u003e\nCc: Oleg Nesterov \u003coleg@tv-sign.ru\u003e\nCc: Paul Menage \u003cmenage@google.com\u003e\nCc: \"Eric W. Biederman\" \u003cebiederm@xmission.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "8793d854edbc2774943a4b0de3304dc73991159a",
      "tree": "380b3403a0fedfcce61d9af5af1ffbcc71017abf",
      "parents": [
        "81a6a5cdd2c5cd70874b88afe524ab09e9e869af"
      ],
      "author": {
        "name": "Paul Menage",
        "email": "menage@google.com",
        "time": "Thu Oct 18 23:39:39 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Fri Oct 19 11:53:36 2007 -0700"
      },
      "message": "Task Control Groups: make cpusets a client of cgroups\n\nRemove the filesystem support logic from the cpusets system and makes cpusets\na cgroup subsystem\n\nThe \"cpuset\" filesystem becomes a dummy filesystem; attempts to mount it get\npassed through to the cgroup filesystem with the appropriate options to\nemulate the old cpuset filesystem behaviour.\n\nSigned-off-by: Paul Menage \u003cmenage@google.com\u003e\nCc: Serge E. Hallyn \u003cserue@us.ibm.com\u003e\nCc: \"Eric W. Biederman\" \u003cebiederm@xmission.com\u003e\nCc: Dave Hansen \u003chaveblue@us.ibm.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Kirill Korotaev \u003cdev@openvz.org\u003e\nCc: Herbert Poetzl \u003cherbert@13thfloor.at\u003e\nCc: Srivatsa Vaddagiri \u003cvatsa@in.ibm.com\u003e\nCc: Cedric Le Goater \u003cclg@fr.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "dbcb0f19c877df9026b8c1227758d38bd561e9c4",
      "tree": "f58c85976906f42ff44798f514177392d7c48d0f",
      "parents": [
        "d8dc74f212c38407fc9f4367181f8f969b719485"
      ],
      "author": {
        "name": "Adrian Bunk",
        "email": "bunk@stusta.de",
        "time": "Tue Oct 16 01:26:26 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:03 2007 -0700"
      },
      "message": "mm/mempolicy.c: cleanups\n\nThis patch contains the following cleanups:\n- every file should include the headers containing the prototypes for\n  its global functions\n- make the follosing needlessly global functions static:\n  - migrate_to_node()\n  - do_mbind()\n  - sp_alloc()\n  - mpol_rebind_policy()\n\n[akpm@linux-foundation.org: fix uninitialised var warning]\nSigned-off-by: Adrian Bunk \u003cbunk@stusta.de\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "37b07e4163f7306aa735a6e250e8d22293e5b8de",
      "tree": "5c9c1935253a39aa840a9923bf1c86620cb6f733",
      "parents": [
        "0e1e7c7a739562a321fda07c7cd2a97a7114f8f8"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "Lee.Schermerhorn@hp.com",
        "time": "Tue Oct 16 01:25:39 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:59 2007 -0700"
      },
      "message": "memoryless nodes: fixup uses of node_online_map in generic code\n\nHere\u0027s a cut at fixing up uses of the online node map in generic code.\n\nmm/shmem.c:shmem_parse_mpol()\n\n\tEnsure nodelist is subset of nodes with memory.\n\tUse node_states[N_HIGH_MEMORY] as default for missing\n\tnodelist for interleave policy.\n\nmm/shmem.c:shmem_fill_super()\n\n\tinitialize policy_nodes to node_states[N_HIGH_MEMORY]\n\nmm/page-writeback.c:highmem_dirtyable_memory()\n\n\tsum over nodes with memory\n\nmm/page_alloc.c:zlc_setup()\n\n\tallowednodes - use nodes with memory.\n\nmm/page_alloc.c:default_zonelist_order()\n\n\taverage over nodes with memory.\n\nmm/page_alloc.c:find_next_best_node()\n\n\tskip nodes w/o memory.\n\tN_HIGH_MEMORY state mask may not be initialized at this time,\n\tunless we want to depend on early_calculate_totalpages() [see\n\tbelow].  Will ZONE_MOVABLE ever be configurable?\n\nmm/page_alloc.c:find_zone_movable_pfns_for_nodes()\n\n\tspread kernelcore over nodes with memory.\n\n\tThis required calling early_calculate_totalpages()\n\tunconditionally, and populating N_HIGH_MEMORY node\n\tstate therein from nodes in the early_node_map[].\n\tIf we can depend on this, we can eliminate the\n\tpopulation of N_HIGH_MEMORY mask from __build_all_zonelists()\n\tand use the N_HIGH_MEMORY mask in find_next_best_node().\n\nmm/mempolicy.c:mpol_check_policy()\n\n\tEnsure nodes specified for policy are subset of\n\tnodes with memory.\n\n[akpm@linux-foundation.org: fix warnings]\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "56bbd65df0e92a4a8eb70c5f2b416ae2b6c5fb31",
      "tree": "714154b7b16d2e08c60d49b925aa0e789f0f0be0",
      "parents": [
        "4199cfa02b982f4c739e8a6a304d6a40e1935d25"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Tue Oct 16 01:25:35 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:58 2007 -0700"
      },
      "message": "Memoryless nodes: Update memory policy and page migration\n\nOnline nodes now may have no memory.  The checks and initialization must\ntherefore be changed to no longer use the online functions.\n\nThis will correctly initialize the interleave on bootup to only target nodes\nwith memory and will make sys_move_pages return an error when a page is to be\nmoved to a memoryless node.  Similarly we will get an error if MPOL_BIND and\nMPOL_INTERLEAVE is used on a memoryless node.\n\nThese are somewhat new semantics.  So far one could specify memoryless nodes\nand we would maybe do the right thing and just ignore the node (or we\u0027d do\nsomething strange like with MPOL_INTERLEAVE).  If we want to allow the\nspecification of memoryless nodes via memory policies then we need to keep\nchecking for online nodes.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: Nishanth Aravamudan \u003cnacc@us.ibm.com\u003e\nTested-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Bob Picco \u003cbob.picco@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@skynet.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "6eaf806a223e61dc5f2de4ab591f11beb97a8f3b",
      "tree": "fd63b422c3cb357792d12f9170c354e25b94daa0",
      "parents": [
        "7ea1530ab3fdfa85441061909cc8040e84776fd4"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Tue Oct 16 01:25:30 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:58 2007 -0700"
      },
      "message": "Memoryless nodes: Fix interleave behavior for memoryless nodes\n\nMPOL_INTERLEAVE currently simply loops over all nodes.  Allocations on\nmemoryless nodes will be redirected to nodes with memory.  This results in an\nimbalance because the neighboring nodes to memoryless nodes will get\nsignificantly more interleave hits that the rest of the nodes on the system.\n\nWe can avoid this imbalance by clearing the nodes in the interleave node set\nthat have no memory.  If we use the node map of the memory nodes instead of\nthe online nodes then we have only the nodes we want.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Nishanth Aravamudan \u003cnacc@us.ibm.com\u003e\nTested-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Bob Picco \u003cbob.picco@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@skynet.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "754af6f5a85fcd1ecb456851d20c65e4c6ce10ab",
      "tree": "8c985bfd704a8c993d6ca992725969c6fc5c9e5a",
      "parents": [
        "32a4330d4156e55a4888a201f484dbafed9504ed"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "Lee.Schermerhorn@hp.com",
        "time": "Tue Oct 16 01:24:51 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:54 2007 -0700"
      },
      "message": "Mem Policy: add MPOL_F_MEMS_ALLOWED get_mempolicy() flag\n\nAllow an application to query the memories allowed by its context.\n\nUpdated numa_memory_policy.txt to mention that applications can use this to\nobtain allowed memories for constructing valid policies.\n\nTODO:  update out-of-tree libnuma wrapper[s], or maybe add a new\nwrapper--e.g.,  numa_get_mems_allowed() ?\n\nAlso, update numa syscall man pages.\n\nTested with memtoy V\u003e\u003d0.13.\n\nSigned-off-by:  Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "43fac94dd62667c83dd2daa5b7ac548512af780a",
      "tree": "68b8cf73959afd24410f3f398bda5953c7dcbadd",
      "parents": [
        "39e91e433169bdfd5a312654e5988986662afd7f"
      ],
      "author": {
        "name": "Jesper Juhl",
        "email": "jesper.juhl@gmail.com",
        "time": "Tue Oct 16 01:24:30 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:52 2007 -0700"
      },
      "message": "Clean up duplicate includes in mm/\n\nThis patch cleans up duplicate includes in\n\tmm/\n\nSigned-off-by: Jesper Juhl \u003cjesper.juhl@gmail.com\u003e\nAcked-by: Paul Mundt \u003clethal@linux-sh.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "480eccf9ae1073b87bb4fe118971fbf134a5bc61",
      "tree": "b66cd85cd6ad9dc7c141d34837a848111d036584",
      "parents": [
        "28f300d23674fa01ae747c66ce861d4ee6aebe8c"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "Lee.Schermerhorn@hp.com",
        "time": "Tue Sep 18 22:46:47 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed Sep 19 11:24:18 2007 -0700"
      },
      "message": "Fix NUMA Memory Policy Reference Counting\n\nThis patch proposes fixes to the reference counting of memory policy in the\npage allocation paths and in show_numa_map().  Extracted from my \"Memory\nPolicy Cleanups and Enhancements\" series as stand-alone.\n\nShared policy lookup [shmem] has always added a reference to the policy,\nbut this was never unrefed after page allocation or after formatting the\nnuma map data.\n\nDefault system policy should not require additional ref counting, nor\nshould the current task\u0027s task policy.  However, show_numa_map() calls\nget_vma_policy() to examine what may be [likely is] another task\u0027s policy.\nThe latter case needs protection against freeing of the policy.\n\nThis patch adds a reference count to a mempolicy returned by\nget_vma_policy() when the policy is a vma policy or another task\u0027s\nmempolicy.  Again, shared policy is already reference counted on lookup.  A\nmatching \"unref\" [__mpol_free()] is performed in alloc_page_vma() for\nshared and vma policies, and in show_numa_map() for shared and another\ntask\u0027s mempolicy.  We can call __mpol_free() directly, saving an admittedly\ninexpensive inline NULL test, because we know we have a non-NULL policy.\n\nHandling policy ref counts for hugepages is a bit trickier.\nhuge_zonelist() returns a zone list that might come from a shared or vma\n\u0027BIND policy.  In this case, we should hold the reference until after the\nhuge page allocation in dequeue_hugepage().  The patch modifies\nhuge_zonelist() to return a pointer to the mempolicy if it needs to be\nunref\u0027d after allocation.\n\nKernel Build [16cpu, 32GB, ia64] - average of 10 runs:\n\n\t\tw/o patch\tw/ refcount patch\n\t    Avg\t  Std Devn\t   Avg\t  Std Devn\nReal:\t 100.59\t    0.38\t 100.63\t    0.43\nUser:\t1209.60\t    0.37\t1209.91\t    0.31\nSystem:   81.52\t    0.42\t  81.64\t    0.34\n\nSigned-off-by:  Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Andi Kleen \u003cak@suse.de\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3b42d28b2a04b3c9830eb865288239d45eccc402",
      "tree": "776a297ef8bd8a879da74290543907014abe6198",
      "parents": [
        "dec4ad86c2fbea062e9ef9caa6d6e79f7c5e0b12"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Fri Aug 31 00:12:08 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Fri Aug 31 01:42:23 2007 -0700"
      },
      "message": "Page migration: Do not accept invalid nodes in the target nodeset\n\nPage migration currently does not check if the target of the move contains\nnodes that that are invalid (if root attempts to migrate pages)\nand may try to allocate from invalid nodes if these are specified\nleading to oopses.\n\nReturn -EINVAL if an offline node is specified.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b377fd3982ad957c796758a90e2988401a884241",
      "tree": "3d7449ccdf7038bffffa9323873f4095cc1ac6ce",
      "parents": [
        "8e92f21ba3ea3f54e4be062b87ef9fc4af2d33e2"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Wed Aug 22 14:02:05 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed Aug 22 19:52:47 2007 -0700"
      },
      "message": "Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE\n\nThe NUMA layer only supports NUMA policies for the highest zone.  When\nZONE_MOVABLE is configured with kernelcore\u003d, the the highest zone becomes\nZONE_MOVABLE.  The result is that policies are only applied to allocations\nlike anonymous pages and page cache allocated from ZONE_MOVABLE when the\nzone is used.\n\nThis patch applies policies to the two highest zones when the highest zone\nis ZONE_MOVABLE.  As ZONE_MOVABLE consists of pages from the highest \"real\"\nzone, it\u0027s always functionally equivalent.\n\nThe patch has been tested on a variety of machines both NUMA and non-NUMA\ncovering x86, x86_64 and ppc64.  No abnormal results were seen in\nkernbench, tbench, dbench or hackbench.  It passes regression tests from\nthe numactl package with and without kernelcore\u003d once numactl tests are\npatched to wait for vmstat counters to update.\n\nakpm: this is the nasty hack to fix NUMA mempolicies in the presence of\nZONE_MOVABLE and kernelcore\u003d in 2.6.23.  Christoph says \"For .24 either merge\nthe mobility or get the other solution that Mel is working on.  That solution\nwould only use a single zonelist per node and filter on the fly.  That may\nhelp performance and also help to make memory policies work better.\"\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by:  Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nTested-by:  Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "20c2df83d25c6a95affe6157a4c9cac4cf5ffaac",
      "tree": "415c4453d2b17a50abe7a3e515177e1fa337bd67",
      "parents": [
        "64fb98fc40738ae1a98bcea9ca3145b89fb71524"
      ],
      "author": {
        "name": "Paul Mundt",
        "email": "lethal@linux-sh.org",
        "time": "Fri Jul 20 10:11:58 2007 +0900"
      },
      "committer": {
        "name": "Paul Mundt",
        "email": "lethal@linux-sh.org",
        "time": "Fri Jul 20 10:11:58 2007 +0900"
      },
      "message": "mm: Remove slab destructors from kmem_cache_create().\n\nSlab destructors were no longer supported after Christoph\u0027s\nc59def9f222d44bb7e2f0a559f2906191a0862d7 change. They\u0027ve been\nBUGs for both slab and slub, and slob never supported them\neither.\n\nThis rips out support for the dtor pointer from kmem_cache_create()\ncompletely and fixes up every single callsite in the kernel (there were\nabout 224, not including the slab allocator definitions themselves,\nor the documentation references).\n\nSigned-off-by: Paul Mundt \u003clethal@linux-sh.org\u003e\n"
    },
    {
      "commit": "396faf0303d273219db5d7eb4a2879ad977ed185",
      "tree": "96cb64fd6713ef7a924f4f878e259aea781f079a",
      "parents": [
        "2a1e274acf0b1c192face19a4be7c12d4503eaaf"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Jul 17 04:03:13 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Jul 17 10:22:59 2007 -0700"
      },
      "message": "Allow huge page allocations to use GFP_HIGH_MOVABLE\n\nHuge pages are not movable so are not allocated from ZONE_MOVABLE.  However,\nas ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it\ncan be used to satisfy hugepage allocations even when the system has been\nrunning a long time.  This allows an administrator to resize the hugepage pool\nat runtime depending on the size of ZONE_MOVABLE.\n\nThis patch adds a new sysctl called hugepages_treat_as_movable.  When a\nnon-zero value is written to it, future allocations for the huge page pool\nwill use ZONE_MOVABLE.  Despite huge pages being non-movable, we do not\nintroduce additional external fragmentation of note as huge pages are always\nthe largest contiguous block we care about.\n\n[akpm@linux-foundation.org: various fixes]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "769848c03895b63e5662eb7e4ec8c4866f7d0183",
      "tree": "8911c7c312c8b8b172795fa2874c8162e1d3d15a",
      "parents": [
        "a32ea1e1f925399e0d81ca3f7394a44a6dafa12c"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Jul 17 04:03:05 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Jul 17 10:22:59 2007 -0700"
      },
      "message": "Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated\n\nIt is often known at allocation time whether a page may be migrated or not.\nThis patch adds a flag called __GFP_MOVABLE and a new mask called\nGFP_HIGH_MOVABLE.  Allocations using the __GFP_MOVABLE can be either migrated\nusing the page migration mechanism or reclaimed by syncing with backing\nstorage and discarding.\n\nAn API function very similar to alloc_zeroed_user_highpage() is added for\n__GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable().  The\nflags used by alloc_zeroed_user_highpage() are not changed because it would\nchange the semantics of an existing API.  After this patch is applied there\nare no in-kernel users of alloc_zeroed_user_highpage() so it probably should\nbe marked deprecated if this patch is merged.\n\nNote that this patch includes a minor cleanup to the use of __GFP_ZERO in\nshmem.c to keep all flag modifications to inode-\u003emapping in the\nshmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of\nHugh Dickens.\n\nAdditional credit goes to Christoph Lameter and Linus Torvalds for shaping the\nconcept.  Credit to Hugh Dickens for catching issues with shmem swap vector\nand ramfs allocations.\n\n[akpm@linux-foundation.org: build fix]\n[hugh@veritas.com: __GFP_ZERO cleanup]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "140d5a49046b6d73dce4a4229e88c000a99ee126",
      "tree": "64dcfe28ef6f5ccee5a2722e5c332c66ded29e28",
      "parents": [
        "462e00cc7151ed91fba688594436c453c80efb5d"
      ],
      "author": {
        "name": "Paul Mundt",
        "email": "lethal@linux-sh.org",
        "time": "Sun Jul 15 23:38:16 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Mon Jul 16 09:05:36 2007 -0700"
      },
      "message": "numa: mempolicy: trivial debug fixes.\n\nEnabling debugging fails to build due to the nodemask variable in\ndo_mbind() having changed names, and then oopses on boot due to the\nassumption that the nodemask can be dereferenced -- which doesn\u0027t work out\nso well when the policy is changed to MPOL_DEFAULT with a NULL nodemask by\nnuma_default_policy().\n\nThis fixes it up, and switches from PDprintk() to pr_debug() while\nwe\u0027re at it.\n\nSigned-off-by: Paul Mundt \u003clethal@linux-sh.org\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b71636e29823c0602d908a2a62e94c9b57a97491",
      "tree": "d69a20eec2cc05bcd9b0dac2b9d4ccd9bad39e09",
      "parents": [
        "f0630fff54a239efbbd89faf6a62da071ef1ff78"
      ],
      "author": {
        "name": "Paul Mundt",
        "email": "lethal@linux-sh.org",
        "time": "Sun Jul 15 23:38:15 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Mon Jul 16 09:05:36 2007 -0700"
      },
      "message": "numa: mempolicy: dynamic interleave map for system init\n\nThis converts the default system init memory policy to use a dynamically\ncreated node map instead of defaulting to all online nodes.  Nodes of a\ncertain size (\u003e\u003d 16MB) are judged to be suitable for interleave, and are added\nto the map.  If all nodes are smaller in size, the largest one is\nautomatically selected.\n\nWithout this, tiny nodes find themselves out of memory before we even make it\nto userspace.  Systems with large nodes will notice no change.\n\nOnly the system init policy is effected by this change, the regular\nMPOL_DEFAULT policy is still switched to later on in the boot process as\nnormal.\n\nSigned-off-by: Paul Mundt \u003clethal@linux-sh.org\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0dc952dc3e6d96d554a19fa7bee3f3b1d55e3cff",
      "tree": "1dcf2ff93784716143b7acef816888a71a9504c2",
      "parents": [
        "1f2b69f9bdce8461341e5fb864568a2ee90079c8"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Mon Mar 05 00:30:33 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Mon Mar 05 07:57:51 2007 -0800"
      },
      "message": "[PATCH] Page migration: Fix vma flag checking\n\nCurrently we do not check for vma flags if sys_move_pages is called to move\nindividual pages.  If sys_migrate_pages is called to move pages then we\ncheck for vm_flags that indicate a non migratable vma but that still\nincludes VM_LOCKED and we can migrate mlocked pages.\n\nExtract the vma_migratable check from mm/mempolicy.c, fix it and put it\ninto migrate.h so that is can be used from both locations.\n\nProblem was spotted by Lee Schermerhorn\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "8af5e2eb3cc4450ffba9496c875beac41bf4f4f8",
      "tree": "d15c56e8add1b53d5ed4d5dcb5d1f016d8f2b1c9",
      "parents": [
        "b446b60e4eb5e5457120c4728ada871b1209c1d0"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Feb 20 13:57:49 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Feb 20 17:10:13 2007 -0800"
      },
      "message": "[PATCH] fix mempolicy\u0027s check on a system with memory-less-node\n\nbind_zonelist() can create zero-length zonelist if there is a\nmemory-less-node.  This patch checks the length of zonelist.  If length is\n0, returns -EINVAL.\n\ntested on ia64/NUMA with memory-less-node.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Andi Kleen \u003cak@suse.de\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "6267276f3fdda9ad0d5ca451bdcbdf42b802d64b",
      "tree": "fcc92bd645401a2c0f9e6a35b8e215868969db4d",
      "parents": [
        "65e458d43dff872ee560e721fb0fdb367bb5adb0"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Sat Feb 10 01:43:07 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Sun Feb 11 10:51:18 2007 -0800"
      },
      "message": "[PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone\n\nThis patchset follows up on the earlier work in Andrew\u0027s tree to reduce the\nnumber of zones.  The patches allow to go to a minimum of 2 zones.  This one\nallows also to make ZONE_DMA optional and therefore the number of zones can be\nreduced to one.\n\nZONE_DMA is usually used for ISA DMA devices.  There are a number of reasons\nwhy we would not want to have ZONE_DMA\n\n1. Some arches do not need ZONE_DMA at all.\n\n2. With the advent of IOMMUs DMA zones are no longer needed.\n   The necessity of DMA zones may drastically be reduced\n   in the future. This patchset allows a compilation of\n   a kernel without that overhead.\n\n3. Devices that require ISA DMA get rare these days. All\n   my systems do not have any need for ISA DMA.\n\n4. The presence of an additional zone unecessarily complicates\n   VM operations because it must be scanned and balancing\n   logic must operate on its.\n\n5. With only ZONE_NORMAL one can reach the situation where\n   we have only one zone. This will allow the unrolling of many\n   loops in the VM and allows the optimization of varous\n   code paths in the VM.\n\n6. Having only a single zone in a NUMA system results in a\n   1-1 correspondence between nodes and zones. Various additional\n   optimizations to critical VM paths become possible.\n\nMany systems today can operate just fine with a single zone.  If you look at\nwhat is in ZONE_DMA then one usually sees that nothing uses it.  The DMA slabs\nare empty (Some arches use ZONE_DMA instead of ZONE_NORMAL, then ZONE_NORMAL\nwill be empty instead).\n\nOn all of my systems (i386, x86_64, ia64) ZONE_DMA is completely empty.  Why\nconstantly look at an empty zone in /proc/zoneinfo and empty slab in\n/proc/slabinfo?  Non i386 also frequently have no need for ZONE_DMA and zones\nstay empty.\n\nThe patchset was tested on i386 (UP / SMP), x86_64 (UP, NUMA) and ia64 (NUMA).\n\nThe RFC posted earlier (see\nhttp://marc.theaimsgroup.com/?l\u003dlinux-kernel\u0026m\u003d115231723513008\u0026w\u003d2) had lots\nof #ifdefs in them.  An effort has been made to minize the number of #ifdefs\nand make this as compact as possible.  The job was made much easier by the\nongoing efforts of others to extract common arch specific functionality.\n\nI have been running this for awhile now on my desktop and finally Linux is\nusing all my available RAM instead of leaving the 16MB in ZONE_DMA untouched:\n\nchristoph@pentium940:~$ cat /proc/zoneinfo\nNode 0, zone   Normal\n  pages free     4435\n        min      1448\n        low      1810\n        high     2172\n        active   241786\n        inactive 210170\n        scanned  0 (a: 0 i: 0)\n        spanned  524224\n        present  524224\n    nr_anon_pages 61680\n    nr_mapped    14271\n    nr_file_pages 390264\n    nr_slab_reclaimable 27564\n    nr_slab_unreclaimable 1793\n    nr_page_table_pages 449\n    nr_dirty     39\n    nr_writeback 0\n    nr_unstable  0\n    nr_bounce    0\n    cpu: 0 pcp: 0\n              count: 156\n              high:  186\n              batch: 31\n    cpu: 0 pcp: 1\n              count: 9\n              high:  62\n              batch: 15\n  vm stats threshold: 20\n    cpu: 1 pcp: 0\n              count: 177\n              high:  186\n              batch: 31\n    cpu: 1 pcp: 1\n              count: 12\n              high:  62\n              batch: 15\n  vm stats threshold: 20\n  all_unreclaimable: 0\n  prev_priority:     12\n  temp_priority:     12\n  start_pfn:         0\n\nThis patch:\n\nIn two places in the VM we use ZONE_DMA to refer to the first zone.  If\nZONE_DMA is optional then other zones may be first.  So simply replace\nZONE_DMA with zone 0.\n\nThis also fixes ZONETABLE_PGSHIFT.  If we have only a single zone then\nZONES_PGSHIFT may become 0 because there is no need anymore to encode the zone\nnumber related to a pgdat.  However, we still need a zonetable to index all\nthe zones for each node if this is a NUMA system.  Therefore define\nZONETABLE_SHIFT unconditionally as the offset of the ZONE field in page flags.\n\n[apw@shadowen.org: fix mismerge]\nAcked-by: Christoph Hellwig \u003chch@infradead.org\u003e\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: \"Luck, Tony\" \u003ctony.luck@intel.com\u003e\nCc: Kyle McMartin \u003ckyle@mcmartin.ca\u003e\nCc: Matthew Wilcox \u003cwilly@debian.org\u003e\nCc: James Bottomley \u003cJames.Bottomley@steeleye.com\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nSigned-off-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "30150f8d7b76f25b1127a5079528b7a17307f995",
      "tree": "3ea69d5dbd7b131fc32bbf416ba9970079bd25c6",
      "parents": [
        "79603a35009ff39562cd5634fa1cf513eb080f27"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Jan 22 20:40:45 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Jan 23 07:52:06 2007 -0800"
      },
      "message": "[PATCH] mbind: restrict nodes to the currently allowed cpuset\n\nCurrently one can specify an arbitrary node mask to mbind that includes\nnodes not allowed.  If that is done with an interleave policy then we will\ngo around all the nodes.  Those outside of the currently allowed cpuset\nwill be redirected to the border nodes.  Interleave will then create\nimbalances at the borders of the cpuset.\n\nThis patch restricts the nodes to the currently allowed cpuset.\n\nThe RFC for this patch was discussed at\nhttp://marc.theaimsgroup.com/?t\u003d116793842100004\u0026r\u003d1\u0026w\u003d2\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e9536ae7205d255bc94616b72910fc6e16c861fe",
      "tree": "4c91e2fbe096958ab8f7476306c4a61c30cbe502",
      "parents": [
        "1b04fe9a8ef10774174897b15d753b9de85fe9e9"
      ],
      "author": {
        "name": "Josef Sipek",
        "email": "jsipek@fsl.cs.sunysb.edu",
        "time": "Fri Dec 08 02:37:21 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Fri Dec 08 08:28:47 2006 -0800"
      },
      "message": "[PATCH] struct path: convert mm\n\nSigned-off-by: Josef Sipek \u003cjsipek@fsl.cs.sunysb.edu\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "15ad7cdcfd76450d4beebc789ec646664238184d",
      "tree": "279d05a76ae0906c23ee2de8c5684d95d9886ad3",
      "parents": [
        "4a08a9f68168e547c2baf100020e9b96cae5fbd1"
      ],
      "author": {
        "name": "Helge Deller",
        "email": "deller@gmx.de",
        "time": "Wed Dec 06 20:40:36 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Dec 07 08:39:46 2006 -0800"
      },
      "message": "[PATCH] struct seq_operations and struct file_operations constification\n\n - move some file_operations structs into the .rodata section\n\n - move static strings from policy_types[] array into the .rodata section\n\n - fix generic seq_operations usages, so that those structs may be defined\n   as \"const\" as well\n\n[akpm@osdl.org: couple of fixes]\nSigned-off-by: Helge Deller \u003cdeller@gmx.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "e94b1766097d53e6f3ccfb36c8baa562ffeda3fc",
      "tree": "93fa0a8ab84976d4e89c50768ca8b8878d642a0d",
      "parents": [
        "54e6ecb23951b195d02433a741c7f7cb0b796c78"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Wed Dec 06 20:33:17 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Dec 07 08:39:24 2006 -0800"
      },
      "message": "[PATCH] slab: remove SLAB_KERNEL\n\nSLAB_KERNEL is an alias of GFP_KERNEL.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "25ba77c141dbcd2602dd0171824d0d72aa023a01",
      "tree": "153eb9bc567f63d739dcaf8a3caf11c8f48b8379",
      "parents": [
        "bc4ba393c007248f76c05945abb7b7b892cdd1cc"
      ],
      "author": {
        "name": "Andy Whitcroft",
        "email": "apw@shadowen.org",
        "time": "Wed Dec 06 20:33:03 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Dec 07 08:39:23 2006 -0800"
      },
      "message": "[PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int\n\nNUMA node ids are passed as either int or unsigned int almost exclusivly\npage_to_nid and zone_to_nid both return unsigned long.  This is a throw\nback to when page_to_nid was a #define and was thus exposing the real type\nof the page flags field.\n\nIn addition to fixing up the definitions of page_to_nid and zone_to_nid I\naudited the users of these functions identifying the following incorrect\nuses:\n\n1) mm/page_alloc.c show_node() -- printk dumping the node id,\n2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison\n   against numa_node_id() which returns an int from cpu_to_node(), and\n3) mm/mpolicy.c check_pte_range -- used as an index in node_isset which\n   uses bit_set which in generic code takes an int.\n\nSigned-off-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nCc: \"Luck, Tony\" \u003ctony.luck@intel.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "9276b1bc96a132f4068fdee00983c532f43d3a26",
      "tree": "04d64444cf6558632cfc7514b5437578b5e616af",
      "parents": [
        "89689ae7f95995723fbcd5c116c47933a3bb8b13"
      ],
      "author": {
        "name": "Paul Jackson",
        "email": "pj@sgi.com",
        "time": "Wed Dec 06 20:31:48 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Dec 07 08:39:20 2006 -0800"
      },
      "message": "[PATCH] memory page_alloc zonelist caching speedup\n\nOptimize the critical zonelist scanning for free pages in the kernel memory\nallocator by caching the zones that were found to be full recently, and\nskipping them.\n\nRemembers the zones in a zonelist that were short of free memory in the\nlast second.  And it stashes a zone-to-node table in the zonelist struct,\nto optimize that conversion (minimize its cache footprint.)\n\nRecent changes:\n\n    This differs in a significant way from a similar patch that I\n    posted a week ago.  Now, instead of having a nodemask_t of\n    recently full nodes, I have a bitmask of recently full zones.\n    This solves a problem that last weeks patch had, which on\n    systems with multiple zones per node (such as DMA zone) would\n    take seeing any of these zones full as meaning that all zones\n    on that node were full.\n\n    Also I changed names - from \"zonelist faster\" to \"zonelist cache\",\n    as that seemed to better convey what we\u0027re doing here - caching\n    some of the key zonelist state (for faster access.)\n\n    See below for some performance benchmark results.  After all that\n    discussion with David on why I didn\u0027t need them, I went and got\n    some ;).  I wanted to verify that I had not hurt the normal case\n    of memory allocation noticeably.  At least for my one little\n    microbenchmark, I found (1) the normal case wasn\u0027t affected, and\n    (2) workloads that forced scanning across multiple nodes for\n    memory improved up to 10% fewer System CPU cycles and lower\n    elapsed clock time (\u0027sys\u0027 and \u0027real\u0027).  Good.  See details, below.\n\n    I didn\u0027t have the logic in get_page_from_freelist() for various\n    full nodes and zone reclaim failures correct.  That should be\n    fixed up now - notice the new goto labels zonelist_scan,\n    this_zone_full, and try_next_zone, in get_page_from_freelist().\n\nThere are two reasons I persued this alternative, over some earlier\nproposals that would have focused on optimizing the fake numa\nemulation case by caching the last useful zone:\n\n 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)\n    have seen real customer loads where the cost to scan the zonelist\n    was a problem, due to many nodes being full of memory before\n    we got to a node we could use.  Or at least, I think we have.\n    This was related to me by another engineer, based on experiences\n    from some time past.  So this is not guaranteed.  Most likely, though.\n\n    The following approach should help such real numa systems just as\n    much as it helps fake numa systems, or any combination thereof.\n\n 2) The effort to distinguish fake from real numa, using node_distance,\n    so that we could cache a fake numa node and optimize choosing\n    it over equivalent distance fake nodes, while continuing to\n    properly scan all real nodes in distance order, was going to\n    require a nasty blob of zonelist and node distance munging.\n\n    The following approach has no new dependency on node distances or\n    zone sorting.\n\nSee comment in the patch below for a description of what it actually does.\n\nTechnical details of note (or controversy):\n\n - See the use of \"zlc_active\" and \"did_zlc_setup\" below, to delay\n   adding any work for this new mechanism until we\u0027ve looked at the\n   first zone in zonelist.  I figured the odds of the first zone\n   having the memory we needed were high enough that we should just\n   look there, first, then get fancy only if we need to keep looking.\n\n - Some odd hackery was needed to add items to struct zonelist, while\n   not tripping up the custom zonelists built by the mm/mempolicy.c\n   code for MPOL_BIND.  My usual wordy comments below explain this.\n   Search for \"MPOL_BIND\".\n\n - Some per-node data in the struct zonelist is now modified frequently,\n   with no locking.  Multiple CPU cores on a node could hit and mangle\n   this data.  The theory is that this is just performance hint data,\n   and the memory allocator will work just fine despite any such mangling.\n   The fields at risk are the struct \u0027zonelist_cache\u0027 fields \u0027fullzones\u0027\n   (a bitmask) and \u0027last_full_zap\u0027 (unsigned long jiffies).  It should\n   all be self correcting after at most a one second delay.\n\n - This still does a linear scan of the same lengths as before.  All\n   I\u0027ve optimized is making the scan faster, not algorithmically\n   shorter.  It is now able to scan a compact array of \u0027unsigned\n   short\u0027 in the case of many full nodes, so one cache line should\n   cover quite a few nodes, rather than each node hitting another\n   one or two new and distinct cache lines.\n\n - If both Andi and Nick don\u0027t find this too complicated, I will be\n   (pleasantly) flabbergasted.\n\n - I removed the comment claiming we only use one cachline\u0027s worth of\n   zonelist.  We seem, at least in the fake numa case, to have put the\n   lie to that claim.\n\n - I pay no attention to the various watermarks and such in this performance\n   hint.  A node could be marked full for one watermark, and then skipped\n   over when searching for a page using a different watermark.  I think\n   that\u0027s actually quite ok, as it will tend to slightly increase the\n   spreading of memory over other nodes, away from a memory stressed node.\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nPerformance - some benchmark results and analysis:\n\nThis benchmark runs a memory hog program that uses multiple\nthreads to touch alot of memory as quickly as it can.\n\nMultiple runs were made, touching 12, 38, 64 or 90 GBytes out of\nthe total 96 GBytes on the system, and using 1, 19, 37, or 55\nthreads (on a 56 CPU system.)  System, user and real (elapsed)\ntimings were recorded for each run, shown in units of seconds,\nin the table below.\n\nTwo kernels were tested - 2.6.18-mm3 and the same kernel with\nthis zonelist caching patch added.  The table also shows the\npercentage improvement the zonelist caching sys time is over\n(lower than) the stock *-mm kernel.\n\n      number     2.6.18-mm3\t   zonelist-cache    delta (\u003c 0 good)\tpercent\n GBs    N  \t------------\t   --------------    ----------------\tsystime\n mem threads   sys user  real\t  sys  user  real     sys  user  real\t better\n  12\t 1     153   24   177\t  151\t 24   176      -2     0    -1\t   1%\n  12\t19\t99   22     8\t   99\t 22\t8\t0     0     0\t   0%\n  12\t37     111   25     6\t  112\t 25\t6\t1     0     0\t  -0%\n  12\t55     115   25     5\t  110\t 23\t5      -5    -2     0\t   4%\n  38\t 1     502   74   576\t  497\t 73   570      -5    -1    -6\t   0%\n  38\t19     426   78    48\t  373\t 76    39     -53    -2    -9\t  12%\n  38\t37     544   83    36\t  547\t 82    36\t3    -1     0\t  -0%\n  38\t55     501   77    23\t  511\t 80    24      10     3     1\t  -1%\n  64\t 1     917  125  1042\t  890\t124  1014     -27    -1   -28\t   2%\n  64\t19    1118  138   119\t  965\t141   103    -153     3   -16\t  13%\n  64\t37    1202  151    94\t 1136\t150    81     -66    -1   -13\t   5%\n  64\t55    1118  141    61\t 1072\t140    58     -46    -1    -3\t   4%\n  90\t 1    1342  177  1519\t 1275\t174  1450     -67    -3   -69\t   4%\n  90\t19    2392  199   192\t 2116\t189   176    -276   -10   -16\t  11%\n  90\t37    3313  238   175\t 2972\t225   145    -341   -13   -30\t  10%\n  90\t55    1948  210   104\t 1843\t213   100    -105     3    -4\t   5%\n\nNotes:\n 1) This test ran a memory hog program that started a specified number N of\n    threads, and had each thread allocate and touch 1/N\u0027th of\n    the total memory to be used in the test run in a single loop,\n    writing a constant word to memory, one store every 4096 bytes.\n    Watching this test during some earlier trial runs, I would see\n    each of these threads sit down on one CPU and stay there, for\n    the remainder of the pass, a different CPU for each thread.\n\n 2) The \u0027real\u0027 column is not comparable to the \u0027sys\u0027 or \u0027user\u0027 columns.\n    The \u0027real\u0027 column is seconds wall clock time elapsed, from beginning\n    to end of that test pass.  The \u0027sys\u0027 and \u0027user\u0027 columns are total\n    CPU seconds spent on that test pass.  For a 19 thread test run,\n    for example, the sum of \u0027sys\u0027 and \u0027user\u0027 could be up to 19 times the\n    number of \u0027real\u0027 elapsed wall clock seconds.\n\n 3) Tests were run on a fresh, single-user boot, to minimize the amount\n    of memory already in use at the start of the test, and to minimize\n    the amount of background activity that might interfere.\n\n 4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM.\n\n 5) Notice that the \u0027real\u0027 time gets large for the single thread runs, even\n    though the measured \u0027sys\u0027 and \u0027user\u0027 times are modest.  I\u0027m not sure what\n    that means - probably something to do with it being slow for one thread to\n    be accessing memory along ways away.  Perhaps the fake numa system, running\n    ostensibly the same workload, would not show this substantial degradation\n    of \u0027real\u0027 time for one thread on many nodes -- lets hope not.\n\n 6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs)\n    ran quite efficiently, as one might expect.  Each pair of threads needed\n    to allocate and touch the memory on the node the two threads shared, a\n    pleasantly parallizable workload.\n\n 7) The intermediate thread count passes, when asking for alot of memory forcing\n    them to go to a few neighboring nodes, improved the most with this zonelist\n    caching patch.\n\nConclusions:\n * This zonelist cache patch probably makes little difference one way or the\n   other for most workloads on real numa hardware, if those workloads avoid\n   heavy off node allocations.\n * For memory intensive workloads requiring substantial off-node allocations\n   on real numa hardware, this patch improves both kernel and elapsed timings\n   up to ten per-cent.\n * For fake numa systems, I\u0027m optimistic, but will have to leave that up to\n   Rohit Seth to actually test (once I get him a 2.6.18 backport.)\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Rohit Seth \u003crohitseth@google.com\u003e\nCc: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nCc: David Rientjes \u003crientjes@cs.washington.edu\u003e\nCc: Paul Menage \u003cmenage@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "699397499742d1245ea5d677a08fa265df666d2d",
      "tree": "81e9971ff13fcadb6ea8ab12f55278c30bdff1a2",
      "parents": [
        "b16bc64d1aed40fb9cff9187061005b2a89b5d5d"
      ],
      "author": {
        "name": "Keith Owens",
        "email": "kaos@ocs.com.au",
        "time": "Wed Oct 11 01:21:28 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Oct 11 11:14:19 2006 -0700"
      },
      "message": "[PATCH] Fix do_mbind warning with CONFIG_MIGRATION\u003dn\n\nWith CONFIG_MIGRATION\u003dn\n\nmm/mempolicy.c: In function \u0027do_mbind\u0027:\nmm/mempolicy.c:796: warning: passing argument 2 of \u0027migrate_pages\u0027 from incompatible pointer type\n\nSigned-off-by: Keith Owens \u003ckaos@ocs.com.au\u003e\nCc: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "52978be636374c4bfb61220b37fa12f55a071c46",
      "tree": "36444be7bdbc0cdd99d903c0ad87316c93427517",
      "parents": [
        "1a2f67b459bb7846d4a15924face63eb2683acc2"
      ],
      "author": {
        "name": "Alexey Dobriyan",
        "email": "adobriyan@gmail.com",
        "time": "Sat Sep 30 23:27:21 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sun Oct 01 00:39:19 2006 -0700"
      },
      "message": "[PATCH] kmemdup: some users\n\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "765c4507af71c39aba21006bbd3ec809fe9714ff",
      "tree": "8bf1f5f940af830e18321b4e8ceac55457e5b981",
      "parents": [
        "77f700dab4c05f8ee17584ec869672796d7bcb87"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Wed Sep 27 01:50:08 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Sep 27 08:26:12 2006 -0700"
      },
      "message": "[PATCH] GFP_THISNODE for the slab allocator\n\nThis patch insures that the slab node lists in the NUMA case only contain\nslabs that belong to that specific node.  All slab allocations use\nGFP_THISNODE when calling into the page allocator.  If an allocation fails\nthen we fall back in the slab allocator according to the zonelists appropriate\nfor a certain context.\n\nThis allows a replication of the behavior of alloc_pages and alloc_pages node\nin the slab layer.\n\nCurrently allocations requested from the page allocator may be redirected via\ncpusets to other nodes.  This results in remote pages on nodelists and that in\nturn results in interrupt latency issues during cache draining.  Plus the slab\nis handing out memory as local when it is really remote.\n\nFallback for slab memory allocations will occur within the slab allocator and\nnot in the page allocator.  This is necessary in order to be able to use the\nexisting pools of objects on the nodes that we fall back to before adding more\npages to a slab.\n\nThe fallback function insures that the nodes we fall back to obey cpuset\nrestrictions of the current context.  We do not allocate objects from outside\nof the current cpuset context like before.\n\nNote that the implementation of locality constraints within the slab allocator\nrequires importing logic from the page allocator.  This is a mischmash that is\nnot that great.  Other allocators (uncached allocator, vmalloc, huge pages)\nface similar problems and have similar minimal reimplementations of the basic\nfallback logic of the page allocator.  There is another way of implementing a\nslab by avoiding per node lists (see modular slab) but this wont work within\nthe existing slab.\n\nV1-\u003eV2:\n- Use NUMA_BUILD to avoid #ifdef CONFIG_NUMA\n- Exploit GFP_THISNODE being 0 in the NON_NUMA case to avoid another\n  #ifdef\n\n[akpm@osdl.org: build fix]\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "89fa30242facca249aead2aac03c4c69764f911c",
      "tree": "1ac46b4777b819f2a4793d8e37330576ae5089ec",
      "parents": [
        "4415cc8df630b05d3a54267d5f3e5c0b63a4ec05"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Sep 25 23:31:55 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Sep 26 08:48:52 2006 -0700"
      },
      "message": "[PATCH] NUMA: Add zone_to_nid function\n\nThere are many places where we need to determine the node of a zone.\nCurrently we use a difficult to read sequence of pointer dereferencing.\nPut that into an inline function and use throughout VM.  Maybe we can find\na way to optimize the lookup in the future.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "9b819d204cf602eab1a53a9ec4b8d2ca51e02a1d",
      "tree": "9442bf01a00a93a8ae54462fb4878588e1b2a6bf",
      "parents": [
        "056c62418cc639bf2fe962c6a6ee56054b838bc7"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Sep 25 23:31:40 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Sep 26 08:48:50 2006 -0700"
      },
      "message": "[PATCH] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions\n\nAdd a new gfp flag __GFP_THISNODE to avoid fallback to other nodes.  This\nflag is essential if a kernel component requires memory to be located on a\ncertain node.  It will be needed for alloc_pages_node() to force allocation\non the indicated node and for alloc_pages() to force allocation on the\ncurrent node.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "19655d3487001d7df0e10e9cbfc27c758b77c2b5",
      "tree": "8d0aaa216bd32bd64e3a9652fd34d40bdb9d1075",
      "parents": [
        "2f6726e54a9410e2e4cee864947c05e954051916"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Sep 25 23:31:19 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Sep 26 08:48:47 2006 -0700"
      },
      "message": "[PATCH] linearly index zone-\u003enode_zonelists[]\n\nI wonder why we need this bitmask indexing into zone-\u003enode_zonelists[]?\n\nWe always start with the highest zone and then include all lower zones\nif we build zonelists.\n\nAre there really cases where we need allocation from ZONE_DMA or\nZONE_HIGHMEM but not ZONE_NORMAL? It seems that the current implementation\nof highest_zone() makes that already impossible.\n\nIf we go linear on the index then gfp_zone() \u003d\u003d highest_zone() and a lot\nof definitions fall by the wayside.\n\nWe can now revert back to the use of gfp_zone() in mempolicy.c ;-)\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "2f6726e54a9410e2e4cee864947c05e954051916",
      "tree": "91b1173dead0cfc4a25caacb34b6c80f526bbc59",
      "parents": [
        "4e4785bcf0c8503224fa6c17d8e0228de781bff6"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Sep 25 23:31:18 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Sep 26 08:48:47 2006 -0700"
      },
      "message": "[PATCH] Apply type enum zone_type\n\nAfter we have done this we can now do some typing cleanup.\n\nThe memory policy layer keeps a policy_zone that specifies\nthe zone that gets memory policies applied. This variable\ncan now be of type enum zone_type.\n\nThe check_highest_zone function and the build_zonelists funnctionm must\nthen also take a enum zone_type parameter.\n\nPlus there are a number of loops over zones that also should use\nzone_type.\n\nWe run into some troubles at some points with functions that need a\nzone_type variable to become -1. Fix that up.\n\n[pj@sgi.com: fix set_mempolicy() crash]\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "4e4785bcf0c8503224fa6c17d8e0228de781bff6",
      "tree": "002c0a051f7f4de4548ca0a8394b664f64c63627",
      "parents": [
        "b9b15780f808efa2c897f337644ba7a2bec03ecc"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Sep 25 23:31:17 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Sep 26 08:48:47 2006 -0700"
      },
      "message": "[PATCH] mempolicies: fix policy_zone check\n\nThere is a check in zonelist_policy that compares pieces of the bitmap\nobtained from a gfp mask via GFP_ZONETYPES with a zone number in function\nzonelist_policy().\n\nThe bitmap is an ORed mask of __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM.\nThe policy_zone is a zone number with the possible values of ZONE_DMA,\nZONE_DMA32, ZONE_HIGHMEM and ZONE_NORMAL. These are two different domains\nof values.\n\nFor some reason seemed to work before the zone reduction patchset (It\ndefinitely works on SGI boxes since we just have one zone and the check\ncannot fail).\n\nWith the zone reduction patchset this check definitely fails on systems\nwith two zones if the system actually has memory in both zones.\n\nThis is because ZONE_NORMAL is selected using no __GFP flag at\nall and thus gfp_zone(gfpmask) \u003d\u003d 0. ZONE_DMA is selected when __GFP_DMA\nis set. __GFP_DMA is 0x01.  So gfp_zone(gfpmask) \u003d\u003d 1.\n\npolicy_zone is set to ZONE_NORMAL (\u003d\u003d1) if ZONE_NORMAL and ZONE_DMA are\npopulated.\n\nFor ZONE_NORMAL gfp_zone(\u003cno _GFP_DMA\u003e) yields 0 which is \u003c\npolicy_zone(ZONE_NORMAL) and so policy is not applied to regular memory\nallocations!\n\nInstead gfp_zone(__GFP_DMA) \u003d\u003d 1 which results in policy being applied\nto DMA allocations!\n\nWhat we realy want in that place is to establish the highest allowable\nzone for a given gfp_mask. If the highest zone is higher or equal to the\npolicy_zone then memory policies need to be applied. We have such\na highest_zone() function in page_alloc.c.\n\nSo move the highest_zone() function from mm/page_alloc.c into\ninclude/linux/gfp.h.  On the way we simplify the function and use the new\nzone_type that was also introduced with the zone reduction patchset plus we\nalso specify the right type for the gfp flags parameter.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "3b98b087fc2daab67518d2baa8aef19a6ad82723",
      "tree": "a7defc8fa53b2023affc072cd20c4f4734e4395d",
      "parents": [
        "1678df37be8abbb381becdc40242ed915e775550"
      ],
      "author": {
        "name": "Nishanth Aravamudan",
        "email": "nacc@us.ibm.com",
        "time": "Thu Aug 31 21:27:53 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Sep 01 11:39:10 2006 -0700"
      },
      "message": "[PATCH] fix NUMA interleaving for huge pages\n\nSince vma-\u003evm_pgoff is in units of smallpages, VMAs for huge pages have the\nlower HPAGE_SHIFT - PAGE_SHIFT bits always cleared, which results in badd\noffsets to the interleave functions.  Take this difference from small pages\ninto account when calculating the offset.  This does add a 0-bit shift into\nthe small-page path (via alloc_page_vma()), but I think that is negligible.\n Also add a BUG_ON to prevent the offset from growing due to a negative\nright-shift, which probably shouldn\u0027t be allowed anyways.\n\nTested on an 8-memory node ppc64 NUMA box and got the interleaving I\nexpected.\n\nSigned-off-by: Nishanth Aravamudan \u003cnacc@us.ibm.com\u003e\nSigned-off-by: Adam Litke \u003cagl@us.ibm.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nAcked-by: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "ca889e6c45e0b112cb2ca9d35afc66297519b5d5",
      "tree": "0a5efdec2a61540204d34bcbf56dc691d8f9c391",
      "parents": [
        "bab1846a0582f627f5ec22aa2dc5f4f3e82e8176"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Fri Jun 30 01:55:44 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Jun 30 11:25:36 2006 -0700"
      },
      "message": "[PATCH] Use Zoned VM Counters for NUMA statistics\n\nThe numa statistics are really event counters.  But they are per node and\nso we have had special treatment for these counters through additional\nfields on the pcp structure.  We can now use the per zone nature of the\nzoned VM counters to realize these.\n\nThis will shrink the size of the pcp structure on NUMA systems.  We will\nhave some room to add additional per zone counters that will all still fit\nin the same cacheline.\n\n Bits\tPrior pcp size\t  \tSize after patch\tWe can add\n ------------------------------------------------------------------\n 64\t128 bytes (16 words)\t80 bytes (10 words)\t48\n 32\t 76 bytes (19 words)\t56 bytes (14 words)\t8 (64 byte cacheline)\n\t\t\t\t\t\t\t72 (128 byte)\n\nRemove the special statistics for numa and replace them with zoned vm\ncounters.  This has the side effect that global sums of these events now\nshow up in /proc/vmstat.\n\nAlso take the opportunity to move the zone_statistics() function from\npage_alloc.c into vmstat.c.\n\nDiscussions:\nV2 http://marc.theaimsgroup.com/?t\u003d115048227000002\u0026r\u003d1\u0026w\u003d2\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "99f895518368252ba862cc15ce4eb98ebbe1bec6",
      "tree": "a9dcc01963221d1fd6a7e357b95d361ebfe91c6d",
      "parents": [
        "8578cea7509cbdec25b31d08b48a92fcc3b1a9e3"
      ],
      "author": {
        "name": "Eric W. Biederman",
        "email": "ebiederm@xmission.com",
        "time": "Mon Jun 26 00:25:55 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Mon Jun 26 09:58:25 2006 -0700"
      },
      "message": "[PATCH] proc: don\u0027t lock task_structs indefinitely\n\nEvery inode in /proc holds a reference to a struct task_struct.  If a\ndirectory or file is opened and remains open after the the task exits this\npinning continues.  With 8K stacks on a 32bit machine the amount pinned per\nfile descriptor is about 10K.\n\nNormally I would figure a reasonable per user process limit is about 100\nprocesses.  With 80 processes, with a 1000 file descriptors each I can trigger\nthe 00M killer on a 32bit kernel, because I have pinned about 800MB of useless\ndata.\n\nThis patch replaces the struct task_struct pointer with a pointer to a struct\ntask_ref which has a struct task_struct pointer.  The so the pinning of dead\ntasks does not happen.\n\nThe code now has to contend with the fact that the task may now exit at any\ntime.  Which is a little but not muh more complicated.\n\nWith this change it takes about 1000 processes each opening up 1000 file\ndescriptors before I can trigger the OOM killer.  Much better.\n\n[mlp@google.com: task_mmu small fixes]\nSigned-off-by: Eric W. Biederman \u003cebiederm@xmission.com\u003e\nCc: Trond Myklebust \u003ctrond.myklebust@fys.uio.no\u003e\nCc: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Oleg Nesterov \u003coleg@tv-sign.ru\u003e\nCc: Albert Cahalan \u003cacahalan@gmail.com\u003e\nSigned-off-by: Prasanna Meda \u003cmlp@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "7b2259b3e53f128c10a9fded0965e69d4a949847",
      "tree": "c1827144c22dd49775190e05de791531e9fd21fd",
      "parents": [
        "68402ddc677005ed1b1359bbc1f279548cfc0928"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Sun Jun 25 05:46:48 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sun Jun 25 10:00:55 2006 -0700"
      },
      "message": "[PATCH] page migration: Support a vma migration function\n\nHooks for calling vma specific migration functions\n\nWith this patch a vma may define a vma-\u003evm_ops-\u003emigrate function.  That\nfunction may perform page migration on its own (some vmas may not contain page\nstructs and therefore cannot be handled by regular page migration.  Pages in a\nvma may require special preparatory treatment before migration is possible\netc) .  Only mmap_sem is held when the migration function is called.  The\nmigrate() function gets passed two sets of nodemasks describing the source and\nthe target of the migration.  The flags parameter either contains\n\nMPOL_MF_MOVE\twhich means that only pages used exclusively by\n\t\tthe specified mm should be moved\n\nor\n\nMPOL_MF_MOVE_ALL which means that pages shared with other processes\n\t\tshould also be moved.\n\nThe migration function returns 0 on success or an error condition.  An error\ncondition will prevent regular page migration from occurring.\n\nOn its own this patch cannot be included since there are no users for this\nfunctionality.  But it seems that the uncached allocator will need this\nfunctionality at some point.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "86c3a7645c05a7d06b72653aa4b2bea4e7229d1b",
      "tree": "ac68280e9c44dbc6ca4c88dc1e9c72a8f56be95e",
      "parents": [
        "35601547baf92d984b6e59cf3583649da04baea5"
      ],
      "author": {
        "name": "David Quigley",
        "email": "dpquigl@tycho.nsa.gov",
        "time": "Fri Jun 23 02:04:02 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Jun 23 07:42:54 2006 -0700"
      },
      "message": "[PATCH] SELinux: add security_task_movememory calls to mm code\n\nThis patch inserts security_task_movememory hook calls into memory management\ncode to enable security modules to mediate this operation between tasks.\n\nSince the last posting, the hook has been renamed following feedback from\nChristoph Lameter.\n\nSigned-off-by: David Quigley \u003cdpquigl@tycho.nsa.gov\u003e\nAcked-by:  Stephen Smalley \u003csds@tycho.nsa.gov\u003e\nSigned-off-by: James Morris \u003cjmorris@namei.org\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: Chris Wright \u003cchrisw@sous-sol.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "742755a1d8ce2b548428f7aacf1758b4bba50080",
      "tree": "53426657e14dc19a694d418274c9a6f4dcb8a997",
      "parents": [
        "95a402c3847cc16f4ba03013cd01404fa0f14c2e"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Fri Jun 23 02:03:55 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Jun 23 07:42:53 2006 -0700"
      },
      "message": "[PATCH] page migration: sys_move_pages(): support moving of individual pages\n\nmove_pages() is used to move individual pages of a process. The function can\nbe used to determine the location of pages and to move them onto the desired\nnode. move_pages() returns status information for each page.\n\nlong move_pages(pid, number_of_pages_to_move,\n\t\taddresses_of_pages[],\n\t\tnodes[] or NULL,\n\t\tstatus[],\n\t\tflags);\n\nThe addresses of pages is an array of void * pointing to the\npages to be moved.\n\nThe nodes array contains the node numbers that the pages should be moved\nto. If a NULL is passed instead of an array then no pages are moved but\nthe status array is updated. The status request may be used to determine\nthe page state before issuing another move_pages() to move pages.\n\nThe status array will contain the state of all individual page migration\nattempts when the function terminates. The status array is only valid if\nmove_pages() completed successfullly.\n\nPossible page states in status[]:\n\n0..MAX_NUMNODES\tThe page is now on the indicated node.\n\n-ENOENT\t\tPage is not present\n\n-EACCES\t\tPage is mapped by multiple processes and can only\n\t\tbe moved if MPOL_MF_MOVE_ALL is specified.\n\n-EPERM\t\tThe page has been mlocked by a process/driver and\n\t\tcannot be moved.\n\n-EBUSY\t\tPage is busy and cannot be moved. Try again later.\n\n-EFAULT\t\tInvalid address (no VMA or zero page).\n\n-ENOMEM\t\tUnable to allocate memory on target node.\n\n-EIO\t\tUnable to write back page. The page must be written\n\t\tback in order to move it since the page is dirty and the\n\t\tfilesystem does not provide a migration function that\n\t\twould allow the moving of dirty pages.\n\n-EINVAL\t\tA dirty page cannot be moved. The filesystem does not provide\n\t\ta migration function and has no ability to write back pages.\n\nThe flags parameter indicates what types of pages to move:\n\nMPOL_MF_MOVE\tMove pages that are only mapped by the process.\n\nMPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.\n\t\tRequires sufficient capabilities.\n\nPossible return codes from move_pages()\n\n-ENOENT\t\tNo pages found that would require moving. All pages\n\t\tare either already on the target node, not present, had an\n\t\tinvalid address or could not be moved because they were\n\t\tmapped by multiple processes.\n\n-EINVAL\t\tFlags other than MPOL_MF_MOVE(_ALL) specified or an attempt\n\t\tto migrate pages in a kernel thread.\n\n-EPERM\t\tMPOL_MF_MOVE_ALL specified without sufficient priviledges.\n\t\tor an attempt to move a process belonging to another user.\n\n-EACCES\t\tOne of the target nodes is not allowed by the current cpuset.\n\n-ENODEV\t\tOne of the target nodes is not online.\n\n-ESRCH\t\tProcess does not exist.\n\n-E2BIG\t\tToo many pages to move.\n\n-ENOMEM\t\tNot enough memory to allocate control array.\n\n-EFAULT\t\tParameters could not be accessed.\n\nA test program for move_pages() may be found with the patches\non ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3\n\nFrom: Christoph Lameter \u003cclameter@sgi.com\u003e\n\n  Detailed results for sys_move_pages()\n\n  Pass a pointer to an integer to get_new_page() that may be used to\n  indicate where the completion status of a migration operation should be\n  placed.  This allows sys_move_pags() to report back exactly what happened to\n  each page.\n\n  Wish there would be a better way to do this. Looks a bit hacky.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Jes Sorensen \u003cjes@trained-monkey.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nCc: Michael Kerrisk \u003cmtk-manpages@gmx.net\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "95a402c3847cc16f4ba03013cd01404fa0f14c2e",
      "tree": "0fd9b3379f70cc99b2325bccaa150089abf6c8b3",
      "parents": [
        "aaa994b300a172afafab47938804836b923e5ef7"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Fri Jun 23 02:03:53 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Jun 23 07:42:53 2006 -0700"
      },
      "message": "[PATCH] page migration: use allocator function for migrate_pages()\n\nInstead of passing a list of new pages, pass a function to allocate a new\npage.  This allows the correct placement of MPOL_INTERLEAVE pages during page\nmigration.  It also further simplifies the callers of migrate pages.\nmigrate_pages() becomes similar to migrate_pages_to() so drop\nmigrate_pages_to().  The batching of new page allocations becomes unnecessary.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Jes Sorensen \u003cjes@trained-monkey.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "aaa994b300a172afafab47938804836b923e5ef7",
      "tree": "ccc1acf72e9d1dfbd25fa5f8e067a195f93b0319",
      "parents": [
        "e24f0b8f76cc3dd96f36f5b6a9f020f6c3fce198"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Fri Jun 23 02:03:52 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Jun 23 07:42:53 2006 -0700"
      },
      "message": "[PATCH] page migration: handle freeing of pages in migrate_pages()\n\nDo not leave pages on the lists passed to migrate_pages().  Seems that we will\nnot need any postprocessing of pages.  This will simplify the handling of\npages by the callers of migrate_pages().\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Jes Sorensen \u003cjes@trained-monkey.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "6d472be37896b1c41b50f3da124f8b7718ba7797",
      "tree": "c01adec9db3c85b48e337189b4fdc1f7e6f23733",
      "parents": [
        "4409ebe9afabe7db77eaaae9eb3eb05b8315ce4a"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Thu Apr 20 02:43:12 2006 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Thu Apr 20 07:54:03 2006 -0700"
      },
      "message": "[PATCH] Remove cond_resched in gather_stats()\n\ngather_stats() is called with a spinlock held from check_pte_range.  We\ncannot reschedule with a lock held.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "7f927fcc2fd1575d01efb4b76665975007945690",
      "tree": "fbb84689600ea512d7b52f9fc46db2d7d8d7c1fd",
      "parents": [
        "ded23ac62776b4360d88e9b0330792d2c57fdfdf"
      ],
      "author": {
        "name": "Alexey Dobriyan",
        "email": "adobriyan@gmail.com",
        "time": "Tue Mar 28 01:56:53 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Mar 28 09:16:08 2006 -0800"
      },
      "message": "[PATCH] Typo fixes\n\nFix a lot of typos.  Eyeballed by jmc@ in OpenBSD.\n\nSigned-off-by: Alexey Dobriyan \u003cadobriyan@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "c61afb181c649754ea221f104e268cbacfc993e3",
      "tree": "870917b3f9175cf1663a2620d989856913cfb5f8",
      "parents": [
        "101a50019ae5e370d73984ee05d56dd3b08f330a"
      ],
      "author": {
        "name": "Paul Jackson",
        "email": "pj@sgi.com",
        "time": "Fri Mar 24 03:16:08 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Mar 24 07:33:23 2006 -0800"
      },
      "message": "[PATCH] cpuset memory spread slab cache optimizations\n\nThe hooks in the slab cache allocator code path for support of NUMA\nmempolicies and cpuset memory spreading are in an important code path.  Many\nsystems will use neither feature.\n\nThis patch optimizes those hooks down to a single check of some bits in the\ncurrent tasks task_struct flags.  For non NUMA systems, this hook and related\ncode is already ifdef\u0027d out.\n\nThe optimization is done by using another task flag, set if the task is using\na non-default NUMA mempolicy.  Taking this flag bit along with the\nPF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this \u0027cpuset\nmemory spreading\u0027 patch set, one can check for the combination of any of these\nspecial case memory placement mechanisms with a single test of the current\ntasks task_struct flags.\n\nThis patch also tightens up the code, to save a few bytes of kernel text\nspace, and moves some of it out of line.  Due to the nested inlines called\nfrom multiple places, we were ending up with three copies of this code, which\nonce we get off the main code path (for local node allocation) seems a bit\nwasteful of instruction memory.\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "b20a35035f983f4ac7e29c4a68f30e43510007e0",
      "tree": "fdf090ddddbcc275349f62f71adc98649e2c683b",
      "parents": [
        "442295c94bf650221af3ef20fc68fa3e93876818"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Wed Mar 22 00:09:12 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Mar 22 07:54:06 2006 -0800"
      },
      "message": "[PATCH] page migration reorg\n\nCentralize the page migration functions in anticipation of additional\ntinkering.  Creates a new file mm/migrate.c\n\n1. Extract buffer_migrate_page() from fs/buffer.c\n\n2. Extract central migration code from vmscan.c\n\n3. Extract some components from mempolicy.c\n\n4. Export pageout() and remove_from_swap() from vmscan.c\n\n5. Make it possible to configure NUMA systems without page migration\n   and non-NUMA systems with page migration.\n\nI had to so some #ifdeffing in mempolicy.c that may need a cleanup.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "fcc234f888ba2365c44ba0507eb8a18eebf1f594",
      "tree": "afc0d6f5a6191a94d8285f0b21ecec5a9b911df9",
      "parents": [
        "b5d8ca7c50826c0b456b4a646875dc573adfde2b"
      ],
      "author": {
        "name": "Pekka Enberg",
        "email": "penberg@cs.helsinki.fi",
        "time": "Wed Mar 22 00:08:13 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Mar 22 07:53:58 2006 -0800"
      },
      "message": "[PATCH] mm: kill kmem_cache_t usage\n\nWe have struct kmem_cache now so use it instead of the old typedef.\n\nSigned-off-by: Pekka Enberg \u003cpenberg@cs.helsinki.fi\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "90036ee5938d89638e80f4d0d0700d0f2dbd4a6a",
      "tree": "de0f9275dcf8a051baf70df8574ce34b8b1f158d",
      "parents": [
        "e0e8eb54d8ae0c4cfd1d297f6351b08a7f635c5f"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Thu Mar 16 23:03:59 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Mar 17 07:51:25 2006 -0800"
      },
      "message": "[PATCH] page migration: Fail with error if swap not setup\n\nCurrently the migration of anonymous pages will silently fail if no swap is\nsetup.  This patch makes page migration functions check for available swap\nand fail with -ENODEV if no swap space is available.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "74c002410548c7cb1744b45d17a5fa21da515b63",
      "tree": "d83774ce92907ecaf450998ddaf865836fde00ea",
      "parents": [
        "b4fb376628e63bfc8071fc915b921da3db4a3385"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Tue Mar 14 19:50:21 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Mar 14 21:43:02 2006 -0800"
      },
      "message": "[PATCH] Consistent capabilites associated with MPOL_MOVE_ALL\n\nIt seems that setting scheduling policy and priorities is also the kind of\nthing that might be performed in apps that also use the NUMA API, so it\nwould seem consistent to use CAP_SYS_NICE for NUMA also.\n\nSo use CAP_SYS_NICE for controlling migration permissions.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Michael Kerrisk \u003cmtk-manpages@gmx.net\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "7f709ed0e3ccd3e88e0632b69f00174e83f8d98b",
      "tree": "2fc797f06b4ac9177878468a9e59992723ecda5c",
      "parents": [
        "d5f735e52fb41e032b0db08aa20c02dbb9cd0db3"
      ],
      "author": {
        "name": "Andrew Morton",
        "email": "akpm@osdl.org",
        "time": "Tue Mar 07 21:55:22 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Mar 08 14:14:00 2006 -0800"
      },
      "message": "[PATCH] numa_maps-update fix\n\nFix the mm/mempolicy.c build for !CONFIG_HUGETLB_PAGE.\n\nCc: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nCc: Martin Bligh \u003cmbligh@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "397874dfe9862b494e1fdcd2baef4ac432d224c8",
      "tree": "a6eff78eb0f3ba641e3c57f24fb8071cb295212c",
      "parents": [
        "2fbf182ed00a71c35e53329c2010df2baf8a89c6"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Mon Mar 06 15:42:53 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Mon Mar 06 18:40:45 2006 -0800"
      },
      "message": "[PATCH] numa_maps update\n\nChange the format of numa_maps to be more compact and contain additional\ninformation that is useful for managing and troubleshooting memory on a\nNUMA system.  Numa_maps can now also support huge pages.\n\nFixes:\n\n1. More compact format. Only display fields if they contain additional\n\tinformation.\n\n2. Always display information for all vmas. The old numa_maps did not display\n\tvma with no mapped entries. This was a bit confusing because page\n\tmigration removes ptes for file backed vmas. After page migration\n\ta part of the vmas vanished.\n\n3. Rename maxref to maxmap. This is the maximum mapcount of all the pages\n\tin a vma and may be used as an indicator as to how many processes\n\tmay be using a certain vma.\n\n4. Include the ability to scan over huge page vmas.\n\nNew items shown:\n\ndirty\n\tNumber of pages in a vma that have either the dirty bit set in the\n\tpage_struct or in the pte.\n\nfile\u003d\u003cfilename\u003e\n\tThe file backing the pages if any\n\nstack\n\tStack area\n\nheap\n\tHeap area\n\nhuge\n\tHuge page area. The number of pages shows is the number of huge\n\tpages not the regular sized pages.\n\nswapcache\n\tNumber of pages with swap references. Must be \u003e0 in order to\n\tbe shown.\n\nactive\n\tNumber of active pages. Only displayed if different from the number\n\tof pages mapped.\n\nwriteback\n\tNumber of pages under writeback. Only displayed if \u003e0.\n\nSample ouput of a process using huge pages:\n\n00000000 default\n2000000000000000 default file\u003d/lib/ld-2.3.90.so mapped\u003d13 mapmax\u003d30 N0\u003d13\n2000000000044000 default file\u003d/lib/ld-2.3.90.so anon\u003d2 dirty\u003d2 swapcache\u003d2 N2\u003d2\n2000000000064000 default file\u003d/lib/librt-2.3.90.so mapped\u003d2 active\u003d1 N1\u003d1 N3\u003d1\n2000000000074000 default file\u003d/lib/librt-2.3.90.so\n2000000000080000 default file\u003d/lib/librt-2.3.90.so anon\u003d1 swapcache\u003d1 N2\u003d1\n2000000000084000 default\n2000000000088000 default file\u003d/lib/libc-2.3.90.so mapped\u003d52 mapmax\u003d32 active\u003d48 N0\u003d52\n20000000002bc000 default file\u003d/lib/libc-2.3.90.so\n20000000002c8000 default file\u003d/lib/libc-2.3.90.so anon\u003d3 dirty\u003d2 swapcache\u003d3 active\u003d2 N1\u003d1 N2\u003d2\n20000000002d4000 default anon\u003d1 swapcache\u003d1 N1\u003d1\n20000000002d8000 default file\u003d/lib/libpthread-2.3.90.so mapped\u003d8 mapmax\u003d3 active\u003d7 N2\u003d2 N3\u003d6\n20000000002fc000 default file\u003d/lib/libpthread-2.3.90.so\n2000000000308000 default file\u003d/lib/libpthread-2.3.90.so anon\u003d1 dirty\u003d1 swapcache\u003d1 N1\u003d1\n200000000030c000 default anon\u003d1 dirty\u003d1 swapcache\u003d1 N1\u003d1\n2000000000320000 default anon\u003d1 dirty\u003d1 N1\u003d1\n200000000071c000 default\n2000000000720000 default anon\u003d2 dirty\u003d2 swapcache\u003d1 N1\u003d1 N2\u003d1\n2000000000f1c000 default\n2000000000f20000 default anon\u003d2 dirty\u003d2 swapcache\u003d1 active\u003d1 N2\u003d1 N3\u003d1\n200000000171c000 default\n2000000001720000 default anon\u003d1 dirty\u003d1 swapcache\u003d1 N1\u003d1\n2000000001b20000 default\n2000000001b38000 default file\u003d/lib/libgcc_s.so.1 mapped\u003d2 N1\u003d2\n2000000001b48000 default file\u003d/lib/libgcc_s.so.1\n2000000001b54000 default file\u003d/lib/libgcc_s.so.1 anon\u003d1 dirty\u003d1 active\u003d0 N1\u003d1\n2000000001b58000 default file\u003d/lib/libunwind.so.7.0.0 mapped\u003d2 active\u003d1 N1\u003d2\n2000000001b74000 default file\u003d/lib/libunwind.so.7.0.0\n2000000001b80000 default file\u003d/lib/libunwind.so.7.0.0\n2000000001b84000 default\n4000000000000000 default file\u003d/media/huge/test9 mapped\u003d1 N1\u003d1\n6000000000000000 default file\u003d/media/huge/test9 anon\u003d1 dirty\u003d1 active\u003d0 N1\u003d1\n6000000000004000 default heap\n607fffff7fffc000 default anon\u003d1 dirty\u003d1 swapcache\u003d1 N2\u003d1\n607fffffff06c000 default stack anon\u003d1 dirty\u003d1 active\u003d0 N1\u003d1\n8000000060000000 default file\u003d/mnt/huge/test0 huge dirty\u003d3 N1\u003d3\n8000000090000000 default file\u003d/mnt/huge/test1 huge dirty\u003d3 N0\u003d1 N2\u003d2\n80000000c0000000 default file\u003d/mnt/huge/test2 huge dirty\u003d3 N1\u003d1 N3\u003d2\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "a57ebfdb2cf9fa60dfa2f403f70ef6c432ca2a62",
      "tree": "1597343f3e9c076fa37ceb2c6442cc9e3ca44656",
      "parents": [
        "685db65e422bfa523b8a9dacb5a658b42b254f05"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Thu Mar 02 02:54:37 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Thu Mar 02 08:33:07 2006 -0800"
      },
      "message": "[PATCH] numa_maps: Fix potential crash on non IA64 platforms\n\nnuma_maps should not scan over huge vmas in order not to cause problems for\nnon IA64 platforms that may have pte entries pointing to huge pages in a\nvariety of ways in their page tables.  Add a simple check to ignore vmas\ncontaining huge pages.\n\nSigned-off-by: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "511030bcd24119fa3759ef3f914d354e107ef839",
      "tree": "707edb2c804ad6c42ffbd0ef6685b49e076f0dcd",
      "parents": [
        "5cf6c541f5b3902bdcc2d311d70f8e730aaff1be"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Tue Feb 28 16:58:57 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Tue Feb 28 20:53:43 2006 -0800"
      },
      "message": "[PATCH] Fix sys_migrate_pages: Move all pages when invoked from root\n\nCurrently sys_migrate_pages only moves pages belonging to a process.  This\nis okay when invoked from a regular user.  But if invoked from root it\nshould move all pages as documented in the migrate_pages manpage.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@muc.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "1e275d406bf6b88e4de6925cf594b64bb2ec49bc",
      "tree": "6fe143317fbc442407244a3c55ecf475072a28f3",
      "parents": [
        "f68a106f224c21148c5264a429fac149dc7ad0ac"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Fri Feb 24 13:04:12 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Feb 24 14:31:38 2006 -0800"
      },
      "message": "[PATCH] page migration: Fix MPOL_INTERLEAVE behavior for migration via mbind()\n\nmigrate_pages_to() allocates a list of new pages on the intended target\nnode or with the intended policy and then uses the list of new pages as\ntargets for the migration of a list of pages out of place.\n\nWhen the pages are allocated it is not clear which of the out of place\npages will be moved to the new pages.  So we cannot specify an address as\nneeded by alloc_page_vma().  This causes problem for MPOL_INTERLEAVE which\nwill currently allocate the pages on the first node of the set.  If mbind\nis used with vma that has the policy of MPOL_INTERLEAVE then the\ninterleaving of pages may be destroyed.\n\nThis patch fixes that by generating a fake address for each alloc_page_vma\nwhich will result is a distribution of pages as prescribed by\nMPOL_INTERLEAVE.\n\nLee also noted that the sequence of nodes for the new pages seems to be\ninverted.  So we also invert the way the lists of pages for migration are\nbuild.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nLooks-ok-to: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "fcab6f351305029fc5e3c632209d45cae57e4835",
      "tree": "21a84b35d172b63a47421190af83eb12c006a8e2",
      "parents": [
        "1dd31b6c89611ee91c0ff309c8733c0af61579e8"
      ],
      "author": {
        "name": "Alexey Dobriyan",
        "email": "adobriyan@gmail.com",
        "time": "Mon Feb 20 18:28:10 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Mon Feb 20 20:00:11 2006 -0800"
      },
      "message": "[PATCH] mm/mempolicy.c: fix \u0027if ();\u0027 typo\n\n[akpm; it happens that the code was still correct, only inefficient ]\n\nSigned-off-by: Alexey Dobriyan \u003cadobriyan@gmail.com\u003e\nCc: Christoph Lameter \u003cchristoph@lameter.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "a9c930bac163c5e616ca0ba9378e7dc746c93227",
      "tree": "58ff339858cee3a87893c094561eb72381044a08",
      "parents": [
        "c255d844dd73616f23e4b4733edcc2e5fa4042b2"
      ],
      "author": {
        "name": "Andi Kleen",
        "email": "ak@suse.de",
        "time": "Mon Feb 20 18:27:59 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Mon Feb 20 20:00:10 2006 -0800"
      },
      "message": "[PATCH] Fix units in mbind check\n\nmaxnode is a bit index and can\u0027t be directly compared against a byte length\nlike PAGE_SIZE\n\nSigned-off-by: Andi Kleen \u003cak@suse.de\u003e\nCc: Chris Wright \u003cchrisw@sous-sol.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "636f13c174dd7c84a437d3c3e8fa66f03f7fda63",
      "tree": "2dd28d7729a52655eb4e155af620c5afdda85a8a",
      "parents": [
        "74910e6c7dc7471b286a883c1a7af70483ffd2ba"
      ],
      "author": {
        "name": "Chris Wright",
        "email": "chrisw@sous-sol.org",
        "time": "Fri Feb 17 13:59:36 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Feb 17 14:09:22 2006 -0800"
      },
      "message": "[PATCH] sys_mbind sanity checking\n\nMake sure maxnodes is safe size before calculating nlongs in\nget_nodes().\n\nSigned-off-by: Chris Wright \u003cchrisw@sous-sol.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "dd942ae331425812930cd01766178b7e28e65f2d",
      "tree": "b513bcfa00c1fc0f78e06b7f4c8d999275b64dfb",
      "parents": [
        "759b650f54ed13e9b3d6c064c763a72ee09c74dd"
      ],
      "author": {
        "name": "Andi Kleen",
        "email": "ak@suse.de",
        "time": "Fri Feb 17 01:39:16 2006 +0100"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Fri Feb 17 08:18:14 2006 -0800"
      },
      "message": "[PATCH] Handle all and empty zones when setting up custom zonelists for mbind\n\nThe memory allocator doesn\u0027t like empty zones (which have an\nuninitialized freelist), so a x86-64 system with a node fully\nin GFP_DMA32 only would crash on mbind.\n\nFix that up by putting all possible zones as fallback into the zonelist\nand skipping the empty ones.\n\nIn fact the code always enough allocated space for all zones,\nbut only used it for the highest. This change just uses all the\nmemory that was allocated before.\n\nThis should work fine for now, but whoever implements node hot removal\nneeds to fix this somewhere else too (or make sure zone datastructures\nby itself never go away, only their memory)\n\nSigned-off-by: Andi Kleen \u003cak@suse.de\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "00ac59adfca8f2f339beb0b67054e786c275553e",
      "tree": "96489ebcdebec957f94de44940e10b3935bc3c18",
      "parents": [
        "9e8c34edfd7ae97d0e3391f34d9d26a0167912bf"
      ],
      "author": {
        "name": "Chen, Kenneth W",
        "email": "kenneth.w.chen@intel.com",
        "time": "Fri Feb 03 21:51:14 2006 +0100"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sat Feb 04 16:43:14 2006 -0800"
      },
      "message": "[PATCH] x86_64: Fix memory policy build without CONFIG_HUGETLBFS\n\n\u003e mm/mempolicy.c: In function `huge_zonelist\u0027:\n\u003e mm/mempolicy.c:1045: error: `HPAGE_SHIFT\u0027 undeclared (first use in this function)\n\u003e mm/mempolicy.c:1045: error: (Each undeclared identifier is reported only once\n\u003e mm/mempolicy.c:1045: error: for each function it appears in.)\n\u003e make[1]: *** [mm/mempolicy.o] Error 1\n\nNeed to wrap huge_zonelist function with CONFIG_HUGETLBFS.\n\nSigned-off-by: Ken Chen \u003ckenneth.w.chen@intel.com\u003e\nSigned-off-by: Andi Kleen \u003cak@suse.de\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "7e2ab150d1b3b286a4c864c60a549b2601777b63",
      "tree": "9d8f4f3af382a043ada81f75c324e76dff9f0043",
      "parents": [
        "a3351e525e4768c29aa5d22ef59b5b38e0361e53"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Wed Feb 01 03:05:40 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Feb 01 08:53:16 2006 -0800"
      },
      "message": "[PATCH] Direct Migration V9: upgrade MPOL_MF_MOVE and sys_migrate_pages()\n\nModify policy layer to support direct page migration\n\n- Add migrate_pages_to() allowing the migration of a list of pages to a a\n  specified node or to vma with a specific allocation policy in sets of\n  MIGRATE_CHUNK_SIZE pages\n\n- Modify do_migrate_pages() to do a staged move of pages from the source\n  nodes to the target nodes.\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "86c562a9d6683063e071692fe14e0a18e64ee1be",
      "tree": "dd64f8bff4624f17f2245aeadf962e0d6d5974a0",
      "parents": [
        "dc85da15d42b0efc792b0f5eab774dc5dbc1ceec"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Wed Jan 18 17:42:37 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Jan 18 19:20:18 2006 -0800"
      },
      "message": "[PATCH] mm: optimize numa policy handling in slab allocator\n\nMove the interrupt check from slab_node into ___cache_alloc and adds an\n\"unlikely()\" to avoid pipeline stalls on some architectures.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "dc85da15d42b0efc792b0f5eab774dc5dbc1ceec",
      "tree": "4b347b10dadf3cc7bdbff36709e8cee2bc673996",
      "parents": [
        "fc0abb1451c64c79ac80665d5ba74450ce274e4d"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Wed Jan 18 17:42:36 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Jan 18 19:20:18 2006 -0800"
      },
      "message": "[PATCH] NUMA policies in the slab allocator V2\n\nThis patch fixes a regression in 2.6.14 against 2.6.13 that causes an\nimbalance in memory allocation during bootup.\n\nThe slab allocator in 2.6.13 is not numa aware and simply calls\nalloc_pages().  This means that memory policies may control the behavior of\nalloc_pages().  During bootup the memory policy is set to MPOL_INTERLEAVE\nresulting in the spreading out of allocations during bootup over all\navailable nodes.  The slab allocator in 2.6.13 has only a single list of\nslab pages.  As a result the per cpu slab cache and the spinlock controlled\npage lists may contain slab entries from off node memory.  The slab\nallocator in 2.6.13 makes no effort to discern the locality of an entry on\nits lists.\n\nThe NUMA aware slab allocator in 2.6.14 controls locality of the slab pages\nexplicitly by calling alloc_pages_node().  The NUMA slab allocator manages\nslab entries by having lists of available slab pages for each node.  The\nper cpu slab cache can only contain slab entries associated with the node\nlocal to the processor.  This guarantees that the default allocation mode\nof the slab allocator always assigns local memory if available.\n\nSetting MPOL_INTERLEAVE as a default policy during bootup has no effect\nanymore.  In 2.6.14 all node unspecific slab allocations are performed on\nthe boot processor.  This means that most of key data structures are\nallocated on one node.  Most processors will have to refer to these\nstructures making the boot node a potential bottleneck.  This may reduce\nperformance and cause unnecessary memory pressure on the boot node.\n\nThis patch implements NUMA policies in the slab layer.  There is the need\nof explicit application of NUMA memory policies by the slab allcator itself\nsince the NUMA slab allocator does no longer let the page_allocator control\nlocality.\n\nThe check for policies is made directly at the beginning of __cache_alloc\nusing current-\u003emempolicy.  The memory policy is already frequently checked\nby the page allocator (alloc_page_vma() and alloc_page_current()).  So it\nis highly likely that the cacheline is present.  For MPOL_INTERLEAVE\nkmalloc() will spread out each request to one node after another so that an\nequal distribution of allocations can be obtained during bootup.\n\nIt is not possible to push the policy check to lower layers of the NUMA\nslab allocator since the per cpu caches are now only containing slab\nentries from the current node.  If the policy says that the local node is\nnot to be preferred or forbidden then there is no point in checking the\nslab cache or local list of slab pages.  The allocation better be directed\nimmediately to the lists containing slab entries for the allowed set of\nnodes.\n\nThis way of applying policy also fixes another strange behavior in 2.6.13.\nalloc_pages() is controlled by the memory allocation policy of the current\nprocess.  It could therefore be that one process is running with\nMPOL_INTERLEAVE and would f.e.  obtain a new page following that policy\nsince no slab entries are in the lists anymore.  A page can typically be\nused for multiple slab entries but lets say that the current process is\nonly using one.  The other entries are then added to the slab lists.  These\nare now non local entries in the slab lists despite of the possible\navailability of local pages that would provide faster access and increase\nthe performance of the application.\n\nAnother process without MPOL_INTERLEAVE may now run and expect a local slab\nentry from kmalloc().  However, there are still these free slab entries\nfrom the off node page obtained from the other process via MPOL_INTERLEAVE\nin the cache.  The process will then get an off node slab entry although\nother slab entries may be available that are local to that process.  This\nmeans that the policy if one process may contaminate the locality of the\nslab caches for other processes.\n\nThis patch in effect insures that a per process policy is followed for the\nallocation of slab entries and that there cannot be a memory policy\ninfluence from one process to another.  A process with default policy will\nalways get a local slab entry if one is available.  And the process using\nmemory policies will get its memory arranged as requested.  Off-node slab\nallocation will require the use of spinlocks and will make the use of per\ncpu caches not possible.  A process using memory policies to redirect\nallocations offnode will have to cope with additional lock overhead in\naddition to the latency added by the need to access a remote slab entry.\n\nChanges V1-\u003eV2\n- Remove #ifdef CONFIG_NUMA by moving forward declaration into\n  prior #ifdef CONFIG_NUMA section.\n\n- Give the function determining the node number to use a saner\n  name.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "fc3012896337c83a056c496d7cfb0072e1591181",
      "tree": "5b774e59ba982fd4330eb96abace9cda9d744b0f",
      "parents": [
        "053837fce7aa79025ed57656855df09f80175527"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Wed Jan 18 17:42:29 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Jan 18 19:20:17 2006 -0800"
      },
      "message": "[PATCH] Simplify migrate_page_add\n\nSimplify migrate_page_add after feedback from Hugh.  This also allows us to\ndrop one parameter from migrate_page_add.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "053837fce7aa79025ed57656855df09f80175527",
      "tree": "05d7615894131a368fc4943f641b11acdd2ae694",
      "parents": [
        "e236a166b2bc437769a9b8b5d19186a3761bde48"
      ],
      "author": {
        "name": "Nick Piggin",
        "email": "npiggin@suse.de",
        "time": "Wed Jan 18 17:42:27 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Wed Jan 18 19:20:17 2006 -0800"
      },
      "message": "[PATCH] mm: migration page refcounting fix\n\nMigration code currently does not take a reference to target page\nproperly, so between unlocking the pte and trying to take a new\nreference to the page with isolate_lru_page, anything could happen to\nit.\n\nFix this by holding the pte lock until we get a chance to elevate the\nrefcount.\n\nOther small cleanups while we\u0027re here.\n\nSigned-off-by: Nick Piggin \u003cnpiggin@suse.de\u003e\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "7339ff8302fd70aabf5f1ae26e0c4905fa74a495",
      "tree": "38ee561d51b7e4db7c0d6dd9ebd9fc22c2b6ab88",
      "parents": [
        "852cf918dcf2ae46468b425e679fbcbf0ea8fdbb"
      ],
      "author": {
        "name": "Robin Holt",
        "email": "holt@sgi.com",
        "time": "Sat Jan 14 13:20:48 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sat Jan 14 18:27:07 2006 -0800"
      },
      "message": "[PATCH] Add tmpfs options for memory placement policies\n\nAnything that writes into a tmpfs filesystem is liable to disproportionately\ndecrease the available memory on a particular node.  Since there\u0027s no telling\nwhat sort of application (e.g.  dd/cp/cat) might be dropping large files\nthere, this lets the admin choose the appropriate default behavior for their\nsite\u0027s situation.\n\nIntroduce a tmpfs mount option which allows specifying a memory policy and\na second option to specify the nodelist for that policy.  With the default\npolicy, tmpfs will behave as it does today.  This patch adds support for\npreferred, bind, and interleave policies.\n\nThe default policy will cause pages to be added to tmpfs files on the node\nwhich is doing the writing.  Some jobs expect a single process to create\nand manage the tmpfs files.  This results in a node which has a\nsignificantly reduced number of free pages.\n\nWith this patch, the administrator can specify the policy and nodes for\nthat policy where they would prefer allocations.\n\nThis patch was originally written by Brent Casavant and Hugh Dickins.  I\nadded support for the bind and preferred policies and the mpol_nodelist\nmount option.\n\nSigned-off-by: Brent Casavant \u003cbcasavan@sgi.com\u003e\nSigned-off-by: Hugh Dickins \u003chugh@veritas.com\u003e\nSigned-off-by: Robin Holt \u003cholt@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "f4598c8b3678abd65be3be00ed3d046375a4777e",
      "tree": "497aba8cdeb00b1fe8d227a9b839c1ce8980f3a2",
      "parents": [
        "1bc691d357c646700b9523d2aeca02847d3fb3f4"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@engr.sgi.com",
        "time": "Thu Jan 12 01:05:20 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Thu Jan 12 09:08:48 2006 -0800"
      },
      "message": "[PATCH] migration: make sure there is no attempt to migrate reserved pages.\n\nThis ensures that reserved pages are not migrated.  Reserved pages\ncurrently cause the WARN_ON to trigger in migrate_page_add()\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "4225399a66b315d4d1fb1cb61b75dda201c832e3",
      "tree": "c8bd976bc6590c5fe859c6129abb93072d99cfa8",
      "parents": [
        "202f72d5d1b5c2c084f63ef996c736d208b447b5"
      ],
      "author": {
        "name": "Paul Jackson",
        "email": "pj@sgi.com",
        "time": "Sun Jan 08 01:01:59 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sun Jan 08 20:13:44 2006 -0800"
      },
      "message": "[PATCH] cpuset: rebind vma mempolicies fix\n\nFix more of longstanding bug in cpuset/mempolicy interaction.\n\nNUMA mempolicies (mm/mempolicy.c) are constrained by the current tasks cpuset\nto just the Memory Nodes allowed by that cpuset.  The kernel maintains\ninternal state for each mempolicy, tracking what nodes are used for the\nMPOL_INTERLEAVE, MPOL_BIND or MPOL_PREFERRED policies.\n\nWhen a tasks cpuset memory placement changes, whether because the cpuset\nchanged, or because the task was attached to a different cpuset, then the\ntasks mempolicies have to be rebound to the new cpuset placement, so as to\npreserve the cpuset-relative numbering of the nodes in that policy.\n\nAn earlier fix handled such mempolicy rebinding for mempolicies attached to a\ntask.\n\nThis fix rebinds mempolicies attached to vma\u0027s (address ranges in a tasks\naddress space.) Due to the need to hold the task-\u003emm-\u003emmap_sem semaphore while\nupdating vma\u0027s, the rebinding of vma mempolicies has to be done when the\ncpuset memory placement is changed, at which time mmap_sem can be safely\nacquired.  The tasks mempolicy is rebound later, when the task next attempts\nto allocate memory and notices that its task-\u003ecpuset_mems_generation is\nout-of-date with its cpusets mems_generation.\n\nBecause walking the tasklist to find all tasks attached to a changing cpuset\nrequires holding tasklist_lock, a spinlock, one cannot update the vma\u0027s of the\naffected tasks while doing the tasklist scan.  In general, one cannot acquire\na semaphore (which can sleep) while already holding a spinlock (such as\ntasklist_lock).  So a list of mm references has to be built up during the\ntasklist scan, then the tasklist lock dropped, then for each mm, its mmap_sem\nacquired, and the vma\u0027s in that mm rebound.\n\nOnce the tasklist lock is dropped, affected tasks may fork new tasks, before\ntheir mm\u0027s are rebound.  A kernel global \u0027cpuset_being_rebound\u0027 is set to\npoint to the cpuset being rebound (there can only be one; cpuset modifications\nare done under a global \u0027manage_sem\u0027 semaphore), and the mpol_copy code that\nis used to copy a tasks mempolicies during fork catches such forking tasks,\nand ensures their children are also rebound.\n\nWhen a task is moved to a different cpuset, it is easier, as there is only one\ntask involved.  It\u0027s mm-\u003evma\u0027s are scanned, using the same\nmpol_rebind_policy() as used above.\n\nIt may happen that both the mpol_copy hook and the update done via the\ntasklist scan update the same mm twice.  This is ok, as the mempolicies of\neach vma in an mm keep track of what mems_allowed they are relative to, and\nsafely no-op a second request to rebind to the same nodes.\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "74cb21553f4bf244185b9bec4c26e4e3169ad55e",
      "tree": "3f8f13e8dacc8f0876b01f62765a123ce1722b17",
      "parents": [
        "909d75a3b77bdd8baa9429bad3b69a654d2954ce"
      ],
      "author": {
        "name": "Paul Jackson",
        "email": "pj@sgi.com",
        "time": "Sun Jan 08 01:01:56 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sun Jan 08 20:13:44 2006 -0800"
      },
      "message": "[PATCH] cpuset: numa_policy_rebind cleanup\n\nCleanup, reorganize and make more robust the mempolicy.c code to rebind\nmempolicies relative to the containing cpuset after a tasks memory placement\nchanges.\n\nThe real motivator for this cleanup patch is to lay more groundwork for the\nupcoming patch to correctly rebind NUMA mempolicies that are attached to vma\u0027s\nafter the containing cpuset memory placement changes.\n\nNUMA mempolicies are constrained by the cpuset their task is a member of.\nWhen either (1) a task is moved to a different cpuset, or (2) the \u0027mems\u0027\nmems_allowed of a cpuset is changed, then the NUMA mempolicies have embedded\nnode numbers (for MPOL_BIND, MPOL_INTERLEAVE and MPOL_PREFERRED) that need to\nbe recalculated, relative to their new cpuset placement.\n\nThe old code used an unreliable method of determining what was the old\nmems_allowed constraining the mempolicy.  It just looked at the tasks\nmems_allowed value.  This sort of worked with the present code, that just\nrebinds the -task- mempolicy, and leaves any -vma- mempolicies broken,\nreferring to the old nodes.  But in an upcoming patch, the vma mempolicies\nwill be rebound as well.  Then the order in which the various task and vma\nmempolicies are updated will no longer be deterministic, and one can no longer\ncount on the task-\u003emems_allowed holding the old value for as long as needed.\nIt\u0027s not even clear if the current code was guaranteed to work reliably for\ntask mempolicies.\n\nSo I added a mems_allowed field to each mempolicy, stating exactly what\nmems_allowed the policy is relative to, and updated synchronously and reliably\nanytime that the mempolicy is rebound.\n\nAlso removed a useless wrapper routine, numa_policy_rebind(), and had its\ncaller, cpuset_update_task_memory_state(), call directly to the rewritten\npolicy_rebind() routine, and made that rebind routine extern instead of\nstatic, and added a \"mpol_\" prefix to its name, making it\nmpol_rebind_policy().\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "909d75a3b77bdd8baa9429bad3b69a654d2954ce",
      "tree": "f9955ff697b7569fc75e5b8683d886315f34ac49",
      "parents": [
        "cf2a473c4089aa41c26f653200673f5a4cc25047"
      ],
      "author": {
        "name": "Paul Jackson",
        "email": "pj@sgi.com",
        "time": "Sun Jan 08 01:01:55 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sun Jan 08 20:13:44 2006 -0800"
      },
      "message": "[PATCH] cpuset: implement cpuset_mems_allowed\n\nProvide a cpuset_mems_allowed() method, which the sys_migrate_pages() code\nneeded, to obtain the mems_allowed vector of a cpuset, and replaced the\nworkaround in sys_migrate_pages() to call this new method.\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    }
  ],
  "next": "cf2a473c4089aa41c26f653200673f5a4cc25047"
}