)]}'
{
  "log": [
    {
      "commit": "60063497a95e716c9a689af3be2687d261f115b4",
      "tree": "6ce0d68db76982c53df46aee5f29f944ebf2c320",
      "parents": [
        "148817ba092f9f6edd35bad3c6c6b8e8f90fe2ed"
      ],
      "author": {
        "name": "Arun Sharma",
        "email": "asharma@fb.com",
        "time": "Tue Jul 26 16:09:06 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jul 26 16:49:47 2011 -0700"
      },
      "message": "atomic: use \u003clinux/atomic.h\u003e\n\nThis allows us to move duplicated code in \u003casm/atomic.h\u003e\n(atomic_inc_not_zero() for now) to \u003clinux/atomic.h\u003e\n\nSigned-off-by: Arun Sharma \u003casharma@fb.com\u003e\nReviewed-by: Eric Dumazet \u003ceric.dumazet@gmail.com\u003e\nCc: Ingo Molnar \u003cmingo@elte.hu\u003e\nCc: David Miller \u003cdavem@davemloft.net\u003e\nCc: Eric Dumazet \u003ceric.dumazet@gmail.com\u003e\nAcked-by: Mike Frysinger \u003cvapier@gentoo.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "bb2a0de92c891b8feeedc0178acb3ae009d899a8",
      "tree": "c2c0b3ad66c8da0e48c021927b2d747fb08b7ef3",
      "parents": [
        "1f4c025b5a5520fd2571244196b1b01ad96d18f6"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Jul 26 16:08:22 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jul 26 16:49:42 2011 -0700"
      },
      "message": "memcg: consolidate memory cgroup lru stat functions\n\nIn mm/memcontrol.c, there are many lru stat functions as..\n\n  mem_cgroup_zone_nr_lru_pages\n  mem_cgroup_node_nr_file_lru_pages\n  mem_cgroup_nr_file_lru_pages\n  mem_cgroup_node_nr_anon_lru_pages\n  mem_cgroup_nr_anon_lru_pages\n  mem_cgroup_node_nr_unevictable_lru_pages\n  mem_cgroup_nr_unevictable_lru_pages\n  mem_cgroup_node_nr_lru_pages\n  mem_cgroup_nr_lru_pages\n  mem_cgroup_get_local_zonestat\n\nSome of them are under #ifdef MAX_NUMNODES \u003e1 and others are not.\nThis seems bad. This patch consolidates all functions into\n\n  mem_cgroup_zone_nr_lru_pages()\n  mem_cgroup_node_nr_lru_pages()\n  mem_cgroup_nr_lru_pages()\n\nFor these functions, \"which LRU?\" information is passed by a mask.\n\nexample:\n  mem_cgroup_nr_lru_pages(mem, BIT(LRU_ACTIVE_ANON))\n\nAnd I added some macro as ALL_LRU, ALL_LRU_FILE, ALL_LRU_ANON.\n\nexample:\n  mem_cgroup_nr_lru_pages(mem, ALL_LRU)\n\nBTW, considering layout of NUMA memory placement of counters, this patch seems\nto be better.\n\nNow, when we gather all LRU information, we scan in following orer\n    for_each_lru -\u003e for_each_node -\u003e for_each_zone.\n\nThis means we\u0027ll touch cache lines in different node in turn.\n\nAfter patch, we\u0027ll scan\n    for_each_node -\u003e for_each_zone -\u003e for_each_lru(mask)\n\nThen, we\u0027ll gather information in the same cacheline at once.\n\n[akpm@linux-foundation.org: fix warnigns, build error]\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Balbir Singh \u003cbsingharora@gmail.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "c6830c22603aaecf65405af23f6da2d55892f9cb",
      "tree": "19458ebc7c32bef8a4ed59630cabb5785b1bdc11",
      "parents": [
        "af4087e0e682df12bdffec5cfafc2fec9208716e"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Thu Jun 16 17:28:07 2011 +0900"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jun 27 14:13:09 2011 -0700"
      },
      "message": "Fix node_start/end_pfn() definition for mm/page_cgroup.c\n\ncommit 21a3c96 uses node_start/end_pfn(nid) for detection start/end\nof nodes. But, it\u0027s not defined in linux/mmzone.h but defined in\n/arch/???/include/mmzone.h which is included only under\nCONFIG_NEED_MULTIPLE_NODES\u003dy.\n\nThen, we see\n  mm/page_cgroup.c: In function \u0027page_cgroup_init\u0027:\n  mm/page_cgroup.c:308: error: implicit declaration of function \u0027node_start_pfn\u0027\n  mm/page_cgroup.c:309: error: implicit declaration of function \u0027node_end_pfn\u0027\n\nSo, fixiing page_cgroup.c is an idea...\n\nBut node_start_pfn()/node_end_pfn() is a very generic macro and\nshould be implemented in the same manner for all archs.\n(m32r has different implementation...)\n\nThis patch removes definitions of node_start/end_pfn() in each archs\nand defines a unified one in linux/mmzone.h. It\u0027s not under\nCONFIG_NEED_MULTIPLE_NODES, now.\n\nA result of macro expansion is here (mm/page_cgroup.c)\n\nfor !NUMA\n start_pfn \u003d ((\u0026contig_page_data)-\u003enode_start_pfn);\n  end_pfn \u003d ({ pg_data_t *__pgdat \u003d (\u0026contig_page_data); __pgdat-\u003enode_start_pfn + __pgdat-\u003enode_spanned_pages;});\n\nfor NUMA (x86-64)\n  start_pfn \u003d ((node_data[nid])-\u003enode_start_pfn);\n  end_pfn \u003d ({ pg_data_t *__pgdat \u003d (node_data[nid]); __pgdat-\u003enode_start_pfn + __pgdat-\u003enode_spanned_pages;});\n\nChangelog:\n - fixed to avoid using \"nid\" twice in node_end_pfn() macro.\n\nReported-and-acked-by: Randy Dunlap \u003crandy.dunlap@oracle.com\u003e\nReported-and-tested-by: Ingo Molnar \u003cmingo@elte.hu\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "2a56d2220284b0e4dd8569fa475d7053f1c40a63",
      "tree": "96f959486a2f31db599e5f97167074bd1ecb3dc6",
      "parents": [
        "46f2cc80514e389bacfb642a32a4181fa1f1d20b",
        "239df0fd5ee25588f8a5ba7f7ee646940cc403f4"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri May 27 19:51:32 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri May 27 19:51:32 2011 -0700"
      },
      "message": "Merge branch \u0027for-linus\u0027 of master.kernel.org:/home/rmk/linux-2.6-arm\n\n* \u0027for-linus\u0027 of master.kernel.org:/home/rmk/linux-2.6-arm: (45 commits)\n  ARM: 6945/1: Add unwinding support for division functions\n  ARM: kill pmd_off()\n  ARM: 6944/1: mm: allow ASID 0 to be allocated to tasks\n  ARM: 6943/1: mm: use TTBR1 instead of reserved context ID\n  ARM: 6942/1: mm: make TTBR1 always point to swapper_pg_dir on ARMv6/7\n  ARM: 6941/1: cache: ensure MVA is cacheline aligned in flush_kern_dcache_area\n  ARM: add sendmmsg syscall\n  ARM: 6863/1: allow hotplug on msm\n  ARM: 6832/1: mmci: support for ST-Ericsson db8500v2\n  ARM: 6830/1: mach-ux500: force PrimeCell revisions\n  ARM: 6829/1: amba: make hardcoded periphid override hardware\n  ARM: 6828/1: mach-ux500: delete SSP PrimeCell ID\n  ARM: 6827/1: mach-netx: delete hardcoded periphid\n  ARM: 6940/1: fiq: Briefly document driver responsibilities for suspend/resume\n  ARM: 6938/1: fiq: Refactor {get,set}_fiq_regs() for Thumb-2\n  ARM: 6914/1: sparsemem: fix highmem detection when using SPARSEMEM\n  ARM: 6913/1: sparsemem: allow pfn_valid to be overridden when using SPARSEMEM\n  at91: drop at572d940hf support\n  at91rm9200: introduce at91rm9200_set_type to specficy cpu package\n  at91: drop boot_params and PLAT_PHYS_OFFSET\n  ...\n"
    },
    {
      "commit": "246e87a9393448c20873bc5dee64be68ed559e24",
      "tree": "a17016142b267fcba2e3be9908f8138c8dcb3f3a",
      "parents": [
        "889976dbcb1218119fdd950fb7819084e37d7d37"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Thu May 26 16:25:34 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 17:12:35 2011 -0700"
      },
      "message": "memcg: fix get_scan_count() for small targets\n\nDuring memory reclaim we determine the number of pages to be scanned per\nzone as\n\n\t(anon + file) \u003e\u003e priority.\nAssume\n\tscan \u003d (anon + file) \u003e\u003e priority.\n\nIf scan \u003c SWAP_CLUSTER_MAX, the scan will be skipped for this time and\npriority gets higher.  This has some problems.\n\n  1. This increases priority as 1 without any scan.\n     To do scan in this priority, amount of pages should be larger than 512M.\n     If pages\u003e\u003epriority \u003c SWAP_CLUSTER_MAX, it\u0027s recorded and scan will be\n     batched, later. (But we lose 1 priority.)\n     If memory size is below 16M, pages \u003e\u003e priority is 0 and no scan in\n     DEF_PRIORITY forever.\n\n  2. If zone-\u003eall_unreclaimabe\u003d\u003dtrue, it\u0027s scanned only when priority\u003d\u003d0.\n     So, x86\u0027s ZONE_DMA will never be recoverred until the user of pages\n     frees memory by itself.\n\n  3. With memcg, the limit of memory can be small. When using small memcg,\n     it gets priority \u003c DEF_PRIORITY-2 very easily and need to call\n     wait_iff_congested().\n     For doing scan before priorty\u003d9, 64MB of memory should be used.\n\nThen, this patch tries to scan SWAP_CLUSTER_MAX of pages in force...when\n\n  1. the target is enough small.\n  2. it\u0027s kswapd or memcg reclaim.\n\nThen we can avoid rapid priority drop and may be able to recover\nall_unreclaimable in a small zones.  
And this patch removes nr_saved_scan.\n This will allow scanning in this priority even when pages \u003e\u003e priority is\nvery small.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Ying Han \u003cyinghan@google.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "7b7bf499f79de3f6c85a340c8453a78789523f85",
      "tree": "1d0bf7ae8d5befe135fb7e7cfc455656a0ec7b34",
      "parents": [
        "4db70f73e56961b9bcdfd0c36c62847a18b7dbb5"
      ],
      "author": {
        "name": "Will Deacon",
        "email": "will.deacon@arm.com",
        "time": "Thu May 19 13:21:14 2011 +0100"
      },
      "committer": {
        "name": "Russell King",
        "email": "rmk+kernel@arm.linux.org.uk",
        "time": "Thu May 26 10:23:24 2011 +0100"
      },
      "message": "ARM: 6913/1: sparsemem: allow pfn_valid to be overridden when using SPARSEMEM\n\nIn commit eb33575c (\"[ARM] Double check memmap is actually valid with a\nmemmap has unexpected holes V2\"), a new function, memmap_valid_within,\nwas introduced to mmzone.h so that holes in the memmap which pass\npfn_valid in SPARSEMEM configurations can be detected and avoided.\n\nThe fix to this problem checks that the pfn \u003c-\u003e page linkages are\ncorrect by calculating the page for the pfn and then checking that\npage_to_pfn on that page returns the original pfn. Unfortunately, in\nSPARSEMEM configurations, this results in reading from the page flags to\ndetermine the correct section. Since the memmap here has been freed,\njunk is read from memory and the check is no longer robust.\n\nIn the best case, reading from /proc/pagetypeinfo will give you the\nwrong answer. In the worst case, you get SEGVs, Kernel OOPses and hung\nCPUs. Furthermore, ioremap implementations that use pfn_valid to\ndisallow the remapping of normal memory will break.\n\nThis patch allows architectures to provide their own pfn_valid function\ninstead of using the default implementation used by sparsemem. The\narchitecture-specific version is aware of the memmap state and will\nreturn false when passed a pfn for a freed page within a valid section.\n\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nAcked-by: Catalin Marinas \u003ccatalin.marinas@arm.com\u003e\nTested-by: H Hartley Sweeten \u003chsweeten@visionengravers.com\u003e\nSigned-off-by: Will Deacon \u003cwill.deacon@arm.com\u003e\nSigned-off-by: Russell King \u003crmk+kernel@arm.linux.org.uk\u003e\n"
    },
    {
      "commit": "a539f3533b78e39a22723d6d3e1e11b6c14454d9",
      "tree": "59c62d883a2f38e79a5e37d114c4560443728426",
      "parents": [
        "a2c8990aed5ab000491732b07c8c4465d1b389b8"
      ],
      "author": {
        "name": "Daniel Kiper",
        "email": "dkiper@net-space.pl",
        "time": "Tue May 24 17:12:51 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:36 2011 -0700"
      },
      "message": "mm: add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macro\n\nAdd SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macro which aligns given\npfn to upper section and lower section boundary accordingly.\n\nRequired for the latest memory hotplug support for the Xen balloon driver.\n\nSigned-off-by: Daniel Kiper \u003cdkiper@net-space.pl\u003e\nReviewed-by: Konrad Rzeszutek Wilk \u003ckonrad.wilk@oracle.com\u003e\nDavid Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e3c40f379a144f35e53864a2cd970e238071afd7",
      "tree": "6636214fe729d2ee08780a44ab92bed66c0074db",
      "parents": [
        "bf4e8902ee5080f5d2c810b639e7e778c8082b52"
      ],
      "author": {
        "name": "Daniel Kiper",
        "email": "dkiper@net-space.pl",
        "time": "Tue May 24 17:12:33 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:29 2011 -0700"
      },
      "message": "mm: pfn_to_section_nr()/section_nr_to_pfn() is valid only in CONFIG_SPARSEMEM context\n\npfn_to_section_nr()/section_nr_to_pfn() is valid only in CONFIG_SPARSEMEM\ncontext.  Move it to proper place.\n\nSigned-off-by: Daniel Kiper \u003cdkiper@net-space.pl\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0a9d59a2461477bd9ed143c01af9df3f8f00fa81",
      "tree": "df997d1cfb0786427a0df1fbd6f0640fa4248cf4",
      "parents": [
        "a23ce6da9677d245aa0aadc99f4197030350ab54",
        "795abaf1e4e188c4171e3cd3dbb11a9fcacaf505"
      ],
      "author": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Tue Feb 15 10:24:31 2011 +0100"
      },
      "committer": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Tue Feb 15 10:24:31 2011 +0100"
      },
      "message": "Merge branch \u0027master\u0027 into for-next\n"
    },
    {
      "commit": "25a64ec1e7d0cfe172832d06a31215d458dfea7f",
      "tree": "d2ed524b05bcf76e3f87bef8e6f78aed486ee30f",
      "parents": [
        "8e572bab39c484cdf512715f98626337f25cfc32"
      ],
      "author": {
        "name": "Pete Zaitcev",
        "email": "zaitcev@kotori.zaitcev.us",
        "time": "Thu Feb 03 22:43:48 2011 -0700"
      },
      "committer": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Fri Feb 04 10:55:44 2011 +0100"
      },
      "message": "fix comment spelling becausse \u003d\u003e because\n\nSigned-off-by: Pete Zaitcev \u003czaitcev@redhat.com\u003e\nSigned-off-by: Jiri Kosina \u003cjkosina@suse.cz\u003e\n"
    },
    {
      "commit": "79134171df238171daa4c024a42b77b401ccb00b",
      "tree": "af7872d5851e371d09b9fe7eb80f4809713c79fb",
      "parents": [
        "b9bbfbe30ae088cc88a4b2ba7732baeebd1a0162"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:46:58 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:43 2011 -0800"
      },
      "message": "thp: transparent hugepage vmstat\n\nAdd hugepage stat information to /proc/vmstat and /proc/meminfo.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "9950474883e027e6e728cbcff25f7f2bf0c96530",
      "tree": "ecfdd3e68a25f1ef7822428c44f8375efbe9bc0c",
      "parents": [
        "c585a2678d83ba8fb02fa6b197de0ac7d67377f1"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:20 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:37 2011 -0800"
      },
      "message": "mm: kswapd: stop high-order balancing when any suitable zone is balanced\n\nSimon Kirby reported the following problem\n\n   We\u0027re seeing cases on a number of servers where cache never fully\n   grows to use all available memory.  Sometimes we see servers with 4 GB\n   of memory that never seem to have less than 1.5 GB free, even with a\n   constantly-active VM.  In some cases, these servers also swap out while\n   this happens, even though they are constantly reading the working set\n   into memory.  We have been seeing this happening for a long time; I\n   don\u0027t think it\u0027s anything recent, and it still happens on 2.6.36.\n\nAfter some debugging work by Simon, Dave Hansen and others, the prevaling\ntheory became that kswapd is reclaiming order-3 pages requested by SLUB\ntoo aggressive about it.\n\nThere are two apparent problems here.  On the target machine, there is a\nsmall Normal zone in comparison to DMA32.  As kswapd tries to balance all\nzones, it would continually try reclaiming for Normal even though DMA32\nwas balanced enough for callers.  The second problem is that\nsleeping_prematurely() does not use the same logic as balance_pgdat() when\ndeciding whether to sleep or not.  This keeps kswapd artifically awake.\n\nA number of tests were run and the figures from previous postings will\nlook very different for a few reasons.  One, the old figures were forcing\nmy network card to use GFP_ATOMIC in attempt to replicate Simon\u0027s problem.\n Second, I previous specified slub_min_order\u003d3 again in an attempt to\nreproduce Simon\u0027s problem.  In this posting, I\u0027m depending on Simon to say\nwhether his problem is fixed or not and these figures are to show the\nimpact to the ordinary cases.  Finally, the \"vmscan\" figures are taken\nfrom /proc/vmstat instead of the tracepoints.  
There is less information\nbut recording is less disruptive.\n\nThe first test of relevance was postmark with a process running in the\nbackground reading a large amount of anonymous memory in blocks.  The\nobjective was to vaguely simulate what was happening on Simon\u0027s machine\nand it\u0027s memory intensive enough to have kswapd awake.\n\nPOSTMARK\n                                            traceonly          kanyzone\nTransactions per second:              156.00 ( 0.00%)   153.00 (-1.96%)\nData megabytes read per second:        21.51 ( 0.00%)    21.52 ( 0.05%)\nData megabytes written per second:     29.28 ( 0.00%)    29.11 (-0.58%)\nFiles created alone per second:       250.00 ( 0.00%)   416.00 (39.90%)\nFiles create/transact per second:      79.00 ( 0.00%)    76.00 (-3.95%)\nFiles deleted alone per second:       520.00 ( 0.00%)   420.00 (-23.81%)\nFiles delete/transact per second:      79.00 ( 0.00%)    76.00 (-3.95%)\n\nMMTests Statistics: duration\nUser/Sys Time Running Test (seconds)         16.58      17.4\nTotal Elapsed Time (seconds)                218.48    222.47\n\nVMstat Reclaim Statistics: vmscan\nDirect reclaims                                  0          4\nDirect reclaim pages scanned                     0        203\nDirect reclaim pages reclaimed                   0        184\nKswapd pages scanned                        326631     322018\nKswapd pages reclaimed                      312632     309784\nKswapd low wmark quickly                         1          4\nKswapd high wmark quickly                      122        475\nKswapd skip congestion_wait                      1          0\nPages activated                             700040     705317\nPages deactivated                           212113     203922\nPages written                                 9875       6363\n\nTotal pages scanned                         326631    322221\nTotal pages reclaimed                       312632    309968\n%age total pages scanned/reclaimed          
95.71%    96.20%\n%age total pages scanned/written             3.02%     1.97%\n\nproc vmstat: Faults\nMajor Faults                                   300       254\nMinor Faults                                645183    660284\nPage ins                                    493588    486704\nPage outs                                  4960088   4986704\nSwap ins                                      1230       661\nSwap outs                                     9869      6355\n\nPerformance is mildly affected because kswapd is no longer doing as much\nwork and the background memory consumer process is getting in the way.\nNote that kswapd scanned and reclaimed fewer pages as it\u0027s less aggressive\nand overall fewer pages were scanned and reclaimed.  Swap in/out is\nparticularly reduced again reflecting kswapd throwing out fewer pages.\n\nThe slight performance impact is unfortunate here but it looks like a\ndirect result of kswapd being less aggressive.  As the bug report is about\ntoo many pages being freed by kswapd, it may have to be accepted for now.\n\nThe second test is a streaming IO benchmark that was previously used by\nJohannes to show regressions in page reclaim.\n\nMICRO\n\t\t\t\t\t traceonly  kanyzone\nUser/Sys Time Running Test (seconds)         29.29     28.87\nTotal Elapsed Time (seconds)                492.18    488.79\n\nVMstat Reclaim Statistics: vmscan\nDirect reclaims                               2128       1460\nDirect reclaim pages scanned               2284822    1496067\nDirect reclaim pages reclaimed              148919     110937\nKswapd pages scanned                      15450014   16202876\nKswapd pages reclaimed                     8503697    8537897\nKswapd low wmark quickly                      3100       3397\nKswapd high wmark quickly                     1860       7243\nKswapd skip congestion_wait                    708        801\nPages activated                               9635       9573\nPages deactivated                       
      1432       1271\nPages written                                  223       1130\n\nTotal pages scanned                       17734836  17698943\nTotal pages reclaimed                      8652616   8648834\n%age total pages scanned/reclaimed          48.79%    48.87%\n%age total pages scanned/written             0.00%     0.01%\n\nproc vmstat: Faults\nMajor Faults                                   165       221\nMinor Faults                               9655785   9656506\nPage ins                                      3880      7228\nPage outs                                 37692940  37480076\nSwap ins                                         0        69\nSwap outs                                       19        15\n\nAgain fewer pages are scanned and reclaimed as expected and this time the\ntest completed faster.  Note that kswapd is hitting its watermarks faster\n(low and high wmark quickly) which I expect is due to kswapd reclaiming\nfewer pages.\n\nI also ran fs-mark, iozone and sysbench but there is nothing interesting\nto report in the figures.  Performance is not significantly changed and\nthe reclaim statistics look reasonable.\n\nTgis patch:\n\nWhen the allocator enters its slow path, kswapd is woken up to balance the\nnode.  It continues working until all zones within the node are balanced.\nFor order-0 allocations, this makes perfect sense but for higher orders it\ncan have unintended side-effects.  If the zone sizes are imbalanced,\nkswapd may reclaim heavily within a smaller zone discarding an excessive\nnumber of pages.  The user-visible behaviour is that kswapd is awake and\nreclaiming even though plenty of pages are free from a suitable zone.\n\nThis patch alters the \"balance\" logic for high-order reclaim allowing\nkswapd to stop if any suitable zone becomes balanced to reduce the number\nof pages it reclaims from other zones.  
kswapd still tries to ensure that\norder-0 watermarks for all zones are met before sleeping.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Eric B Munson \u003cemunson@mgebm.net\u003e\nCc: Simon Kirby \u003csim@hostway.ca\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97",
      "tree": "6f39beef8cf918eb2ca9f64ae1bcd1ea79ca487a",
      "parents": [
        "43bb40c9e3aa51a3b038c9df2c9afb4d4685614d"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:45:41 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:31 2011 -0800"
      },
      "message": "mm: page allocator: adjust the per-cpu counter threshold when memory is low\n\nCommit aa45484 (\"calculate a better estimate of NR_FREE_PAGES when memory\nis low\") noted that watermarks were based on the vmstat NR_FREE_PAGES.  To\navoid synchronization overhead, these counters are maintained on a per-cpu\nbasis and drained both periodically and when a threshold is above a\nthreshold.  On large CPU systems, the difference between the estimate and\nreal value of NR_FREE_PAGES can be very high.  The system can get into a\ncase where pages are allocated far below the min watermark potentially\ncausing livelock issues.  The commit solved the problem by taking a better\nreading of NR_FREE_PAGES when memory was low.\n\nUnfortately, as reported by Shaohua Li this accurate reading can consume a\nlarge amount of CPU time on systems with many sockets due to cache line\nbouncing.  This patch takes a different approach.  For large machines\nwhere counter drift might be unsafe and while kswapd is awake, the per-cpu\nthresholds for the target pgdat are reduced to limit the level of drift to\nwhat should be a safe level.  This incurs a performance penalty in heavy\nmemory pressure by a factor that depends on the workload and the machine\nbut the machine should function correctly without accidentally exhausting\nall memory on a node.  There is an additional cost when kswapd wakes and\nsleeps but the event is not expected to be frequent - in Shaohua\u0027s test\ncase, there was one recorded sleep and wake event at least.\n\nTo ensure that kswapd wakes up, a safe version of zone_watermark_ok() is\nintroduced that takes a more accurate reading of NR_FREE_PAGES when called\nfrom wakeup_kswapd, when deciding whether it is really safe to go back to\nsleep in sleeping_prematurely() and when deciding if a zone is really\nbalanced or not in balance_pgdat().  
We are still using an expensive\nfunction but limiting how often it is called.\n\nWhen the test case is reproduced, the time spent in the watermark\nfunctions is reduced.  The following report is on the percentage of time\nspent cumulatively spent in the functions zone_nr_free_pages(),\nzone_watermark_ok(), __zone_watermark_ok(), zone_watermark_ok_safe(),\nzone_page_state_snapshot(), zone_page_state().\n\nvanilla                      11.6615%\ndisable-threshold            0.2584%\n\nDavid said:\n\n: We had to pull aa454840 \"mm: page allocator: calculate a better estimate\n: of NR_FREE_PAGES when memory is low and kswapd is awake\" from 2.6.36\n: internally because tests showed that it would cause the machine to stall\n: as the result of heavy kswapd activity.  I merged it back with this fix as\n: it is pending in the -mm tree and it solves the issue we were seeing, so I\n: definitely think this should be pushed to -stable (and I would seriously\n: consider it for 2.6.37 inclusion even at this late date).\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReported-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux.com\u003e\nTested-by: Nicolas Bareil \u003cnico@chdir.org\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Kyle McMartin \u003ckyle@mcmartin.ca\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.37.1, 2.6.36.x]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0e093d99763eb4cea09f8ca4f1d01f34e121d10b",
      "tree": "fad38f9c3651c81db298521141a79d9468f71986",
      "parents": [
        "08fc468f4eaf6683bae5bdb94743a09d8630cb80"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 26 14:21:45 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone\n\nIf congestion_wait() is called with no BDI congested, the caller will\nsleep for the full timeout and this may be an unnecessary sleep.  This\npatch adds a wait_iff_congested() that checks congestion and only sleeps\nif a BDI is congested else, it calls cond_resched() to ensure the caller\nis not hogging the CPU longer than its quota but otherwise will not sleep.\n\nThis is aimed at reducing some of the major desktop stalls reported during\nIO.  For example, while kswapd is operating, it calls congestion_wait()\nbut it could just have been reclaiming clean page cache pages with no\ncongestion.  Without this patch, it would sleep for a full timeout but\nafter this patch, it\u0027ll just call schedule() if it has been on the CPU too\nlong.  Similar logic applies to direct reclaimers that are not making\nenough progress.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Jens Axboe \u003caxboe@kernel.dk\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ea941f0e2a8c02ae876cd73deb4e1557248f258c",
      "tree": "d2006c10cce4f134dc83f7f5aaa1d0096902cc1a",
      "parents": [
        "f629d1c9bd0dbc44a6c4f9a4a67d1646c42bfc6f"
      ],
      "author": {
        "name": "Michael Rubin",
        "email": "mrubin@google.com",
        "time": "Tue Oct 26 14:21:35 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:06 2010 -0700"
      },
      "message": "writeback: add nr_dirtied and nr_written to /proc/vmstat\n\nTo help developers and applications gain visibility into writeback\nbehaviour adding two entries to vm_stat_items and /proc/vmstat.  This will\nallow us to track the \"written\" and \"dirtied\" counts.\n\n   # grep nr_dirtied /proc/vmstat\n   nr_dirtied 3747\n   # grep nr_written /proc/vmstat\n   nr_written 3618\n\nSigned-off-by: Michael Rubin \u003cmrubin@google.com\u003e\nReviewed-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Jens Axboe \u003caxboe@kernel.dk\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "aa45484031ddee09b06350ab8528bfe5b2c76d1c",
      "tree": "6758072232db9a54453022ec3e6cede35d52001c",
      "parents": [
        "72853e2991a2702ae93aaf889ac7db743a415dd3"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "cl@linux.com",
        "time": "Thu Sep 09 16:38:17 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Sep 09 18:57:25 2010 -0700"
      },
      "message": "mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake\n\nOrdinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is\ncheaper than scanning a number of lists.  To avoid synchronization\noverhead, counter deltas are maintained on a per-cpu basis and drained\nboth periodically and when the delta is above a threshold.  On large CPU\nsystems, the difference between the estimated and real value of\nNR_FREE_PAGES can be very high.  If NR_FREE_PAGES is much higher than\nnumber of real free page in buddy, the VM can allocate pages below min\nwatermark, at worst reducing the real number of pages to zero.  Even if\nthe OOM killer kills some victim for freeing memory, it may not free\nmemory if the exit path requires a new page resulting in livelock.\n\nThis patch introduces a zone_page_state_snapshot() function (courtesy of\nChristoph) that takes a slightly more accurate view of an arbitrary vmstat\ncounter.  It is used to read NR_FREE_PAGES while kswapd is awake to avoid\nthe watermark being accidentally broken.  The estimate is not perfect and\nmay result in cache line bounces but is expected to be lighter than the\nIPI calls necessary to continually drain the per-cpu counters while kswapd\nis awake.\n\nSigned-off-by: Christoph Lameter \u003ccl@linux.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "25edde0332916ae706ccf83de688be57bcc844b7",
      "tree": "35a5b0e651f9cdb48d9a55a748970339c4f681bc",
      "parents": [
        "b898cc70019ce1835bbf6c47bdf978adc36faa42"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Mon Aug 09 17:19:27 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Aug 09 20:45:00 2010 -0700"
      },
      "message": "vmscan: kill prev_priority completely\n\nSince 2.6.28 zone-\u003eprev_priority is unused. Then it can be removed\nsafely. It reduce stack usage slightly.\n\nNow I have to say that I\u0027m sorry. 2 years ago, I thought prev_priority\ncan be integrate again, it\u0027s useful. but four (or more) times trying\nhaven\u0027t got good performance number. Thus I give up such approach.\n\nThe rest of this changelog is notes on prev_priority and why it existed in\nthe first place and why it might be not necessary any more. This information\nis based heavily on discussions between Andrew Morton, Rik van Riel and\nKosaki Motohiro who is heavily quotes from.\n\nHistorically prev_priority was important because it determined when the VM\nwould start unmapping PTE pages. i.e. there are no balances of note within\nthe VM, Anon vs File and Mapped vs Unmapped. Without prev_priority, there\nis a potential risk of unnecessarily increasing minor faults as a large\namount of read activity of use-once pages could push mapped pages to the\nend of the LRU and get unmapped.\n\nThere is no proof this is still a problem but currently it is not considered\nto be. Active files are not deactivated if the active file list is smaller\nthan the inactive list reducing the liklihood that file-mapped pages are\nbeing pushed off the LRU and referenced executable pages are kept on the\nactive list to avoid them getting pushed out by read activity.\n\nEven if it is a problem, prev_priority prev_priority wouldn\u0027t works\nnowadays. First of all, current vmscan still a lot of UP centric code. it\nexpose some weakness on some dozens CPUs machine. I think we need more and\nmore improvement.\n\nThe problem is, current vmscan mix up per-system-pressure, per-zone-pressure\nand per-task-pressure a bit. example, prev_priority try to boost priority to\nother concurrent priority. 
but if the another task have mempolicy restriction,\nit is unnecessary, but also makes wrong big latency and exceeding reclaim.\nper-task based priority + prev_priority adjustment make the emulation of\nper-system pressure. but it have two issue 1) too rough and brutal emulation\n2) we need per-zone pressure, not per-system.\n\nAnother example, currently DEF_PRIORITY is 12. it mean the lru rotate about\n2 cycle (1/4096 + 1/2048 + 1/1024 + .. + 1) before invoking OOM-Killer.\nbut if 10,0000 thrreads enter DEF_PRIORITY reclaim at the same time, the\nsystem have higher memory pressure than priority\u003d\u003d0 (1/4096*10,000 \u003e 2).\nprev_priority can\u0027t solve such multithreads workload issue. In other word,\nprev_priority concept assume the sysmtem don\u0027t have lots threads.\"\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Nick Piggin \u003cnpiggin@suse.de\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: Michael Rubin \u003cmrubin@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b645bd1286f2fbcd2eb4ab3bed5884f63c42e363",
      "tree": "7649eb3fbe4afeb01e9403e71b0546a37406a33e",
      "parents": [
        "31f961a89bd1cb9baaf32af4bd8b571ace3447b1"
      ],
      "author": {
        "name": "Alexander Nevenchannyy",
        "email": "a.nevenchannyy@gmail.com",
        "time": "Mon Aug 09 17:19:00 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Aug 09 20:44:57 2010 -0700"
      },
      "message": "mmzone.h: remove dead prototype\n\nget_zone_counts() was dropped from kernel tree, see:\nhttp://www.mail-archive.com/mm-commits@vger.kernel.org/msg07313.html but\nits prototype remains.\n\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "7aac789885512388a66d47280d7e7777ffba1e59",
      "tree": "af4ac98260268889a422dd264102d2f15d5c1983",
      "parents": [
        "3bccd996276b108c138e8176793a26ecef54d573"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "lee.schermerhorn@hp.com",
        "time": "Wed May 26 14:45:00 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 27 09:12:57 2010 -0700"
      },
      "message": "numa: introduce numa_mem_id()- effective local memory node id\n\nIntroduce numa_mem_id(), based on generic percpu variable infrastructure\nto track \"nearest node with memory\" for archs that support memoryless\nnodes.\n\nDefine API in \u003clinux/topology.h\u003e when CONFIG_HAVE_MEMORYLESS_NODES\ndefined, else stubs.  Architectures will define HAVE_MEMORYLESS_NODES\nif/when they support them.\n\nArchs can override definitions of:\n\nnuma_mem_id() - returns node number of \"local memory\" node\nset_numa_mem() - initialize [this cpus\u0027] per cpu variable \u0027numa_mem\u0027\ncpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue\n\nGeneric initialization of \u0027numa_mem\u0027 occurs in __build_all_zonelists().\nThis will initialize the boot cpu at boot time, and all cpus on change of\nnuma_zonelist_order, or when node or memory hot-plug requires zonelist\nrebuild.  Archs that support memoryless nodes will need to initialize\n\u0027numa_mem\u0027 for secondary cpus as they\u0027re brought on-line.\n\n[akpm@linux-foundation.org: fix build]\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: Tejun Heo \u003ctj@kernel.org\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: Nick Piggin \u003cnpiggin@suse.de\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Eric Whitney \u003ceric.whitney@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Ingo Molnar \u003cmingo@elte.hu\u003e\nCc: Thomas Gleixner \u003ctglx@linutronix.de\u003e\nCc: \"H. 
Peter Anvin\" \u003chpa@zytor.com\u003e\nCc: \"Luck, Tony\" \u003ctony.luck@intel.com\u003e\nCc: Pekka Enberg \u003cpenberg@cs.helsinki.fi\u003e\nCc: \u003clinux-arch@vger.kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4eaf3f64397c3db3c5785eee508270d62a9fabd9",
      "tree": "bfd986a7e974876755ea6fe0de394199c68e2e36",
      "parents": [
        "1f522509c77a5dea8dc384b735314f03908a6415"
      ],
      "author": {
        "name": "Haicheng Li",
        "email": "haicheng.li@linux.intel.com",
        "time": "Mon May 24 14:32:52 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue May 25 08:07:02 2010 -0700"
      },
      "message": "mem-hotplug: fix potential race while building zonelist for new populated zone\n\nAdd global mutex zonelists_mutex to fix the possible race:\n\n     CPU0                                  CPU1                    CPU2\n(1) zone-\u003epresent_pages +\u003d online_pages;\n(2)                                       build_all_zonelists();\n(3)                                                               alloc_page();\n(4)                                                               free_page();\n(5) build_all_zonelists();\n(6)   __build_all_zonelists();\n(7)     zone-\u003epageset \u003d alloc_percpu();\n\nIn step (3,4), zone-\u003epageset still points to boot_pageset, so bad\nthings may happen if 2+ nodes are in this state. Even if only 1 node\nis accessing the boot_pageset, (3) may still consume too much memory\nto fail the memory allocations in step (7).\n\nBesides, atomic operation ensures alloc_percpu() in step (7) will never fail\nsince there is a new fresh memory block added in step(6).\n\n[haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]\nSigned-off-by: Haicheng Li \u003chaicheng.li@linux.intel.com\u003e\nSigned-off-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nReviewed-by: Andi Kleen \u003candi.kleen@intel.com\u003e\nCc: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Tejun Heo \u003ctj@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1f522509c77a5dea8dc384b735314f03908a6415",
      "tree": "4b848527b90877a8a64c46e8e2d76723405c319d",
      "parents": [
        "319774e25fa4b7641bdc3b0a464dd84e62103347"
      ],
      "author": {
        "name": "Haicheng Li",
        "email": "haicheng.li@linux.intel.com",
        "time": "Mon May 24 14:32:51 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue May 25 08:07:01 2010 -0700"
      },
      "message": "mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset\n\nFor each new populated zone of hotadded node, need to update its pagesets\nwith dynamically allocated per_cpu_pageset struct for all possible CPUs:\n\n    1) Detach zone-\u003epageset from the shared boot_pageset\n       at end of __build_all_zonelists().\n\n    2) Use mutex to protect zone-\u003epageset when it\u0027s still\n       shared in onlined_pages()\n\nOtherwises, multiple zones of different nodes would share same boot strapping\nboot_pageset for same CPU, which will finally cause below kernel panic:\n\n  ------------[ cut here ]------------\n  kernel BUG at mm/page_alloc.c:1239!\n  invalid opcode: 0000 [#1] SMP\n  ...\n  Call Trace:\n   [\u003cffffffff811300c1\u003e] __alloc_pages_nodemask+0x131/0x7b0\n   [\u003cffffffff81162e67\u003e] alloc_pages_current+0x87/0xd0\n   [\u003cffffffff81128407\u003e] __page_cache_alloc+0x67/0x70\n   [\u003cffffffff811325f0\u003e] __do_page_cache_readahead+0x120/0x260\n   [\u003cffffffff81132751\u003e] ra_submit+0x21/0x30\n   [\u003cffffffff811329c6\u003e] ondemand_readahead+0x166/0x2c0\n   [\u003cffffffff81132ba0\u003e] page_cache_async_readahead+0x80/0xa0\n   [\u003cffffffff8112a0e4\u003e] generic_file_aio_read+0x364/0x670\n   [\u003cffffffff81266cfa\u003e] nfs_file_read+0xca/0x130\n   [\u003cffffffff8117b20a\u003e] do_sync_read+0xfa/0x140\n   [\u003cffffffff8117bf75\u003e] vfs_read+0xb5/0x1a0\n   [\u003cffffffff8117c151\u003e] sys_read+0x51/0x80\n   [\u003cffffffff8103c032\u003e] system_call_fastpath+0x16/0x1b\n  RIP  [\u003cffffffff8112ff13\u003e] get_page_from_freelist+0x883/0x900\n   RSP \u003cffff88000d1e78a8\u003e\n  ---[ end trace 4bda28328b9990db ]\n\n[akpm@linux-foundation.org: merge fix]\nSigned-off-by: Haicheng Li \u003chaicheng.li@linux.intel.com\u003e\nSigned-off-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nReviewed-by: Andi Kleen \u003candi.kleen@intel.com\u003e\nReviewed-by: Christoph Lameter 
\u003ccl@linux-foundation.org\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Tejun Heo \u003ctj@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0faa56389c793cda7f967117415717bbab24fe4e",
      "tree": "b0d5f12579a4448adff2b6e462488f3cc6d75326",
      "parents": [
        "ff3d58c22b6827039983911d3460cf0c1657f8cc"
      ],
      "author": {
        "name": "Marcelo Roberto Jimenez",
        "email": "mroberto@cpti.cetuc.puc-rio.br",
        "time": "Mon May 24 14:32:47 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue May 25 08:07:01 2010 -0700"
      },
      "message": "mm: fix NR_SECTION_ROOTS \u003d\u003d 0 when using using sparsemem extreme.\n\nGot this while compiling for ARM/SA1100:\n\nmm/sparse.c: In function \u0027__section_nr\u0027:\nmm/sparse.c:135: warning: \u0027root\u0027 is used uninitialized in this function\n\nThis patch follows Russell King\u0027s suggestion for a new calculation for\nNR_SECTION_ROOTS.  Thanks also to Sergei Shtylyov for pointing out the\nexistence of the macro DIV_ROUND_UP.\n\nAtsushi Nemoto observed:\n: This fix doesn\u0027t just silence the warning - it fixes a real problem.\n:\n: Without this fix, mem_section[] might have 0 size so mem_section[0]\n: will share other variable area.  For example, I got:\n:\n: c030c700 b __warned.16478\n: c030c700 B mem_section\n: c030c701 b __warned.16483\n:\n: This might cause very strange behavior.  Your patch actually fixes it.\n\nSigned-off-by: Marcelo Roberto Jimenez \u003cmroberto@cpti.cetuc.puc-rio.br\u003e\nCc: Atsushi Nemoto \u003canemo@mba.ocn.ne.jp\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Yinghai Lu \u003cyinghai@kernel.org\u003e\nCc: Sergei Shtylyov \u003csshtylyov@mvista.com\u003e\nCc: Russell King \u003crmk@arm.linux.org.uk\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4f92e2586b43a2402e116055d4edda704f911b5b",
      "tree": "6a765ebeba951c02a7878bcea52a4769ad2e45c2",
      "parents": [
        "5e7719058079a1423ccce56148b0aaa56b2df821"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon May 24 14:32:32 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue May 25 08:07:00 2010 -0700"
      },
      "message": "mm: compaction: defer compaction using an exponential backoff when compaction fails\n\nThe fragmentation index may indicate that a failure is due to external\nfragmentation but after a compaction run completes, it is still possible\nfor an allocation to fail.  There are two obvious reasons as to why\n\n  o Page migration cannot move all pages so fragmentation remains\n  o A suitable page may exist but watermarks are not met\n\nIn the event of compaction followed by an allocation failure, this patch\ndefers further compaction in the zone (1 \u003c\u003c compact_defer_shift) times.\nIf the next compaction attempt also fails, compact_defer_shift is\nincreased up to a maximum of 6.  If compaction succeeds, the defer\ncounters are reset again.\n\nThe zone that is deferred is the first zone in the zonelist - i.e.  the\npreferred zone.  To defer compaction in the other zones, the information\nwould need to be stored in the zonelist or implemented similar to the\nzonelist_cache.  This would impact the fast-paths and is not justified at\nthis time.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "318ae2edc3b29216abd8a2510f3f80b764f06858",
      "tree": "ce595adde342f57f379d277b25e4dd206988a052",
      "parents": [
        "25cf84cf377c0aae5dbcf937ea89bc7893db5176",
        "3e58974027b04e84f68b964ef368a6cd758e2f84"
      ],
      "author": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Mon Mar 08 16:55:37 2010 +0100"
      },
      "committer": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Mon Mar 08 16:55:37 2010 +0100"
      },
      "message": "Merge branch \u0027for-next\u0027 into for-linus\n\nConflicts:\n\tDocumentation/filesystems/proc.txt\n\tarch/arm/mach-u300/include/mach/debug-macro.S\n\tdrivers/net/qlge/qlge_ethtool.c\n\tdrivers/net/qlge/qlge_main.c\n\tdrivers/net/typhoon.c\n"
    },
    {
      "commit": "93e4a89a8c987189b168a530a331ef6d0fcf07a7",
      "tree": "deb08017c0e4874539549d3ea9bf2d7b447a43be",
      "parents": [
        "fc91668eaf9e7ba61e867fc2218b7e9fb67faa4f"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Fri Mar 05 13:41:55 2010 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sat Mar 06 11:26:25 2010 -0800"
      },
      "message": "mm: restore zone-\u003eall_unreclaimable to independence word\n\ncommit e815af95 (\"change all_unreclaimable zone member to flags\") changed\nall_unreclaimable member to bit flag.  But it had an undesireble side\neffect.  free_one_page() is one of most hot path in linux kernel and\nincreasing atomic ops in it can reduce kernel performance a bit.\n\nThus, this patch revert such commit partially. at least\nall_unreclaimable shouldn\u0027t share memory word with other zone flags.\n\n[akpm@linux-foundation.org: fix patch interaction]\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Huang Shijie \u003cshijie8@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a626b46e17d0762d664ce471d40bc506b6e721ab",
      "tree": "445f6ac655ea9247d2e27529f23ba02d0991fec0",
      "parents": [
        "c1dcb4bb1e3e16e9baee578d9bb040e5fba1063e",
        "dce46a04d55d6358d2d4ab44a4946a19f9425fe2"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 03 08:15:05 2010 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Mar 03 08:15:05 2010 -0800"
      },
      "message": "Merge branch \u0027x86-bootmem-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip\n\n* \u0027x86-bootmem-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)\n  early_res: Need to save the allocation name in drop_range_partial()\n  sparsemem: Fix compilation on PowerPC\n  early_res: Add free_early_partial()\n  x86: Fix non-bootmem compilation on PowerPC\n  core: Move early_res from arch/x86 to kernel/\n  x86: Add find_fw_memmap_area\n  Move round_up/down to kernel.h\n  x86: Make 32bit support NO_BOOTMEM\n  early_res: Enhance check_and_double_early_res\n  x86: Move back find_e820_area to e820.c\n  x86: Add find_early_area_size\n  x86: Separate early_res related code from e820.c\n  x86: Move bios page reserve early to head32/64.c\n  sparsemem: Put mem map for one node together.\n  sparsemem: Put usemap for one node together\n  x86: Make 64 bit use early_res instead of bootmem before slab\n  x86: Only call dma32_reserve_bootmem 64bit !CONFIG_NUMA\n  x86: Make early_node_mem get mem \u003e 4 GB if possible\n  x86: Dynamically increase early_res array size\n  x86: Introduce max_early_res and early_res_count\n  ...\n"
    },
    {
      "commit": "43cf38eb5cea91245502df3fcee4dbfc1c74dd1c",
      "tree": "a58ea87af1f07b8aed4941db074f44103f321f6e",
      "parents": [
        "ab386128f20c44c458a90039ab1bdc265ac474c9"
      ],
      "author": {
        "name": "Tejun Heo",
        "email": "tj@kernel.org",
        "time": "Tue Feb 02 14:38:57 2010 +0900"
      },
      "committer": {
        "name": "Tejun Heo",
        "email": "tj@kernel.org",
        "time": "Wed Feb 17 11:17:38 2010 +0900"
      },
      "message": "percpu: add __percpu sparse annotations to core kernel subsystems\n\nAdd __percpu sparse annotations to core subsystems.\n\nThese annotations are to make sparse consider percpu variables to be\nin a different address space and warn if accessed without going\nthrough percpu accessors.  This patch doesn\u0027t affect normal builds.\n\nSigned-off-by: Tejun Heo \u003ctj@kernel.org\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nAcked-by: Paul E. McKenney \u003cpaulmck@linux.vnet.ibm.com\u003e\nCc: Jens Axboe \u003caxboe@kernel.dk\u003e\nCc: linux-mm@kvack.org\nCc: Rusty Russell \u003crusty@rustcorp.com.au\u003e\nCc: Dipankar Sarma \u003cdipankar@in.ibm.com\u003e\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nCc: Eric Biederman \u003cebiederm@xmission.com\u003e\n"
    },
    {
      "commit": "08677214e318297f228237be0042aac754f48f1d",
      "tree": "6d03424f7e287fcf66136b44512328afb1aeee49",
      "parents": [
        "c252a5bb1f57afb1e336d68085217727ca7b2134"
      ],
      "author": {
        "name": "Yinghai Lu",
        "email": "yinghai@kernel.org",
        "time": "Wed Feb 10 01:20:20 2010 -0800"
      },
      "committer": {
        "name": "H. Peter Anvin",
        "email": "hpa@zytor.com",
        "time": "Fri Feb 12 09:41:59 2010 -0800"
      },
      "message": "x86: Make 64 bit use early_res instead of bootmem before slab\n\nFinally we can use early_res to replace bootmem for x86_64 now.\n\nStill can use CONFIG_NO_BOOTMEM to enable it or not.\n\n-v2: fix 32bit compiling about MAX_DMA32_PFN\n-v3: folded bug fix from LKML message below\n\nSigned-off-by: Yinghai Lu \u003cyinghai@kernel.org\u003e\nLKML-Reference: \u003c4B747239.4070907@kernel.org\u003e\nSigned-off-by: H. Peter Anvin \u003chpa@zytor.com\u003e\n"
    },
    {
      "commit": "2a61aa401638529cd4231f6106980d307fba98fa",
      "tree": "a3d7565570c5996d0b3ae5fdf0126e065e750431",
      "parents": [
        "c41b20e721ea4f6f20f66a66e7f0c3c97a2ca9c2"
      ],
      "author": {
        "name": "Adam Buchbinder",
        "email": "adam.buchbinder@gmail.com",
        "time": "Fri Dec 11 16:35:40 2009 -0500"
      },
      "committer": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Thu Feb 04 11:55:45 2010 +0100"
      },
      "message": "Fix misspellings of \"invocation\" in comments.\n\nSome comments misspell \"invocation\"; this fixes them. No code\nchanges.\n\nSigned-off-by: Adam Buchbinder \u003cadam.buchbinder@gmail.com\u003e\nSigned-off-by: Jiri Kosina \u003cjkosina@suse.cz\u003e\n"
    },
    {
      "commit": "99dcc3e5a94ed491fbef402831d8c0bbb267f995",
      "tree": "dd4d2b9e10ab0d4502e4b2a22dfc0a02a3300d7e",
      "parents": [
        "5917dae83cb02dfe74c9167b79e86e6d65183fa3"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "cl@linux-foundation.org",
        "time": "Tue Jan 05 15:34:51 2010 +0900"
      },
      "committer": {
        "name": "Tejun Heo",
        "email": "tj@kernel.org",
        "time": "Tue Jan 05 15:34:51 2010 +0900"
      },
      "message": "this_cpu: Page allocator conversion\n\nUse the per cpu allocator functionality to avoid per cpu arrays in struct zone.\n\nThis drastically reduces the size of struct zone for systems with large\namounts of processors and allows placement of critical variables of struct\nzone in one cacheline even on very large systems.\n\nAnother effect is that the pagesets of one processor are placed near one\nanother. If multiple pagesets from different zones fit into one cacheline\nthen additional cacheline fetches can be avoided on the hot paths when\nallocating memory from multiple zones.\n\nBootstrap becomes simpler if we use the same scheme for UP, SMP, NUMA. #ifdefs\nare reduced and we can drop the zone_pcp macro.\n\nHotplug handling is also simplified since cpu alloc can bring up and\nshut down cpu areas for a specific cpu as a whole. So there is no need to\nallocate or free individual pagesets.\n\nV7-V8:\n- Explain chicken egg dilemmna with percpu allocator.\n\nV4-V5:\n- Fix up cases where per_cpu_ptr is called before irq disable\n- Integrate the bootstrap logic that was separate before.\n\ntj: Build failure in pageset_cpuup_callback() due to missing ret\n    variable fixed.\n\nReviewed-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nSigned-off-by: Tejun Heo \u003ctj@kernel.org\u003e\n"
    },
    {
      "commit": "01fc0ac198eabcbf460e1ed058860a935b6c2c9a",
      "tree": "f980b4c770298bf9491dcfe3f02359fa94b89d04",
      "parents": [
        "9367858dd08caf4e6ebd511abd2fca0a2d87b648"
      ],
      "author": {
        "name": "Sam Ravnborg",
        "email": "sam@ravnborg.org",
        "time": "Sun Apr 19 21:57:19 2009 +0200"
      },
      "committer": {
        "name": "Michal Marek",
        "email": "mmarek@suse.cz",
        "time": "Sat Dec 12 13:08:14 2009 +0100"
      },
      "message": "kbuild: move bounds.h to include/generated\n\nSigned-off-by: Sam Ravnborg \u003csam@ravnborg.org\u003e\nCc: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\nSigned-off-by: Michal Marek \u003cmmarek@suse.cz\u003e\n"
    },
    {
      "commit": "8d65af789f3e2cf4cfbdbf71a0f7a61ebcd41d38",
      "tree": "121df3bfffc7853ac6d2c514ad514d4a748a0933",
      "parents": [
        "c0d0787b6d47d9f4d5e8bd321921104e854a9135"
      ],
      "author": {
        "name": "Alexey Dobriyan",
        "email": "adobriyan@gmail.com",
        "time": "Wed Sep 23 15:57:19 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Sep 24 07:21:04 2009 -0700"
      },
      "message": "sysctl: remove \"struct file *\" argument of -\u003eproc_handler\n\nIt\u0027s unused.\n\nIt isn\u0027t needed -- read or write flag is already passed and sysctl\nshouldn\u0027t care about the rest.\n\nIt _was_ used in two places at arch/frv for some reason.\n\nSigned-off-by: Alexey Dobriyan \u003cadobriyan@gmail.com\u003e\nCc: David Howells \u003cdhowells@redhat.com\u003e\nCc: \"Eric W. Biederman\" \u003cebiederm@xmission.com\u003e\nCc: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\nCc: Ralf Baechle \u003cralf@linux-mips.org\u003e\nCc: Martin Schwidefsky \u003cschwidefsky@de.ibm.com\u003e\nCc: Ingo Molnar \u003cmingo@elte.hu\u003e\nCc: \"David S. Miller\" \u003cdavem@davemloft.net\u003e\nCc: James Morris \u003cjmorris@namei.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5f8dcc21211a3d4e3a7a5ca366b469fb88117f61",
      "tree": "4bbb1b55c7787462fe313c7c003e77823c032422",
      "parents": [
        "5d863b89688e5811cd9e5bd0082cb38abe03adf3"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Sep 21 17:03:19 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Sep 22 07:17:39 2009 -0700"
      },
      "message": "page-allocator: split per-cpu list into one-list-per-migrate-type\n\nThe following two patches remove searching in the page allocator fast-path\nby maintaining multiple free-lists in the per-cpu structure.  At the time\nthe search was introduced, increasing the per-cpu structures would waste a\nlot of memory as per-cpu structures were statically allocated at\ncompile-time.  This is no longer the case.\n\nThe patches are as follows. They are based on mmotm-2009-08-27.\n\nPatch 1 adds multiple lists to struct per_cpu_pages, one per\n\tmigratetype that can be stored on the PCP lists.\n\nPatch 2 notes that the pcpu drain path check empty lists multiple times. The\n\tpatch reduces the number of checks by maintaining a count of free\n\tlists encountered. Lists containing pages will then free multiple\n\tpages in batch\n\nThe patches were tested with kernbench, netperf udp/tcp, hackbench and\nsysbench.  The netperf tests were not bound to any CPU in particular and\nwere run such that the results should be 99% confidence that the reported\nresults are within 1% of the estimated mean.  sysbench was run with a\npostgres background and read-only tests.  Similar to netperf, it was run\nmultiple times so that it\u0027s 99% confidence results are within 1%.  
The\npatches were tested on x86, x86-64 and ppc64 as\n\nx86:\tIntel Pentium D 3GHz with 8G RAM (no-brand machine)\n\tkernbench\t- No significant difference, variance well within noise\n\tnetperf-udp\t- 1.34% to 2.28% gain\n\tnetperf-tcp\t- 0.45% to 1.22% gain\n\thackbench\t- Small variances, very close to noise\n\tsysbench\t- Very small gains\n\nx86-64:\tAMD Phenom 9950 1.3GHz with 8G RAM (no-brand machine)\n\tkernbench\t- No significant difference, variance well within noise\n\tnetperf-udp\t- 1.83% to 10.42% gains\n\tnetperf-tcp\t- No conclusive until buffer \u003e\u003d PAGE_SIZE\n\t\t\t\t4096\t+15.83%\n\t\t\t\t8192\t+ 0.34% (not significant)\n\t\t\t\t16384\t+ 1%\n\thackbench\t- Small gains, very close to noise\n\tsysbench\t- 0.79% to 1.6% gain\n\nppc64:\tPPC970MP 2.5GHz with 10GB RAM (it\u0027s a terrasoft powerstation)\n\tkernbench\t- No significant difference, variance well within noise\n\tnetperf-udp\t- 2-3% gain for almost all buffer sizes tested\n\tnetperf-tcp\t- losses on small buffers, gains on larger buffers\n\t\t\t  possibly indicates some bad caching effect.\n\thackbench\t- No significant difference\n\tsysbench\t- 2-4% gain\n\nThis patch:\n\nCurrently the per-cpu page allocator searches the PCP list for pages of\nthe correct migrate-type to reduce the possibility of pages being\ninappropriate placed from a fragmentation perspective.  This search is\npotentially expensive in a fast-path and undesirable.  Splitting the\nper-cpu list into multiple lists increases the size of a per-cpu structure\nand this was potentially a major problem at the time the search was\nintroduced.  These problem has been mitigated as now only the necessary\nnumber of structures is allocated for the running system.\n\nThis patch replaces a list search in the per-cpu allocator with one list\nper migrate type.  The potential snag with this approach is when bulk\nfreeing pages.  
We round-robin free pages based on migrate type which has\nlittle bearing on the cache hotness of the page and potentially checks\nempty lists repeatedly in the event the majority of PCP pages are of one\ntype.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Nick Piggin \u003cnpiggin@suse.de\u003e\nCc: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Pekka Enberg \u003cpenberg@cs.helsinki.fi\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f86296317434b21585e229f6c49a33cb9ebab4d3",
      "tree": "d4fb05d4aee1a8e373ec053e7316dc9847b2c417",
      "parents": [
        "1a8670a29b5277cbe601f74ab63d2c5211fb3005"
      ],
      "author": {
        "name": "Wu Fengguang",
        "email": "fengguang.wu@intel.com",
        "time": "Mon Sep 21 17:03:11 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Sep 22 07:17:39 2009 -0700"
      },
      "message": "mm: do batched scans for mem_cgroup\n\nFor mem_cgroup, shrink_zone() may call shrink_list() with nr_to_scan\u003d1, in\nwhich case shrink_list() _still_ calls isolate_pages() with the much\nlarger SWAP_CLUSTER_MAX.  It effectively scales up the inactive list scan\nrate by up to 32 times.\n\nFor example, with 16k inactive pages and DEF_PRIORITY\u003d12, (16k \u003e\u003e 12)\u003d4.\nSo when shrink_zone() expects to scan 4 pages in the active/inactive list,\nthe active list will be scanned 4 pages, while the inactive list will be\n(over) scanned SWAP_CLUSTER_MAX\u003d32 pages in effect.  And that could break\nthe balance between the two lists.\n\nIt can further impact the scan of anon active list, due to the anon\nactive/inactive ratio rebalance logic in balance_pgdat()/shrink_zone():\n\ninactive anon list over scanned \u003d\u003e inactive_anon_is_low() \u003d\u003d TRUE\n                                \u003d\u003e shrink_active_list()\n                                \u003d\u003e active anon list over scanned\n\nSo the end result may be\n\n- anon inactive  \u003d\u003e over scanned\n- anon active    \u003d\u003e over scanned (maybe not as much)\n- file inactive  \u003d\u003e over scanned\n- file active    \u003d\u003e under scanned (relatively)\n\nThe accesses to nr_saved_scan are not lock protected and so not 100%\naccurate, however we can tolerate small errors and the resulted small\nimbalanced scan rates between zones.\n\nCc: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by: Balbir Singh \u003cbalbir@linux.vnet.ibm.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a731286de62294b63d8ceb3c5914ac52cc17e690",
      "tree": "c321e14500ec264e37fd103ffa71c7b133088010",
      "parents": [
        "b35ea17b7bbf5dea35faa0de11030acc620c3197"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Mon Sep 21 17:01:37 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Sep 22 07:17:29 2009 -0700"
      },
      "message": "mm: vmstat: add isolate pages\n\nIf the system is running a heavy load of processes then concurrent reclaim\ncan isolate a large number of pages from the LRU. /proc/vmstat and the\noutput generated for an OOM do not show how many pages were isolated.\n\nThis has been observed during process fork bomb testing (mstctl11 in LTP).\n\nThis patch shows the information about isolated pages.\n\nReproduced via:\n\n-----------------------\n% ./hackbench 140 process 1000\n   \u003d\u003e OOM occur\n\nactive_anon:146 inactive_anon:0 isolated_anon:49245\n active_file:79 inactive_file:18 isolated_file:113\n unevictable:0 dirty:0 writeback:0 unstable:0 buffer:39\n free:370 slab_reclaimable:309 slab_unreclaimable:5492\n mapped:53 shmem:15 pagetables:28140 bounce:0\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Hugh Dickins \u003chugh.dickins@tiscali.co.uk\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4b02108ac1b3354a22b0d83c684797692efdc395",
      "tree": "9f65d6e8e35ddce940e7b9da6305cf5a19e5904e",
      "parents": [
        "c6a7f5728a1db45d30df55a01adc130b4ab0327c"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Mon Sep 21 17:01:33 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Sep 22 07:17:27 2009 -0700"
      },
      "message": "mm: oom analysis: add shmem vmstat\n\nRecently we encountered OOM problems due to memory use of the GEM cache.\nGenerally a large amuont of Shmem/Tmpfs pages tend to create a memory\nshortage problem.\n\nWe often use the following calculation to determine the amount of shmem\npages:\n\nshmem \u003d NR_ACTIVE_ANON + NR_INACTIVE_ANON - NR_ANON_PAGES\n\nhowever the expression does not consider isolated and mlocked pages.\n\nThis patch adds explicit accounting for pages used by shmem and tmpfs.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nAcked-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Hugh Dickins \u003chugh.dickins@tiscali.co.uk\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "c6a7f5728a1db45d30df55a01adc130b4ab0327c",
      "tree": "36649bc6ebb959841a5097c699968722cfd99c4d",
      "parents": [
        "71de1ccbe1fb40203edd3beb473f8580d917d2ca"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Mon Sep 21 17:01:32 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Sep 22 07:17:27 2009 -0700"
      },
      "message": "mm: oom analysis: Show kernel stack usage in /proc/meminfo and OOM log output\n\nThe amount of memory allocated to kernel stacks can become significant and\ncause OOM conditions.  However, we do not display the amount of memory\nconsumed by stacks.\n\nAdd code to display the amount of memory used for stacks in /proc/meminfo.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "6837765963f1723e80ca97b1fae660f3a60d77df",
      "tree": "a9a6ed4b7e3bf188966da78b04bf39298f24375a",
      "parents": [
        "bce7394a3ef82b8477952fbab838e4a6e8cb47d2"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Jun 16 15:32:51 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jun 16 19:47:42 2009 -0700"
      },
      "message": "mm: remove CONFIG_UNEVICTABLE_LRU config option\n\nCurrently, nobody wants to turn UNEVICTABLE_LRU off.  Thus this\nconfigurability is unnecessary.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Andi Kleen \u003candi@firstfloor.org\u003e\nAcked-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: David Woodhouse \u003cdwmw2@infradead.org\u003e\nCc: Matt Mackall \u003cmpm@selenic.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "6e08a369ee10b361ac1cdcdf4fabd420fd08beb3",
      "tree": "9dbf870cad025b64781d9051b6680a8a23927e5a",
      "parents": [
        "56e49d218890f49b0057710a4b6fef31f5ffbfec"
      ],
      "author": {
        "name": "Wu Fengguang",
        "email": "fengguang.wu@intel.com",
        "time": "Tue Jun 16 15:32:29 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jun 16 19:47:39 2009 -0700"
      },
      "message": "vmscan: cleanup the scan batching code\n\nThe vmscan batching logic is twisting.  Move it into a standalone function\nnr_scan_try_batch() and document it.  No behavior change.\n\nSigned-off-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Nick Piggin \u003cnpiggin@suse.de\u003e\nCc: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nAcked-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nAcked-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "418589663d6011de9006425b6c5721e1544fb47a",
      "tree": "ef37fb026d3e38191d6b5c99bc95c190fa98d0fb",
      "parents": [
        "a3af9c389a7f3e675313f442fdd8c247c1cdb66b"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Jun 16 15:32:12 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jun 16 19:47:35 2009 -0700"
      },
      "message": "page allocator: use allocation flags as an index to the zone watermark\n\nALLOC_WMARK_MIN, ALLOC_WMARK_LOW and ALLOC_WMARK_HIGH determin whether\npages_min, pages_low or pages_high is used as the zone watermark when\nallocating the pages.  Two branches in the allocator hotpath determine\nwhich watermark to use.\n\nThis patch uses the flags as an array index into a watermark array that is\nindexed with WMARK_* defines accessed via helpers.  All call sites that\nuse zone-\u003epages_* are updated to use the helpers for accessing the values\nand the array offsets for setting.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Pekka Enberg \u003cpenberg@cs.helsinki.fi\u003e\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "49255c619fbd482d704289b5eb2795f8e3b7ff2e",
      "tree": "b1f36ca46bda7767fce12bc4a70360a68f7255ab",
      "parents": [
        "11e33f6a55ed7847d9c8ffe185ef87faf7806abe"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Jun 16 15:31:58 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jun 16 19:47:33 2009 -0700"
      },
      "message": "page allocator: move check for disabled anti-fragmentation out of fastpath\n\nOn low-memory systems, anti-fragmentation gets disabled as there is\nnothing it can do and it would just incur overhead shuffling pages between\nlists constantly.  Currently the check is made in the free page fast path\nfor every page.  This patch moves it to a slow path.  On machines with low\nmemory, there will be small amount of additional overhead as pages get\nshuffled between lists but it should quickly settle.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Pekka Enberg \u003cpenberg@cs.helsinki.fi\u003e\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "eb33575cf67d3f35fa2510210ef92631266e2465",
      "tree": "55dd9958dd10758aa5b1ad0186a3356ae620da44",
      "parents": [
        "e1342f1da06d39b3bbd530e9306347c4438bc6e5"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Wed May 13 17:34:48 2009 +0100"
      },
      "committer": {
        "name": "Russell King",
        "email": "rmk+kernel@arm.linux.org.uk",
        "time": "Mon May 18 11:22:24 2009 +0100"
      },
      "message": "[ARM] Double check memmap is actually valid with a memmap has unexpected holes V2\n\npfn_valid() is meant to be able to tell if a given PFN has valid memmap\nassociated with it or not. In FLATMEM, it is expected that holes always\nhave valid memmap as long as there is valid PFNs either side of the hole.\nIn SPARSEMEM, it is assumed that a valid section has a memmap for the\nentire section.\n\nHowever, ARM and maybe other embedded architectures in the future free\nmemmap backing holes to save memory on the assumption the memmap is never\nused. The page_zone linkages are then broken even though pfn_valid()\nreturns true. A walker of the full memmap must then do this additional\ncheck to ensure the memmap they are looking at is sane by making sure the\nzone and PFN linkages are still valid. This is expensive, but walkers of\nthe full memmap are extremely rare.\n\nThis was caught before for FLATMEM and hacked around but it hits again for\nSPARSEMEM because the page_zone linkages can look ok where the PFN linkages\nare totally screwed. This looks like a hatchet job but the reality is that\nany clean solution would end up consumning all the memory saved by punching\nthese unexpected holes in the memmap. For example, we tried marking the\nmemmap within the section invalid but the section size exceeds the size of\nthe hole in most cases so pfn_valid() starts returning false where valid\nmemmap exists. Shrinking the size of the section would increase memory\nconsumption offsetting the gains.\n\nThis patch identifies when an architecture is punching unexpected holes\nin the memmap that the memory model cannot automatically detect and sets\nARCH_HAS_HOLES_MEMORYMODEL. At the moment, this is restricted to EP93xx\nwhich is the model sub-architecture this has been reported on but may expand\nlater. When set, walkers of the full memmap must call memmap_valid_within()\nfor each PFN and passing in what it expects the page and zone to be for\nthat PFN. 
If it finds the linkages to be broken, it assumes the memmap is\ninvalid for that PFN.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Russell King \u003crmk+kernel@arm.linux.org.uk\u003e\n"
    },
    {
      "commit": "90975ef71246c5c688ead04e8ff6f36dc92d28b3",
      "tree": "eda44b2efe91509719b0e62219c2efec13a9e762",
      "parents": [
        "cab4e4c43f92582a2bfc026137b3d8a175bd0360",
        "558f6ab9106e6be701acb0257e7171df1bbccf04"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Apr 05 10:33:07 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Apr 05 10:33:07 2009 -0700"
      },
      "message": "Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask\n\n* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits)\n  cpumask: remove cpumask allocation from idle_balance, fix\n  numa, cpumask: move numa_node_id default implementation to topology.h, fix\n  cpumask: remove cpumask allocation from idle_balance\n  x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus\n  x86: cpumask: update 32-bit APM not to mug current-\u003ecpus_allowed\n  x86: microcode: cleanup\n  x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c\n  cpumask: fix CONFIG_CPUMASK_OFFSTACK\u003dy cpu hotunplug crash\n  numa, cpumask: move numa_node_id default implementation to topology.h\n  cpumask: convert node_to_cpumask_map[] to cpumask_var_t\n  cpumask: remove x86 cpumask_t uses.\n  cpumask: use cpumask_var_t in uv_flush_tlb_others.\n  cpumask: remove cpumask_t assignment from vector_allocation_domain()\n  cpumask: make Xen use the new operators.\n  cpumask: clean up summit\u0027s send_IPI functions\n  cpumask: use new cpumask functions throughout x86\n  x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask\n  cpumask: convert struct cpuinfo_x86\u0027s llc_shared_map to cpumask_var_t\n  cpumask: convert node_to_cpumask_map[] to cpumask_var_t\n  x86: unify 32 and 64-bit node_to_cpumask_map\n  ...\n"
    },
    {
      "commit": "ee99c71c59f897436ec65debb99372b3146f9985",
      "tree": "051f1c43b7c7658689d4b2c23b3d8585d6464a89",
      "parents": [
        "a6dc60f8975ad96d162915e07703a4439c80dcf0"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Mar 31 15:19:31 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Apr 01 08:59:11 2009 -0700"
      },
      "message": "mm: introduce for_each_populated_zone() macro\n\nImpact: cleanup\n\nIn almost cases, for_each_zone() is used with populated_zone().  It\u0027s\nbecause almost function doesn\u0027t need memoryless node information.\nTherefore, for_each_populated_zone() can help to make code simplify.\n\nThis patch has no functional change.\n\n[akpm@linux-foundation.org: small cleanup]\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "082edb7bf443eb8eda15b482d16ad9dd8137ad24",
      "tree": "167d8c2ca193af9161aded5f368f300981c59535",
      "parents": [
        "0b966252d9e5d95ec2d11e63d7e55b42913aa5b7"
      ],
      "author": {
        "name": "Rusty Russell",
        "email": "rusty@rustcorp.com.au",
        "time": "Fri Mar 13 23:43:37 2009 +1030"
      },
      "committer": {
        "name": "Ingo Molnar",
        "email": "mingo@elte.hu",
        "time": "Fri Mar 13 14:35:31 2009 +0100"
      },
      "message": "numa, cpumask: move numa_node_id default implementation to topology.h\n\nImpact: cleanup, potential bugfix\n\nNot sure what changed to expose this, but clearly that numa_node_id()\ndoesn\u0027t belong in mmzone.h (the inline in gfp.h is probably overkill, too).\n\nIn file included from include/linux/topology.h:34,\n                 from arch/x86/mm/numa.c:2:\n/home/rusty/patches-cpumask/linux-2.6/arch/x86/include/asm/topology.h:64:1: warning: \"numa_node_id\" redefined\nIn file included from include/linux/topology.h:32,\n                 from arch/x86/mm/numa.c:2:\ninclude/linux/mmzone.h:770:1: warning: this is the location of the previous definition\n\nSigned-off-by: Rusty Russell \u003crusty@rustcorp.com.au\u003e\nCc: Mike Travis \u003ctravis@sgi.com\u003e\nLKML-Reference: \u003c200903132343.37661.rusty@rustcorp.com.au\u003e\nSigned-off-by: Ingo Molnar \u003cmingo@elte.hu\u003e\n"
    },
    {
      "commit": "cc2559bccc72767cb446f79b071d96c30c26439b",
      "tree": "aacdeee5368e0eef72ed1d7a7cbd7e6ee4837941",
      "parents": [
        "f2dbcfa738368c8a40d4a5f0b65dc9879577cb21"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Wed Feb 18 14:48:33 2009 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Feb 18 15:37:55 2009 -0800"
      },
      "message": "mm: fix memmap init for handling memory hole\n\nNow, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole.\nand memmap initialization was not done. This was a trouble for\nsparc boot.\n\nTo fix this, the PFN should be initialized and marked as PG_reserved.\nThis patch changes early_pfn_in_nid() return true if PFN is a hole.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReported-by: David Miller \u003cdavem@davemlloft.net\u003e\nTested-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Heiko Carstens \u003cheiko.carstens@de.ibm.com\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "6e9015716ae9b59e9635d692fddfcfb9582c146c",
      "tree": "e1876d3822c46a20e1c35b41580f5ef6b2f6e053",
      "parents": [
        "f89eb90e33fd4e4e0cc1a6d20afd63c5a561885a"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Wed Jan 07 18:08:15 2009 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 08 08:31:07 2009 -0800"
      },
      "message": "mm: introduce zone_reclaim struct\n\nAdd zone_reclam_stat struct for later enhancement.\n\nA later patch uses this.  This patch doesn\u0027t any behavior change (yet).\n\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "52d4b9ac0b985168009c2a57098324e67bae171f",
      "tree": "b3e3b854166930af893be90ea30a7ab0d65c59e7",
      "parents": [
        "c05555b572921c464d064d9267f7f7bc06d424fa"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Sat Oct 18 20:28:16 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 20 08:52:39 2008 -0700"
      },
      "message": "memcg: allocate all page_cgroup at boot\n\nAllocate all page_cgroup at boot and remove page_cgroup poitner from\nstruct page.  This patch adds an interface as\n\n struct page_cgroup *lookup_page_cgroup(struct page*)\n\nAll FLATMEM/DISCONTIGMEM/SPARSEMEM  and MEMORY_HOTPLUG is supported.\n\nRemove page_cgroup pointer reduces the amount of memory by\n - 4 bytes per PAGE_SIZE.\n - 8 bytes per PAGE_SIZE\nif memory controller is disabled. (even if configured.)\n\nOn usual 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.\nOn my x86-64 server with 48GB of memory, this saves 96MB of memory.\nI think this reduction makes sense.\n\nBy pre-allocation, kmalloc/kfree in charge/uncharge are removed.\nThis means\n  - we\u0027re not necessary to be afraid of kmalloc faiulre.\n    (this can happen because of gfp_mask type.)\n  - we can avoid calling kmalloc/kfree.\n  - we can avoid allocating tons of small objects which can be fragmented.\n  - we can know what amount of memory will be used for this extra-lru handling.\n\nI added printk message as\n\n\t\"allocated %ld bytes of page_cgroup\"\n        \"please try cgroup_disable\u003dmemory option if you don\u0027t want\"\n\nmaybe enough informative for users.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Balbir Singh \u003cbalbir@linux.vnet.ibm.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5344b7e648980cc2ca613ec03a56a8222ff48820",
      "tree": "f9f8773ae8e38fb91aec52ca9ad2bd81f039b565",
      "parents": [
        "ba470de43188cdbff795b5da43a1474523c6c2fb"
      ],
      "author": {
        "name": "Nick Piggin",
        "email": "npiggin@suse.de",
        "time": "Sat Oct 18 20:26:51 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 20 08:52:31 2008 -0700"
      },
      "message": "vmstat: mlocked pages statistics\n\nAdd NR_MLOCK zone page state, which provides a (conservative) count of\nmlocked pages (actually, the number of mlocked pages moved off the LRU).\n\nReworked by lts to fit in with the modified mlock page support in the\nReclaim Scalability series.\n\n[kosaki.motohiro@jp.fujitsu.com: fix incorrect Mlocked field of /proc/meminfo]\n[lee.schermerhorn@hp.com: mlocked-pages: add event counting with statistics]\nSigned-off-by: Nick Piggin \u003cnpiggin@suse.de\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "894bc310419ac95f4fa4142dc364401a7e607f65",
      "tree": "15d56a7333b41620016b845d2323dd06e822b621",
      "parents": [
        "8a7a8544a4f6554ec2d8048ac9f9672f442db5a2"
      ],
      "author": {
        "name": "Lee Schermerhorn",
        "email": "Lee.Schermerhorn@hp.com",
        "time": "Sat Oct 18 20:26:39 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 20 08:50:26 2008 -0700"
      },
      "message": "Unevictable LRU Infrastructure\n\nWhen the system contains lots of mlocked or otherwise unevictable pages,\nthe pageout code (kswapd) can spend lots of time scanning over these\npages.  Worse still, the presence of lots of unevictable pages can confuse\nkswapd into thinking that more aggressive pageout modes are required,\nresulting in all kinds of bad behaviour.\n\nInfrastructure to manage pages excluded from reclaim--i.e., hidden from\nvmscan.  Based on a patch by Larry Woodman of Red Hat.  Reworked to\nmaintain \"unevictable\" pages on a separate per-zone LRU list, to \"hide\"\nthem from vmscan.\n\nKosaki Motohiro added the support for the memory controller unevictable\nlru list.\n\nPages on the unevictable list have both PG_unevictable and PG_lru set.\nThus, PG_unevictable is analogous to and mutually exclusive with\nPG_active--it specifies which LRU list the page is on.\n\nThe unevictable infrastructure is enabled by a new mm Kconfig option\n[CONFIG_]UNEVICTABLE_LRU.\n\nA new function \u0027page_evictable(page, vma)\u0027 in vmscan.c tests whether or\nnot a page may be evictable.  Subsequent patches will add the various\n!evictable tests.  We\u0027ll want to keep these tests light-weight for use in\nshrink_active_list() and, possibly, the fault path.\n\nTo avoid races between tasks putting pages [back] onto an LRU list and\ntasks that might be moving the page from non-evictable to evictable state,\nthe new function \u0027putback_lru_page()\u0027 -- inverse to \u0027isolate_lru_page()\u0027\n-- tests the \"evictability\" of a page after placing it on the LRU, before\ndropping the reference.  If the page has become unevictable,\nputback_lru_page() will redo the \u0027putback\u0027, thus moving the page to the\nunevictable list.  
This way, we avoid \"stranding\" evictable pages on the\nunevictable list.\n\n[akpm@linux-foundation.org: fix fallout from out-of-order merge]\n[riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build]\n[nishimura@mxp.nes.nec.co.jp: remove redundant mapping check]\n[kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework]\n[kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c]\n[kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure]\n[kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch]\n[kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch]\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nDebugged-by: Benjamin Kidwell \u003cbenjkidwell@yahoo.com\u003e\nSigned-off-by: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "556adecba110bf5f1db6c6b56416cfab5bcab698",
      "tree": "a721d84d28c4d99a54632b472b452ea3d4b2b137",
      "parents": [
        "4f98a2fee8acdb4ac84545df98cccecfd130f8db"
      ],
      "author": {
        "name": "Rik van Riel",
        "email": "riel@redhat.com",
        "time": "Sat Oct 18 20:26:34 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 20 08:50:25 2008 -0700"
      },
      "message": "vmscan: second chance replacement for anonymous pages\n\nWe avoid evicting and scanning anonymous pages for the most part, but\nunder some workloads we can end up with most of memory filled with\nanonymous pages.  At that point, we suddenly need to clear the referenced\nbits on all of memory, which can take ages on very large memory systems.\n\nWe can reduce the maximum number of pages that need to be scanned by not\ntaking the referenced state into account when deactivating an anonymous\npage.  After all, every anonymous page starts out referenced, so why\ncheck?\n\nIf an anonymous page gets referenced again before it reaches the end of\nthe inactive list, we move it back to the active list.\n\nTo keep the maximum amount of necessary work reasonable, we scale the\nactive to inactive ratio with the size of memory, using the formula\nactive:inactive ratio \u003d sqrt(memory in GB * 10).\n\nKswapd CPU use now seems to scale by the amount of pageout bandwidth,\ninstead of by the amount of memory present in the system.\n\n[kamezawa.hiroyu@jp.fujitsu.com: fix OOM with memcg]\n[kamezawa.hiroyu@jp.fujitsu.com: memcg: lru scan fix]\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4f98a2fee8acdb4ac84545df98cccecfd130f8db",
      "tree": "035a2937f4c3e2f7b4269412041c073ac646937c",
      "parents": [
        "b2e185384f534781fd22f5ce170b2ad26f97df70"
      ],
      "author": {
        "name": "Rik van Riel",
        "email": "riel@redhat.com",
        "time": "Sat Oct 18 20:26:32 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 20 08:50:25 2008 -0700"
      },
      "message": "vmscan: split LRU lists into anon \u0026 file sets\n\nSplit the LRU lists in two, one set for pages that are backed by real file\nsystems (\"file\") and one for pages that are backed by memory and swap\n(\"anon\").  The latter includes tmpfs.\n\nThe advantage of doing this is that the VM will not have to scan over lots\nof anonymous pages (which we generally do not want to swap out), just to\nfind the page cache pages that it should evict.\n\nThis patch has the infrastructure and a basic policy to balance how much\nwe scan the anon lists and how much we scan the file lists.  The big\npolicy changes are in separate patches.\n\n[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]\n[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]\n[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn\u0027t treat unevictable page]\n[hugh@veritas.com: memcg swapbacked pages active]\n[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]\n[akpm@linux-foundation.org: fix /proc/vmstat units]\n[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]\n[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]\n[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Lee Schermerhorn \u003cLee.Schermerhorn@hp.com\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Hugh Dickins \u003chugh@veritas.com\u003e\nSigned-off-by: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b69408e88bd86b98feb7b9a38fd865e1ddb29827",
      "tree": "b19277c29fe624870ba776cc6ada59928cd2796d",
      "parents": [
        "62695a84eb8f2e718bf4dfb21700afaa7a08e0ea"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "cl@linux-foundation.org",
        "time": "Sat Oct 18 20:26:14 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 20 08:50:25 2008 -0700"
      },
      "message": "vmscan: Use an indexed array for LRU variables\n\nCurrently we are defining explicit variables for the inactive and active\nlist.  An indexed array can be more generic and avoid repeating similar\ncode in several places in the reclaim code.\n\nWe are saving a few bytes in terms of code size:\n\nBefore:\n\n   text    data     bss     dec     hex filename\n4097753  573120 4092484 8763357  85b7dd vmlinux\n\nAfter:\n\n   text    data     bss     dec     hex filename\n4097729  573120 4092484 8763333  85b7c5 vmlinux\n\nHaving an easy way to add new lru lists may ease future work on the\nreclaim code.\n\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5bead2a0680687b9576d57c177988e8aa082b922",
      "tree": "25d8db69bd7b353131f9a5260d024d3018eeffa0",
      "parents": [
        "7e96445533ac3f4f7964646a202ff3620602fab4"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Sat Sep 13 02:33:19 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sat Sep 13 14:41:52 2008 -0700"
      },
      "message": "mm: mark the correct zone as full when scanning zonelists\n\nThe iterator for_each_zone_zonelist() uses a struct zoneref *z cursor when\nscanning zonelists to keep track of where in the zonelist it is.  The\nzoneref that is returned corresponds to the next zone that is to be\nscanned, not the current one.  It was intended to be treated as an opaque\nlist.\n\nWhen the page allocator is scanning a zonelist, it marks elements in the\nzonelist corresponding to zones that are temporarily full.  As the\nzonelist is being updated, it uses the cursor here:\n\n  if (NUMA_BUILD)\n        zlc_mark_zone_full(zonelist, z);\n\nThis is intended to prevent rescanning in the near future but the zoneref\ncursor does not correspond to the zone that has been found to be full.\nThis is an easy misunderstanding to make so this patch corrects the\nproblem by changing zoneref cursor to be the current zone being scanned\ninstead of the next one.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.26.x]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "12d15f0d51d47cec39d1d7250e81573c5cbd8b5d",
      "tree": "5bad21a83e8746febbc27f9e403a8fe0a1f3ef69",
      "parents": [
        "fb56f0f9922d3fb2c5503cdc346dc3f86c897bc4"
      ],
      "author": {
        "name": "Fernando Luis Vazquez Cao",
        "email": "fernando@oss.ntt.co.jp",
        "time": "Fri May 23 13:05:01 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sat May 24 09:56:13 2008 -0700"
      },
      "message": "for_each_online_pgdat(): kerneldoc fix\n\nfor_each_pgdat() was renamed to for_each_online_pgdat() and kerneldoc\ncomments should be updated accordingly.\n\nSigned-off-by: Fernando Luis Vazquez Cao \u003cfernando@oss.ntt.co.jp\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "735643ee6cc5249bfac07fcad0946a5e7aff4423",
      "tree": "e725df246f4a3cf88b6b42a28d859ab969acf81c",
      "parents": [
        "71cc2c2152170b8166f59abb0604dc62073aeb92"
      ],
      "author": {
        "name": "Robert P. J. Day",
        "email": "rpjday@crashcourse.ca",
        "time": "Wed Apr 30 00:55:12 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Apr 30 08:29:54 2008 -0700"
      },
      "message": "Remove \"#ifdef __KERNEL__\" checks from unexported headers\n\nRemove the \"#ifdef __KERNEL__\" tests from unexported header files in\nlinux/include whose entire contents are wrapped in that preprocessor\ntest.\n\nSigned-off-by: Robert P. J. Day \u003crpjday@crashcourse.ca\u003e\nCc: David Woodhouse \u003cdwmw2@infradead.org\u003e\nCc: Sam Ravnborg \u003csam@ravnborg.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "fc3ba692a4d19019387c5acaea63131f9eab05dd",
      "tree": "c86e025cb8f79c7ffc479029989b7378bcb9f285",
      "parents": [
        "dd5656e59ca7b25fb60a22f9079905ed0da5ed0c"
      ],
      "author": {
        "name": "Miklos Szeredi",
        "email": "mszeredi@suse.cz",
        "time": "Wed Apr 30 00:54:38 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Apr 30 08:29:50 2008 -0700"
      },
      "message": "mm: Add NR_WRITEBACK_TEMP counter\n\nFuse will use temporary buffers to write back dirty data from memory mappings\n(normal writes are done synchronously).  This is needed, because there cannot\nbe any guarantee about the time in which a write will complete.\n\nBy using temporary buffers, from the MM\u0027s point of view the page is written\nback immediately.  If the writeout was due to memory pressure, this\neffectively migrates data from a full zone to a less full zone.\n\nThis patch adds a new counter (NR_WRITEBACK_TEMP) for the number of pages used\nas temporary buffers.\n\n[Lee.Schermerhorn@hp.com: add vmstat_text for NR_WRITEBACK_TEMP]\nSigned-off-by: Miklos Szeredi \u003cmszeredi@suse.cz\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "04753278769f3b6c3b79a080edb52f21d83bf6e2",
      "tree": "0dff4088b44016b6d04930b2fc09419412821aa2",
      "parents": [
        "7f2e9525ba55b1c42ad6c4a5a59d7eb7bdd9be72"
      ],
      "author": {
        "name": "Yasunori Goto",
        "email": "y-goto@jp.fujitsu.com",
        "time": "Mon Apr 28 02:13:31 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:25 2008 -0700"
      },
      "message": "memory hotplug: register section/node id to free\n\nThis patch set is to free pages which is allocated by bootmem for\nmemory-hotremove.  Some structures of memory management are allocated by\nbootmem.  ex) memmap, etc.\n\nTo remove memory physically, some of them must be freed according to\ncircumstance.  This patch set makes basis to free those pages, and free\nmemmaps.\n\nBasic my idea is using remain members of struct page to remember information\nof users of bootmem (section number or node id).  When the section is\nremoving, kernel can confirm it.  By this information, some issues can be\nsolved.\n\n  1) When the memmap of removing section is allocated on other\n     section by bootmem, it should/can be free.\n  2) When the memmap of removing section is allocated on the\n     same section, it shouldn\u0027t be freed. Because the section has to be\n     logical memory offlined already and all pages must be isolated against\n     page allocater. If it is freed, page allocator may use it which will\n     be removed physically soon.\n  3) When removing section has other section\u0027s memmap,\n     kernel will be able to show easily which section should be removed\n     before it for user. (Not implemented yet)\n  4) When the above case 2), the page isolation will be able to check and skip\n     memmap\u0027s page when logical memory offline (offline_pages()).\n     Current page isolation code fails in this case because this page is\n     just reserved page and it can\u0027t distinguish this pages can be\n     removed or not. But, it will be able to do by this patch.\n     (Not implemented yet.)\n  5) The node information like pgdat has similar issues. But, this\n     will be able to be solved too by this.\n     (Not implemented yet, but, remembering node id in the pages.)\n\nFortunately, current bootmem allocator just keeps PageReserved flags,\nand doesn\u0027t use any other members of page struct. 
The users of\nbootmem do not use them either.\n\nThis patch:\n\nThis is to register information which is node or section\u0027s id.  Kernel can\ndistinguish which node/section uses the pages allocated by bootmem.  This is\nbasis for hot-remove sections or nodes.\n\nSigned-off-by: Yasunori Goto \u003cy-goto@jp.fujitsu.com\u003e\nCc: Badari Pulavarty \u003cpbadari@us.ibm.com\u003e\nCc: Yinghai Lu \u003cyhlu.kernel@gmail.com\u003e\nCc: Yasunori Goto \u003cy-goto@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "97965478a66fbdf0f4ad5e4ecc4828f0cb548a45",
      "tree": "a60bb6c46acdc35d16b2e48f5c13248fc009b35e",
      "parents": [
        "ec7cade8c1a3d1ace69b35cc843b181818578dce"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Apr 28 02:12:54 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:22 2008 -0700"
      },
      "message": "mm: Get rid of __ZONE_COUNT\n\nIt was used to compensate because MAX_NR_ZONES was not available to the\n#ifdefs.  Export MAX_NR_ZONES via the new mechanism and get rid of\n__ZONE_COUNT.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "9223b4190fa1297a59f292f3419fc0285321d0ea",
      "tree": "c6fbbc6b4c35916232e95686194eea1bd9de7377",
      "parents": [
        "e26831814998cee8e6d9f0a9854cb46c516f5547"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Apr 28 02:12:48 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:21 2008 -0700"
      },
      "message": "pageflags: get rid of FLAGS_RESERVED\n\nNR_PAGEFLAGS specifies the number of page flags we are using.  From that we\ncan calculate the number of bits leftover that can be used for zone, node (and\nmaybe the sections id).  There is no need anymore for FLAGS_RESERVED if we use\nNR_PAGEFLAGS.\n\nUse the new methods to make NR_PAGEFLAGS available via the preprocessor.\nNR_PAGEFLAGS is used to calculate field boundaries in the page flags fields.\nThese field widths have to be available to the preprocessor.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: David Miller \u003cdavem@davemloft.net\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Jeremy Fitzhardinge \u003cjeremy@goop.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b45445684198a946b587732265692e6495993abf",
      "tree": "0ffe64205f396bb00f6dfd3cbee630a1d0a975e7",
      "parents": [
        "ac6aadb24b7d4f0e54246732e221c102073412bf"
      ],
      "author": {
        "name": "Andrew Morton",
        "email": "akpm@linux-foundation.org",
        "time": "Mon Apr 28 02:12:39 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:20 2008 -0700"
      },
      "message": "mm: make early_pfn_to_nid() a C function\n\nFix this (sparc64)\n\nmm/sparse-vmemmap.c: In function `vmemmap_verify\u0027:\nmm/sparse-vmemmap.c:64: warning: unused variable `pfn\u0027\n\nby switching to a C function which touches its arg.\n\n(reason 3,555 why macros are bad)\n\nAlso, the `nid\u0027 arg was misnamed.\n\nReviewed-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "19770b32609b6bf97a3dece2529089494cbfc549",
      "tree": "3b5922d1b20aabdf929bde9309f323841717747a",
      "parents": [
        "dd1a239f6f2d4d3eedd318583ec319aa145b324c"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Apr 28 02:12:18 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:19 2008 -0700"
      },
      "message": "mm: filter based on a nodemask as well as a gfp_mask\n\nThe MPOL_BIND policy creates a zonelist that is used for allocations\ncontrolled by that mempolicy.  As the per-node zonelist is already being\nfiltered based on a zone id, this patch adds a version of __alloc_pages() that\ntakes a nodemask for further filtering.  This eliminates the need for\nMPOL_BIND to create a custom zonelist.\n\nA positive benefit of this is that allocations using MPOL_BIND now use the\nlocal node\u0027s distance-ordered zonelist instead of a custom node-id-ordered\nzonelist.  I.e., pages will be allocated from the closest allowed node with\navailable memory.\n\n[Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]\n[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]\n[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "dd1a239f6f2d4d3eedd318583ec319aa145b324c",
      "tree": "aff4224c96b5e2e67588c3946858a724863eeaf9",
      "parents": [
        "54a6eb5c4765aa573a030ceeba2c14e3d2ea5706"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Apr 28 02:12:17 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:18 2008 -0700"
      },
      "message": "mm: have zonelist contains structs with both a zone pointer and zone_idx\n\nFiltering zonelists requires very frequent use of zone_idx().  This is costly\nas it involves a lookup of another structure and a subtraction operation.  As\nthe zone_idx is often required, it should be quickly accessible.  The node idx\ncould also be stored here if it was found that accessing zone-\u003enode is\nsignificant which may be the case on workloads where nodemasks are heavily\nused.\n\nThis patch introduces a struct zoneref to store a zone pointer and a zone\nindex.  The zonelist then consists of an array of these struct zonerefs which\nare looked up as necessary.  Helpers are given for accessing the zone index as\nwell as the node index.\n\n[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]\n[hugh@veritas.com: mm-have-zonelist: fix memcg ooms]\n[hugh@veritas.com: just return do_try_to_free_pages]\n[hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Hugh Dickins \u003chugh@veritas.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "54a6eb5c4765aa573a030ceeba2c14e3d2ea5706",
      "tree": "547176a090beb787722a153cf2b8b942dc0e68db",
      "parents": [
        "18ea7e710d2452fa726814a406779188028cf1bf"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Apr 28 02:12:16 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:18 2008 -0700"
      },
      "message": "mm: use two zonelist that are filtered by GFP mask\n\nCurrently a node has two sets of zonelists, one for each zone type in the\nsystem and a second set for GFP_THISNODE allocations.  Based on the zones\nallowed by a gfp mask, one of these zonelists is selected.  All of these\nzonelists consume memory and occupy cache lines.\n\nThis patch replaces the multiple zonelists per-node with two zonelists.  The\nfirst contains all populated zones in the system, ordered by distance, for\nfallback allocations when the target/preferred node has no free pages.  The\nsecond contains all populated zones in the node suitable for GFP_THISNODE\nallocations.\n\nAn iterator macro is introduced called for_each_zone_zonelist() that iterates\nthrough each zone allowed by the GFP flags in the selected zonelist.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ddc81ed2c5d47a078a3b02c5c3a4345bc2bc3c9b",
      "tree": "2bd3e56604350d05d5163e32dde7fbbe56d31586",
      "parents": [
        "488514d1798289f56f80ed018e246179fe500383"
      ],
      "author": {
        "name": "Harvey Harrison",
        "email": "harvey.harrison@gmail.com",
        "time": "Mon Apr 28 02:12:07 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Apr 28 08:58:17 2008 -0700"
      },
      "message": "remove sparse warning for mmzone.h\n\ninclude/linux/mmzone.h:640:22: warning: potentially expensive pointer subtraction\n\nCalculate the offset into the node_zones array rather than the index\nusing casts to (char *) and comparing against the index * sizeof(struct zone).\n\nOn X86_32 this saves a sar, but code size increases by one byte per\nis_highmem() use due to 32-bit cmps rather than 16 bit cmps.\n\nBefore:\n 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax\n 20d:   c1 f8 0b                sar    $0xb,%eax\n 210:   83 f8 02                cmp    $0x2,%eax\n 213:   74 16                   je     22b \u003ckmap_atomic_prot+0x144\u003e\n 215:   83 f8 03                cmp    $0x3,%eax\n 218:   0f 85 8f 00 00 00       jne    2ad \u003ckmap_atomic_prot+0x1c6\u003e\n 21e:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0\n 225:   0f 85 82 00 00 00       jne    2ad \u003ckmap_atomic_prot+0x1c6\u003e\n 22b:   64 a1 00 00 00 00       mov    %fs:0x0,%eax\n\nAfter:\n 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax\n 20d:   3d 00 10 00 00          cmp    $0x1000,%eax\n 212:   74 18                   je     22c \u003ckmap_atomic_prot+0x145\u003e\n 214:   3d 00 18 00 00          cmp    $0x1800,%eax\n 219:   0f 85 8f 00 00 00       jne    2ae \u003ckmap_atomic_prot+0x1c7\u003e\n 21f:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0\n 226:   0f 85 82 00 00 00       jne    2ae \u003ckmap_atomic_prot+0x1c7\u003e\n 22c:   64 a1 00 00 00 00       mov    %fs:0x0,%eax\n\n[akpm@linux-foundation.org: coding-style fixes]\nSigned-off-by: Harvey Harrison \u003charvey.harrison@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "218ff137bc67252694420563d23d051ab9227f17",
      "tree": "c1005fb94ea7ef2c9ef418b6fa3d0d8f159d45c2",
      "parents": [
        "c0ff1f26acdbb8dc67165b0bbf910f795f0a0ca3"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "hannes@saeurebad.de",
        "time": "Mon Apr 21 22:35:29 2008 +0000"
      },
      "committer": {
        "name": "Jesper Juhl",
        "email": "juhl@hera.kernel.org",
        "time": "Mon Apr 21 22:35:29 2008 +0000"
      },
      "message": "Remove unused MAX_NODES_SHIFT\n\nMAX_NODES_SHIFT is not referenced anywhere in the tree, so dump it.\n\nSigned-off-by: Johannes Weiner \u003channes@saeurebad.de\u003e\nSigned-off-by: Jesper Juhl \u003cjesper.juhl@gmail.com\u003e\n"
    },
    {
      "commit": "3dfa5721f12c3d5a441448086bee156887daa961",
      "tree": "8ace8c3f842f8b626b762bb9d2a9b24d8e3bd130",
      "parents": [
        "5dc331852848a38ca00a2817e5b98a1d0561b116"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Mon Feb 04 22:29:19 2008 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Feb 05 09:44:18 2008 -0800"
      },
      "message": "Page allocator: get rid of the list of cold pages\n\nWe have repeatedly discussed if the cold pages still have a point. There is\none way to join the two lists: Use a single list and put the cold pages at the\nend and the hot pages at the beginning. That way a single list can serve for\nboth types of allocations.\n\nThe discussion of the RFC for this and Mel\u0027s measurements indicate that\nthere may not be too much of a point left to having separate lists for\nhot and cold pages (see http://marc.info/?t\u003d119492914200001\u0026r\u003d1\u0026w\u003d2).\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Martin Bligh \u003cmbligh@mbligh.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d773ed6b856a96bd6d18b6e04455e3ced0876da4",
      "tree": "f0235be6843ec323aeedcdadbee34a777b6c2690",
      "parents": [
        "ae74138da609c576b221c765efa8b81b2365f465"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Tue Oct 16 23:26:01 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed Oct 17 08:42:46 2007 -0700"
      },
      "message": "mm: test and set zone reclaim lock before starting reclaim\n\nIntroduces new zone flag interface for testing and setting flags:\n\n\tint zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)\n\nInstead of setting and clearing ZONE_RECLAIM_LOCKED each time shrink_zone() is\ncalled, this flag is test and set before starting zone reclaim.  Zone reclaim\nstarts in __alloc_pages() when a zone\u0027s watermark fails and the system is in\nzone_reclaim_mode.  If it\u0027s already in reclaim, there\u0027s no need to start again\nso it is simply considered full for that allocation attempt.\n\nThere is a change of behavior with regard to concurrent zone shrinking.  It is\nnow possible for try_to_free_pages() or kswapd to already be shrinking a\nparticular zone when __alloc_pages() starts zone reclaim.  In this case, it is\npossible for two concurrent threads to invoke shrink_zone() for a single zone.\n\nThis change forbids a zone to be in zone reclaim twice, which was always the\nbehavior, but allows for concurrent try_to_free_pages() or kswapd shrinking\nwhen starting zone reclaim.\n\nCc: Andrea Arcangeli \u003candrea@suse.de\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "098d7f128a4e53cb64930628915ac767785e0e60",
      "tree": "ed3cab1daecab7f2a64b27deed190df3ec218789",
      "parents": [
        "e815af95f94914993bbad279c71cf5fef9f4eaac"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Tue Oct 16 23:25:55 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed Oct 17 08:42:45 2007 -0700"
      },
      "message": "oom: add per-zone locking\n\nOOM killer synchronization should be done with zone granularity so that memory\npolicy and cpuset allocations may have their corresponding zones locked and\nallow parallel kills for other OOM conditions that may exist elsewhere in the\nsystem.  DMA allocations can be targeted at the zone level, which would not be\npossible if locking was done in nodes or globally.\n\nSynchronization shall be done with a variation of \"trylocks.\" The goal is to\nput the current task to sleep and restart the failed allocation attempt later\nif the trylock fails.  Otherwise, the OOM killer is invoked.\n\nEach zone in the zonelist that __alloc_pages() was called with is checked for\nthe newly-introduced ZONE_OOM_LOCKED flag.  If any zone has this flag present,\nthe \"trylock\" to serialize the OOM killer fails and returns zero.  Otherwise,\nall the zones have ZONE_OOM_LOCKED set and the try_set_zone_oom() function\nreturns non-zero.\n\nCc: Andrea Arcangeli \u003candrea@suse.de\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e815af95f94914993bbad279c71cf5fef9f4eaac",
      "tree": "492e0d3e8d3303f37cf9fb0beecf952a1c828c53",
      "parents": [
        "70e24bdf6d2fead14631e72a07fba012400c521e"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Tue Oct 16 23:25:54 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed Oct 17 08:42:45 2007 -0700"
      },
      "message": "oom: change all_unreclaimable zone member to flags\n\nConvert the int all_unreclaimable member of struct zone to unsigned long\nflags.  This can now be used to specify several different zone flags such as\nall_unreclaimable and reclaim_in_progress, which can now be removed and\nconverted to a per-zone flag.\n\nFlags are set and cleared as follows:\n\n\tzone_set_flag(struct zone *zone, zone_flags_t flag)\n\tzone_clear_flag(struct zone *zone, zone_flags_t flag)\n\nDefines the first zone flags, ZONE_ALL_UNRECLAIMABLE and ZONE_RECLAIM_LOCKED,\nwhich have the same semantics as the old zone-\u003eall_unreclaimable and\nzone-\u003ereclaim_in_progress, respectively.  Also converts all current users that\nset or clear either flag to use the new interface.\n\nHelper functions are defined to test the flags:\n\n\tint zone_is_all_unreclaimable(const struct zone *zone)\n\tint zone_is_reclaim_locked(const struct zone *zone)\n\nAll flag operators are of the atomic variety because there are currently\nreaders that are implemented that do not take zone-\u003elock.\n\n[akpm@linux-foundation.org: add needed include]\nCc: Andrea Arcangeli \u003candrea@suse.de\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a5d76b54a3f3a40385d7f76069a2feac9f1bad63",
      "tree": "f58c432a4224b3be032bd4a4afa79dfa55d198a6",
      "parents": [
        "75884fb1c6388f3713ddcca662f3647b3129aaeb"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Oct 16 01:26:11 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:02 2007 -0700"
      },
      "message": "memory unplug: page isolation\n\nImplement generic chunk-of-pages isolation method by using page grouping ops.\n\nThis patch add MIGRATE_ISOLATE to MIGRATE_TYPES. By this\n - MIGRATE_TYPES increases.\n - bitmap for migratetype is enlarged.\n\npages of MIGRATE_ISOLATE migratetype will not be allocated even if it is free.\nBy this, you can isolated *freed* pages from users. How-to-free pages is not\na purpose of this patch. You may use reclaim and migrate codes to free pages.\n\nIf start_isolate_page_range(start,end) is called,\n - migratetype of the range turns to be MIGRATE_ISOLATE  if\n   its type is MIGRATE_MOVABLE. (*) this check can be updated if other\n   memory reclaiming works make progress.\n - MIGRATE_ISOLATE is not on migratetype fallback list.\n - All free pages and will-be-freed pages are isolated.\nTo check all pages in the range are isolated or not,  use test_pages_isolated(),\nTo cancel isolation, use undo_isolate_page_range().\n\nChanges V6 -\u003e V7\n - removed unnecessary #ifdef\n\nThere are HOLES_IN_ZONE handling codes...I\u0027m glad if we can remove them..\n\nSigned-off-by: Yasunori Goto \u003cy-goto@jp.fujitsu.com\u003e\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "467c996c1e1910633fa8e7adc9b052aa3ed5f97c",
      "tree": "09e0e70160386be1bdaa12801afddf287e12c8a1",
      "parents": [
        "d9c2340052278d8eb2ffb16b0484f8f794def4de"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:26:02 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo\n\nThis patch provides fragmentation avoidance statistics via /proc/pagetypeinfo.\n The information is collected only on request so there is no runtime overhead.\n The statistics are in three parts:\n\nThe first part prints information on the size of blocks that pages are\nbeing grouped on and looks like\n\nPage block order: 10\nPages per block:  1024\n\nThe second part is a more detailed version of /proc/buddyinfo and looks like\n\nFree pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10\nNode    0, zone      DMA, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0\nNode    0, zone      DMA, type  Reclaimable      1      0      0      0      0      0      0      0      0      0      0\nNode    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      0\nNode    0, zone      DMA, type      Reserve      0      4      4      0      0      0      0      1      0      1      0\nNode    0, zone   Normal, type    Unmovable    111      8      4      4      2      3      1      0      0      0      0\nNode    0, zone   Normal, type  Reclaimable    293     89      8      0      0      0      0      0      0      0      0\nNode    0, zone   Normal, type      Movable      1      6     13      9      7      6      3      0      0      0      0\nNode    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      4\n\nThe third part looks like\n\nNumber of blocks type     Unmovable  Reclaimable      Movable      Reserve\nNode 0, zone      DMA            0            1            2            1\nNode 0, zone   Normal            3           17           94            4\n\nTo walk the zones within a node with interrupts disabled, walk_zones_in_node()\nis introduced and shared 
between /proc/buddyinfo, /proc/zoneinfo and\n/proc/pagetypeinfo to reduce code duplication.  It seems specific to what\nvmstat.c requires but could be broken out as a general utility function in\nmmzone.c if there were other potential users.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d9c2340052278d8eb2ffb16b0484f8f794def4de",
      "tree": "aec7e4e11473a4fcdfd389c718544780a042c6df",
      "parents": [
        "d100313fd615cc30374ff92e0b3facb053838330"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:26:01 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "Do not depend on MAX_ORDER when grouping pages by mobility\n\nCurrently mobility grouping works at the MAX_ORDER_NR_PAGES level.  This makes\nsense for the majority of users where this is also the huge page size.\nHowever, on platforms like ia64 where the huge page size is runtime\nconfigurable it is desirable to group at a lower order.  On x86_64 and\noccasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES.\n\nThis patch groups pages together based on the value of HUGETLB_PAGE_ORDER.  It\nuses a compile-time constant if possible and a variable where the huge page\nsize is runtime configurable.\n\nIt is assumed that grouping should be done at the lowest sensible order and\nthat the user would not want to override this.  If this is not true,\npage_block order could be forced to a variable initialised via a boot-time\nkernel parameter.\n\nOne potential issue with this patch is that IA64 now parses hugepagesz with\nearly_param() instead of __setup().  __setup() is called after the memory\nallocator has been initialised and the pageblock bitmaps already setup.  In\ntests on one IA64 there did not seem to be any problem with using\nearly_param() and in fact may be more correct as it guarantees the parameter\nis handled before the parsing of hugepages\u003d.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "64c5e135bf5a2a7f0ededb3435a31adbe0202f0c",
      "tree": "cb4ff93cbcc3c27176723419a313d7c53545d36b",
      "parents": [
        "ac0e5b7a6b93fb291b01fe1e951e3c16bcdd3503"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:59 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "don\u0027t group high order atomic allocations\n\nGrouping high-order atomic allocations together was intended to allow\nbursty users of atomic allocations to work such as e1000 in situations\nwhere their preallocated buffers were depleted.  This did not work in at\nleast one case with a wireless network adapter needing order-1 allocations\nfrequently.  To resolve that, the free pages used for min_free_kbytes were\nmoved to separate contiguous blocks with the patch\nbias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.\n\nIt is felt that keeping the free pages in the same contiguous blocks should\nbe sufficient for bursty short-lived high-order atomic allocations to\nsucceed, maybe even with the e1000.  Even if there is a failure, increasing\nthe value of min_free_kbytes will free pages as contiguous bloks in\ncontrast to the standard buddy allocator which makes no attempt to keep the\nminimum number of free pages contiguous.\n\nThis patch backs out grouping high order atomic allocations together to\ndetermine if it is really needed or not.  If a new report comes in about\nhigh-order atomic allocations failing, the feature can be reintroduced to\ndetermine if it fixes the problem or not.  As a side-effect, this patch\nreduces by 1 the number of bits required to track the mobility type of\npages within a MAX_ORDER_NR_PAGES block.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ac0e5b7a6b93fb291b01fe1e951e3c16bcdd3503",
      "tree": "732f67c8de6e0d2e001b60c17af9599468b80163",
      "parents": [
        "56fd56b868f19385c50af8941a4c78df433b2d32"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:58 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "remove PAGE_GROUP_BY_MOBILITY\n\nGrouping pages by mobility can be disabled at compile-time. This was\nconsidered undesirable by a number of people. However, in the current stack of\npatches, it is not a simple case of just dropping the configurable patch as it\nwould cause merge conflicts.  This patch backs out the configuration option.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "56fd56b868f19385c50af8941a4c78df433b2d32",
      "tree": "5ea8362e6e141e2d1124d4640811c76489567bc5",
      "parents": [
        "5c0e3066474b57c56ff0d88ca31d95bd14232fee"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:58 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "Bias the location of pages freed for min_free_kbytes in the same MAX_ORDER_NR_PAGES blocks\n\nThe standard buddy allocator always favours the smallest block of pages.\nThe effect of this is that the pages free to satisfy min_free_kbytes tends\nto be preserved since boot time at the same location of memory ffor a very\nlong time and as a contiguous block.  When an administrator sets the\nreserve at 16384 at boot time, it tends to be the same MAX_ORDER blocks\nthat remain free.  This allows the occasional high atomic allocation to\nsucceed up until the point the blocks are split.  In practice, it is\ndifficult to split these blocks but when they do split, the benefit of\nhaving min_free_kbytes for contiguous blocks disappears.  Additionally,\nincreasing min_free_kbytes once the system has been running for some time\nhas no guarantee of creating contiguous blocks.\n\nOn the other hand, CONFIG_PAGE_GROUP_BY_MOBILITY favours splitting large\nblocks when there are no free pages of the appropriate type available.  A\nside-effect of this is that all blocks in memory tends to be used up and\nthe contiguous free blocks from boot time are not preserved like in the\nvanilla allocator.  This can cause a problem if a new caller is unwilling\nto reclaim or does not reclaim for long enough.\n\nA failure scenario was found for a wireless network device allocating\norder-1 atomic allocations but the allocations were not intense or frequent\nenough for a whole block of pages to be preserved for MIGRATE_HIGHALLOC.\nThis was reproduced on a desktop by booting with mem\u003d256mb, forcing the\ndriver to allocate at order-1, running a bittorrent client (downloading a\ndebian ISO) and building a kernel with -j2.\n\nThis patch addresses the problem on the desktop machine booted with\nmem\u003d256mb.  It works by setting aside a reserve of MAX_ORDER_NR_PAGES\nblocks, the number of which depends on the value of min_free_kbytes.  
These\nblocks are only fallen back to when there are no other free pages.  Then the\nsmallest possible page is used just like the normal buddy allocator instead\nof the largest possible page to preserve contiguous pages.  The pages in free\nlists in the reserve blocks are never taken for another migrate type.  The\nresult is that even if min_free_kbytes is set to a low value, contiguous\nblocks will be preserved in the MIGRATE_RESERVE blocks.\n\nThis works better than the vanilla allocator because if min_free_kbytes is\nincreased, a new reserve block will be chosen based on the location of\nreclaimable pages and the block will free up as contiguous pages.  In the\nvanilla allocator, no effort is made to target a block of pages to free as\ncontiguous pages and min_free_kbytes pages are scattered randomly.\n\nThis effect has been observed on the test machine.  min_free_kbytes was set\ninitially low but it was kept as a contiguous free block within\nMIGRATE_RESERVE.  min_free_kbytes was then set to a higher value and over a\nperiod of time, the free blocks were within the reserve and coalescing.\nHow long it takes to free up depends on how quickly LRU is rotating.\nAmusingly, this means that more activity will free the blocks faster.\n\nThis mechanism potentially replaces MIGRATE_HIGHALLOC as it may be more\neffective than grouping contiguous free pages together.  It all depends on\nwhether the number of active atomic high allocations exceeds\nmin_free_kbytes or not.  If the number of active allocations exceeds\nmin_free_kbytes, it\u0027s worth it but maybe in that situation, min_free_kbytes\nshould be set higher.  
Once there are no more reports of allocation\nfailures, a patch will be submitted that backs out MIGRATE_HIGHALLOC and\nsee if the reports stay missing.\n\nCredit to Mariusz Kozlowski for discovering the problem, describing the\nfailure scenario and testing patches and scenarios.\n\n[akpm@linux-foundation.org: cleanups]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5c0e3066474b57c56ff0d88ca31d95bd14232fee",
      "tree": "90c963c62891db4a9039e84e615c01408b09c845",
      "parents": [
        "46dafbca2bba811665b01d8cedf911204820623c"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:56 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "Fix corruption of memmap on IA64 SPARSEMEM when mem_section is not a power of 2\n\nThere are problems in the use of SPARSEMEM and pageblock flags that causes\nproblems on ia64.\n\nThe first part of the problem is that units are incorrect in\nSECTION_BLOCKFLAGS_BITS computation.  This results in a map_section\u0027s\nsection_mem_map being treated as part of a bitmap which isn\u0027t good.  This\nwas evident with an invalid virtual address when mem_init attempted to free\nbootmem pages while relinquishing control from the bootmem allocator.\n\nThe second part of the problem occurs because the pageblock flags bitmap is\nbe located with the mem_section.  The SECTIONS_PER_ROOT computation using\nsizeof (mem_section) may not be a power of 2 depending on the size of the\nbitmap.  This renders masks and other such things not power of 2 base.\nThis issue was seen with SPARSEMEM_EXTREME on ia64.  This patch moves the\nbitmap outside of mem_section and uses a pointer instead in the\nmem_section.  The bitmaps are allocated when the section is being\ninitialised.\n\nNote that sparse_early_usemap_alloc() does not use alloc_remap() like\nsparse_early_mem_map_alloc().  The allocation required for the bitmap on\nx86, the only architecture that uses alloc_remap is typically smaller than\na cache line.  alloc_remap() pads out allocations to the cache size which\nwould be a needless waste.\n\nCredit to Bob Picco for identifying the original problem and effecting a\nfix for the SECTION_BLOCKFLAGS_BITS calculation.  
Credit to Andy Whitcroft\nfor devising the best way of allocating the bitmaps only when required for\nthe section.\n\n[wli@holomorphy.com: warning fix]\nSigned-off-by: Bob Picco \u003cbob.picco@hp.com\u003e\nSigned-off-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: \"Luck, Tony\" \u003ctony.luck@intel.com\u003e\nSigned-off-by: William Irwin \u003cbill.irwin@oracle.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e010487dbe09d63cf916fd1b119d17abd0f48207",
      "tree": "37c7f36913daf4bc0a68a1d0ba1cc30ee0d4e307",
      "parents": [
        "e12ba74d8ff3e2f73a583500d7095e406df4d093"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:53 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "Group high-order atomic allocations\n\nIn rare cases, the kernel needs to allocate a high-order block of pages\nwithout sleeping.  For example, this is the case with e1000 cards configured\nto use jumbo frames.  Migrating or reclaiming pages in this situation is not\nan option.\n\nThis patch groups these allocations together as much as possible by adding a\nnew MIGRATE_TYPE.  The MIGRATE_HIGHATOMIC type are exactly what they sound\nlike.  Care is taken that pages of other migrate types do not use the same\nblocks as high-order atomic allocations.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e12ba74d8ff3e2f73a583500d7095e406df4d093",
      "tree": "a0d3385b65f0b3e1e00b0bbf11b75e7538a93edb",
      "parents": [
        "c361be55b3128474aa66d31092db330b07539103"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:52 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:43:00 2007 -0700"
      },
      "message": "Group short-lived and reclaimable kernel allocations\n\nThis patch marks a number of allocations that are either short-lived such as\nnetwork buffers or are reclaimable such as inode allocations.  When something\nlike updatedb is called, long-lived and unmovable kernel allocations tend to\nbe spread throughout the address space which increases fragmentation.\n\nThis patch groups these allocations together as much as possible by adding a\nnew MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be\nreclaimed on demand, but not moved.  i.e.  they can be migrated by deleting\nthem and re-reading the information from elsewhere.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b92a6edd4b77a8794adb497280beea5df5e59a14",
      "tree": "396ea5cf2b53fc066e949c443f03747ec868de1e",
      "parents": [
        "535131e6925b4a95f321148ad7293f496e0e58d7"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:50 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:59 2007 -0700"
      },
      "message": "Add a configure option to group pages by mobility\n\nThe grouping mechanism has some memory overhead and a more complex allocation\npath.  This patch allows the strategy to be disabled for small memory systems\nor if it is known the workload is suffering because of the strategy.  It also\nacts to show where the page groupings strategy interacts with the standard\nbuddy allocator.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Joel Schopp \u003cjschopp@austin.ibm.com\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b2a0ac8875a0a3b9f0739b60526f8c5977d2200f",
      "tree": "31826716b3209751a5468b840ff14190b4a5a8a2",
      "parents": [
        "835c134ec4dd755e5c4470af566db226d1e96742"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:48 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:59 2007 -0700"
      },
      "message": "Split the free lists for movable and unmovable allocations\n\nThis patch adds the core of the fragmentation reduction strategy.  It works by\ngrouping pages together based on their ability to migrate or be reclaimed.\nBasically, it works by breaking the list in zone-\u003efree_area list into\nMIGRATE_TYPES number of lists.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "835c134ec4dd755e5c4470af566db226d1e96742",
      "tree": "7bc659978b4fba5089fc820185a8b6f0cc010b08",
      "parents": [
        "954ffcb35f5aca428661d29b96c4eee82b3c19cd"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 16 01:25:47 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:59 2007 -0700"
      },
      "message": "Add a bitmap that is used to track flags affecting a block of pages\n\nHere is the latest revision of the anti-fragmentation patches.  Of particular\nnote in this version is special treatment of high-order atomic allocations.\nCare is taken to group them together and avoid grouping pages of other types\nnear them.  Artifical tests imply that it works.  I\u0027m trying to get the\nhardware together that would allow setting up of a \"real\" test.  If anyone\nalready has a setup and test that can trigger the atomic-allocation problem,\nI\u0027d appreciate a test of these patches and a report.  The second major change\nis that these patches will apply cleanly with patches that implement\nanti-fragmentation through zones.\n\nkernbench shows effectively no performance difference varying between -0.2%\nand +2% on a variety of test machines.  Success rates for huge page allocation\nare dramatically increased.  For example, on a ppc64 machine, the vanilla\nkernel was only able to allocate 1% of memory as a hugepage and this was due\nto a single hugepage reserved as min_free_kbytes.  With these patches applied,\n17% was allocatable as superpages.  With reclaim-related fixes from Andy\nWhitcroft, it was 40% and further reclaim-related improvements should increase\nthis further.\n\nChangelog Since V28\no Group high-order atomic allocations together\no It is no longer required to set min_free_kbytes to 10% of memory. A value\n  of 16384 in most cases will be sufficient\no Now applied with zone-based anti-fragmentation\no Fix incorrect VM_BUG_ON within buffered_rmqueue()\no Reorder the stack so later patches do not back out work from earlier patches\no Fix bug were journal pages were being treated as movable\no Bias placement of non-movable pages to lower PFNs\no More agressive clustering of reclaimable pages in reactions to workloads\n  like updatedb that flood the size of inode caches\n\nChangelog Since V27\n\no Renamed anti-fragmentation to Page Clustering. 
Anti-fragmentation was giving\n  the mistaken impression that it was the 100% solution for high order\n  allocations. Instead, it greatly increases the chances high-order\n  allocations will succeed and lays the foundation for defragmentation and\n  memory hot-remove to work properly\no Redefine page groupings based on ability to migrate or reclaim instead of\n  basing on reclaimability alone\no Get rid of spurious inits\no Per-cpu lists are no longer split up per-type. Instead the per-cpu list is\n  searched for a page of the appropriate type\no Added more explanation commentary\no Fix up bug in pageblock code where bitmap was used before being initalised\n\nChangelog Since V26\no Fix double init of lists in setup_pageset\n\nChangelog Since V25\no Fix loop order of for_each_rclmtype_order so that order of loop matches args\no gfpflags_to_rclmtype uses gfp_t instead of unsigned long\no Rename get_pageblock_type() to get_page_rclmtype()\no Fix alignment problem in move_freepages()\no Add mechanism for assigning flags to blocks of pages instead of page-\u003eflags\no On fallback, do not examine the preferred list of free pages a second time\n\nThe purpose of these patches is to reduce external fragmentation by grouping\npages of related types together.  When pages are migrated (or reclaimed under\nmemory pressure), large contiguous pages will be freed.\n\nThis patch works by categorising allocations by their ability to migrate;\n\nMovable - The pages may be moved with the page migration mechanism. These are\n\tgenerally userspace pages.\n\nReclaimable - These are allocations for some kernel caches that are\n\treclaimable or allocations that are known to be very short-lived.\n\nUnmovable - These are pages that are allocated by the kernel that\n\tare not trivially reclaimed. For example, the memory allocated for a\n\tloaded module would be in this category. 
By default, allocations are\n\tconsidered to be of this type\n\nHighAtomic - These are high-order allocations belonging to callers that\n\tcannot sleep or perform any IO. In practice, this is restricted to\n\tjumbo frame allocation for network receive. It is assumed that the\n\tallocations are short-lived\n\nInstead of having one MAX_ORDER-sized array of free lists in struct free_area,\nthere is one for each type of reclaimability.  Once a 2^MAX_ORDER block of\npages is split for a type of allocation, it is added to the free-lists for\nthat type, in effect reserving it.  Hence, over time, pages of the different\ntypes can be clustered together.\n\nWhen the preferred freelists are expired, the largest possible block is taken\nfrom an alternative list.  Buddies that are split from that large block are\nplaced on the preferred allocation-type freelists to mitigate fragmentation.\n\nThis implementation gives best-effort for low fragmentation in all zones.\nIdeally, min_free_kbytes needs to be set to a value equal to 4 * (1 \u003c\u003c\n(MAX_ORDER-1)) pages in most cases.  This would be 16384 on x86 and x86_64 for\nexample.\n\nOur tests show that about 60-70% of physical memory can be allocated on a\ndesktop after a few days uptime.  In benchmarks and stress tests, we are\nfinding that 80% of memory is available as contiguous blocks at the end of the\ntest.  To compare, a standard kernel was getting \u003c 1% of memory as large pages\non a desktop and about 8-12% of memory as large pages at the end of stress\ntests.\n\nFollowing this email are 12 patches that implement the page grouping feature.\n The first patch introduces a mechanism for storing flags related to a whole\nblock of pages.  Then allocations are split between movable and all other\nallocations.  Following that are patches to deal with per-cpu pages and make\nthe mechanism configurable.  
The next patch moves free pages between lists\nwhen partially allocated blocks are used for pages of another migrate type.\nThe second last patch groups reclaimable kernel allocations such as inode\ncaches together.  The final patch related to groupings keeps high-order atomic\nallocations.\n\nThe last two patches are more concerned with control of fragmentation.  The\nsecond last patch biases placement of non-movable allocations towards the\nstart of memory.  This is with a view of supporting memory hot-remove of DIMMs\nwith higher PFNs in the future.  The biasing could be enforced a lot heavier\nbut it would cost.  The last patch aggressively clusters reclaimable pages like\ninode caches together.\n\nThe fragmentation reduction strategy needs to track if pages within a block\ncan be moved or reclaimed so that pages are freed to the appropriate list.\nThis patch adds a bitmap for flags affecting a whole MAX_ORDER block of\npages.\n\nIn non-SPARSEMEM configurations, the bitmap is stored in the struct zone and\nallocated during initialisation.  SPARSEMEM statically allocates the bitmap in\na struct mem_section so that bitmaps do not have to be resized during memory\nhotadd.  This wastes a small amount of memory per unused section (usually\nsizeof(unsigned long)) but the complexity of dynamically allocating the memory\nis quite high.\n\nAdditional credit to Andy Whitcroft who reviewed an earlier implementation\nof the mechanism and suggested how to make it a *lot* cleaner.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "523b945855a1427000ffc707c610abe5947ae607",
      "tree": "2d84b5b6822a2a20bfd79146c08ce06ac8c80b9b",
      "parents": [
        "633c0666b5a5c41c376a5a7e4304d638dc48c1b9"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Tue Oct 16 01:25:37 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:59 2007 -0700"
      },
      "message": "Memoryless nodes: Fix GFP_THISNODE behavior\n\nGFP_THISNODE checks that the zone selected is within the pgdat (node) of the\nfirst zone of a nodelist.  That only works if the node has memory.  A\nmemoryless node will have its first node on another pgdat (node).\n\nGFP_THISNODE currently will return simply memory on the first pgdat.  Thus it\nis returning memory on other nodes.  GFP_THISNODE should fail if there is no\nlocal memory on a node.\n\nAdd a new set of zonelists for each node that only contain the nodes that\nbelong to the zones itself so that no fallback is possible.\n\nThen modify gfp_type to pickup the right zone based on the presence of\n__GFP_THISNODE.\n\nDrop the existing GFP_THISNODE checks from the page_allocators hot path.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nAcked-by: Nishanth Aravamudan \u003cnacc@us.ibm.com\u003e\nTested-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Bob Picco \u003cbob.picco@hp.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@skynet.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "540557b9439ec19668553830c90222f9fb0c2e95",
      "tree": "07dfa0e88580d4101dbb11ebc59348233e18b2f0",
      "parents": [
        "cd881a6b22902b356cacf8fd2e4e895871068eec"
      ],
      "author": {
        "name": "Andy Whitcroft",
        "email": "apw@shadowen.org",
        "time": "Tue Oct 16 01:24:11 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Oct 16 09:42:51 2007 -0700"
      },
      "message": "sparsemem: record when a section has a valid mem_map\n\nWe have flags to indicate whether a section actually has a valid mem_map\nassociated with it.  This is never set and we rely solely on the present bit\nto indicate a section is valid.  By definition a section is not valid if it\nhas no mem_map and there is a window during init where the present bit is set\nbut there is no mem_map, during which pfn_valid() will return true\nincorrectly.\n\nUse the existing SECTION_HAS_MEM_MAP flag to indicate the presence of a valid\nmem_map.  Switch valid_section{,_nr} and pfn_valid() to this bit.  Add a new\npresent_section{,_nr} and pfn_present() interfaces for those users who care to\nknow that a section is going to be valid.\n\n[akpm@linux-foundation.org: coding-syle fixes]\nSigned-off-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: \"Luck, Tony\" \u003ctony.luck@intel.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: \"David S. Miller\" \u003cdavem@davemloft.net\u003e\nCc: Paul Mackerras \u003cpaulus@samba.org\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b377fd3982ad957c796758a90e2988401a884241",
      "tree": "3d7449ccdf7038bffffa9323873f4095cc1ac6ce",
      "parents": [
        "8e92f21ba3ea3f54e4be062b87ef9fc4af2d33e2"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Wed Aug 22 14:02:05 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed Aug 22 19:52:47 2007 -0700"
      },
      "message": "Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE\n\nThe NUMA layer only supports NUMA policies for the highest zone.  When\nZONE_MOVABLE is configured with kernelcore\u003d, the the highest zone becomes\nZONE_MOVABLE.  The result is that policies are only applied to allocations\nlike anonymous pages and page cache allocated from ZONE_MOVABLE when the\nzone is used.\n\nThis patch applies policies to the two highest zones when the highest zone\nis ZONE_MOVABLE.  As ZONE_MOVABLE consists of pages from the highest \"real\"\nzone, it\u0027s always functionally equivalent.\n\nThe patch has been tested on a variety of machines both NUMA and non-NUMA\ncovering x86, x86_64 and ppc64.  No abnormal results were seen in\nkernbench, tbench, dbench or hackbench.  It passes regression tests from\nthe numactl package with and without kernelcore\u003d once numactl tests are\npatched to wait for vmstat counters to update.\n\nakpm: this is the nasty hack to fix NUMA mempolicies in the presence of\nZONE_MOVABLE and kernelcore\u003d in 2.6.23.  Christoph says \"For .24 either merge\nthe mobility or get the other solution that Mel is working on.  That solution\nwould only use a single zonelist per node and filter on the fly.  That may\nhelp performance and also help to make memory policies work better.\"\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by:  Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nTested-by:  Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "99eb8a550dbccc0e1f6c7e866fe421810e0585f6",
      "tree": "130c6e3338a0655ba74355eba83afab9261e1ed0",
      "parents": [
        "0d0ed42e5ca2e22465c591341839c18025748fe8"
      ],
      "author": {
        "name": "Adrian Bunk",
        "email": "bunk@stusta.de",
        "time": "Tue Jul 31 00:38:19 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Jul 31 15:39:39 2007 -0700"
      },
      "message": "Remove the arm26 port\n\nThe arm26 port has been in a state where it was far from even compiling\nfor quite some time.\n\nIan Molton agreed with the removal.\n\nSigned-off-by: Adrian Bunk \u003cbunk@stusta.de\u003e\nCc: Ian Molton \u003cspyro@f2s.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5ad333eb66ff1e52a87639822ae088577669dcf9",
      "tree": "addae6bbd19585f19328f309924d06d647e8f2b7",
      "parents": [
        "7e63efef857575320fb413fbc3d0ee704b72845f"
      ],
      "author": {
        "name": "Andy Whitcroft",
        "email": "apw@shadowen.org",
        "time": "Tue Jul 17 04:03:16 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Jul 17 10:22:59 2007 -0700"
      },
      "message": "Lumpy Reclaim V4\n\nWhen we are out of memory of a suitable size we enter reclaim.  The current\nreclaim algorithm targets pages in LRU order, which is great for fairness at\norder-0 but highly unsuitable if you desire pages at higher orders.  To get\npages of higher order we must shoot down a very high proportion of memory;\n\u003e95% in a lot of cases.\n\nThis patch set adds a lumpy reclaim algorithm to the allocator.  It targets\ngroups of pages at the specified order anchored at the end of the active and\ninactive lists.  This encourages groups of pages at the requested orders to\nmove from active to inactive, and active to free lists.  This behaviour is\nonly triggered out of direct reclaim when higher order pages have been\nrequested.\n\nThis patch set is particularly effective when utilised with an\nanti-fragmentation scheme which groups pages of similar reclaimability\ntogether.\n\nThis patch set is based on Peter Zijlstra\u0027s lumpy reclaim V2 patch which forms\nthe foundation.  Credit to Mel Gorman for sanitity checking.\n\nMel said:\n\n  The patches have an application with hugepage pool resizing.\n\n  When lumpy-reclaim is used used with ZONE_MOVABLE, the hugepages pool can\n  be resized with greater reliability.  Testing on a desktop machine with 2GB\n  of RAM showed that growing the hugepage pool with ZONE_MOVABLE on it\u0027s own\n  was very slow as the success rate was quite low.  
Without lumpy-reclaim,\n  each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages.\n  With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.\n\n[akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]\n[bunk@stusta.de: static declarations for internal functions]\n[a.p.zijlstra@chello.nl: initial lumpy V2 implementation]\nSigned-off-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nAcked-by: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Bob Picco \u003cbob.picco@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "2a1e274acf0b1c192face19a4be7c12d4503eaaf",
      "tree": "f7e98e1fe19d38bb10bf178fb8f8ed1789b659b2",
      "parents": [
        "769848c03895b63e5662eb7e4ec8c4866f7d0183"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Jul 17 04:03:12 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Tue Jul 17 10:22:59 2007 -0700"
      },
      "message": "Create the ZONE_MOVABLE zone\n\nThe following 8 patches against 2.6.20-mm2 create a zone called ZONE_MOVABLE\nthat is only usable by allocations that specify both __GFP_HIGHMEM and\n__GFP_MOVABLE.  This has the effect of keeping all non-movable pages within a\nsingle memory partition while allowing movable allocations to be satisfied\nfrom either partition.  The patches may be applied with the list-based\nanti-fragmentation patches that groups pages together based on mobility.\n\nThe size of the zone is determined by a kernelcore\u003d parameter specified at\nboot-time.  This specifies how much memory is usable by non-movable\nallocations and the remainder is used for ZONE_MOVABLE.  Any range of pages\nwithin ZONE_MOVABLE can be released by migrating the pages or by reclaiming.\n\nWhen selecting a zone to take pages from for ZONE_MOVABLE, there are two\nthings to consider.  First, only memory from the highest populated zone is\nused for ZONE_MOVABLE.  On the x86, this is probably going to be ZONE_HIGHMEM\nbut it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64.  Second,\nthe amount of memory usable by the kernel will be spread evenly throughout\nNUMA nodes where possible.  If the nodes are not of equal size, the amount of\nmemory usable by the kernel on some nodes may be greater than others.\n\nBy default, the zone is not as useful for hugetlb allocations because they are\npinned and non-migratable (currently at least).  A sysctl is provided that\nallows huge pages to be allocated from that zone.  This means that the huge\npage pool can be resized to the size of ZONE_MOVABLE during the lifetime of\nthe system assuming that pages are not mlocked.  
Despite huge pages being\nnon-movable, we do not introduce additional external fragmentation of note as\nhuge pages are always the largest contiguous block we care about.\n\nCredit goes to Andy Whitcroft for catching a large variety of problems during\nreview of the patches.\n\nThis patch creates an additional zone, ZONE_MOVABLE.  This zone is only usable\nby allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE.  Hot-added\nmemory continues to be placed in their existing destination as there is no\nmechanism to redirect them to a specific zone.\n\n[y-goto@jp.fujitsu.com: Fix section mismatch of memory hotplug related code]\n[akpm@linux-foundation.org: various fixes]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Yasunori Goto \u003cy-goto@jp.fujitsu.com\u003e\nCc: William Lee Irwin III \u003cwli@holomorphy.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f0c0b2b808f232741eadac272bd4bc51f18df0f4",
      "tree": "c2568efdc496cc165a4e72d8aa2542b22035e342",
      "parents": [
        "18a8bd949d6adb311ea816125ff65050df1f3f6e"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Sun Jul 15 23:38:01 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Mon Jul 16 09:05:35 2007 -0700"
      },
      "message": "change zonelist order: zonelist order selection logic\n\nMake zonelist creation policy selectable from sysctl/boot option v6.\n\nThis patch makes NUMA\u0027s zonelist (of pgdat) order selectable.\nAvailable order are Default(automatic)/ Node-based / Zone-based.\n\n[Default Order]\nThe kernel selects Node-based or Zone-based order automatically.\n\n[Node-based Order]\nThis policy treats the locality of memory as the most important parameter.\nZonelist order is created by each zone\u0027s locality. This means lower zones\n(ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion.\nIOW. ZONE_DMA will be in the middle of zonelist.\ncurrent 2.6.21 kernel uses this.\n\nPros.\n * A user can expect local memory as much as possible.\nCons.\n * lower zone will be exhansted before higher zone. This may cause OOM_KILL.\n\nMaybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL\nbecause of ZONE_DMA exhaution and you need the best locality.\n\n(example)\nassume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.\n\n*node(0)\u0027s memory allocation order:\n\n node(0)\u0027s NORMAL -\u003e node(0)\u0027s DMA -\u003e node(1)\u0027s NORMAL.\n\n*node(1)\u0027s memory allocation order:\n\n node(1)\u0027s NORMAL -\u003e node(0)\u0027s NORMAL -\u003e node(0)\u0027s DMA.\n\n[Zone-based order]\nThis policy treats the zone type as the most important parameter.\nZonelist order is created by zone-type order. This means lower zone\nnever be used bofere higher zone exhaustion.\nIOW. ZONE_DMA will be always at the tail of zonelist.\n\nPros.\n * OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted.\nCons.\n * memory locality may not be best.\n\n(example)\nassume 2 node NUMA. 
node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.\n\n*node(0)\u0027s memory allocation order:\n\n node(0)\u0027s NORMAL -\u003e node(1)\u0027s NORMAL -\u003e node(0)\u0027s DMA.\n\n*node(1)\u0027s memory allocation order:\n\n node(1)\u0027s NORMAL -\u003e node(0)\u0027s NORMAL -\u003e node(0)\u0027s DMA.\n\nbootoption \"numa_zonelist_order\u003d\" and proc/sysctl is supporetd.\n\ncommand:\n%echo N \u003e /proc/sys/vm/numa_zonelist_order\n\nWill rebuild zonelist in Node-based order.\n\ncommand:\n%echo Z \u003e /proc/sys/vm/numa_zonelist_order\n\nWill rebuild zonelist in Zone-based order.\n\nThanks to Lee Schermerhorn, he gives me much help and codes.\n\n[Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]\n[akpm@linux-foundation.org: build fix]\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: \"jesse.barnes@intel.com\" \u003cjesse.barnes@intel.com\u003e\nSigned-off-by: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4037d452202e34214e8a939fa5621b2b3bbb45b7",
      "tree": "31b59c0ca94fba4d53b6738b0bad3d1e9fde3063",
      "parents": [
        "77461ab33229d48614402decfb1b2eaa6d446861"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Wed May 09 02:35:14 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Wed May 09 12:30:56 2007 -0700"
      },
      "message": "Move remote node draining out of slab allocators\n\nCurrently the slab allocators contain callbacks into the page allocator to\nperform the draining of pagesets on remote nodes.  This requires SLUB to have\na whole subsystem in order to be compatible with SLAB.  Moving node draining\nout of the slab allocators avoids a section of code in SLUB.\n\nMove the node draining so that is is done when the vm statistics are updated.\nAt that point we are already touching all the cachelines with the pagesets of\na processor.\n\nAdd a expire counter there.  If we have to update per zone or global vm\nstatistics then assume that the pageset will require subsequent draining.\n\nThe expire counter will be decremented on each vm stats update pass until it\nreaches zero.  Then we will drain one batch from the pageset.  The draining\nwill cause vm counter updates which will then cause another expiration until\nthe pcp is empty.  So we will drain a batch every 3 seconds.\n\nNote that remote node draining is a somewhat esoteric feature that is required\non large NUMA systems because otherwise significant portions of system memory\ncan become trapped in pcp queues.  The number of pcp is determined by the\nnumber of processors and nodes in a system.  A system with 4 processors and 2\nnodes has 8 pcps which is okay.  But a system with 1024 processors and 512\nnodes has 512k pcps with a high potential for large amount of memory being\ncaught in them.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "14e072984179d3d421bf9ab75cc67e0961742841",
      "tree": "65a5a6f7d9756b8e7010278b58908d04da257a28",
      "parents": [
        "ac267728f13c55017ed5ee243c9c3166e27ab929"
      ],
      "author": {
        "name": "Andy Whitcroft",
        "email": "apw@shadowen.org",
        "time": "Sun May 06 14:49:14 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Mon May 07 12:12:52 2007 -0700"
      },
      "message": "add pfn_valid_within helper for sub-MAX_ORDER hole detection\n\nGenerally we work under the assumption that memory the mem_map array is\ncontigious and valid out to MAX_ORDER_NR_PAGES block of pages, ie.  that if we\nhave validated any page within this MAX_ORDER_NR_PAGES block we need not check\nany other.  This is not true when CONFIG_HOLES_IN_ZONE is set and we must\ncheck each and every reference we make from a pfn.\n\nAdd a pfn_valid_within() helper which should be used when scanning pages\nwithin a MAX_ORDER_NR_PAGES block when we have already checked the validility\nof the block normally with pfn_valid().  This can then be optimised away when\nwe do not have holes within a MAX_ORDER_NR_PAGES block of pages.\n\nSigned-off-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Bob Picco \u003cbob.picco@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4b51d66989218aad731a721b5b28c79bf5388c09",
      "tree": "8ff7acbd219f699c20c2f1fd201ffb3db5a64062",
      "parents": [
        "66701b1499a3ff11882c8c4aef36e8eac86e17b1"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Sat Feb 10 01:43:10 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Sun Feb 11 10:51:18 2007 -0800"
      },
      "message": "[PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM\n\nMake ZONE_DMA optional in core code.\n\n- ifdef all code for ZONE_DMA and related definitions following the example\n  for ZONE_DMA32 and ZONE_HIGHMEM.\n\n- Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of\n  0.\n\n- Modify the VM statistics to work correctly without a DMA zone.\n\n- Modify slab to not create DMA slabs if there is no ZONE_DMA.\n\n[akpm@osdl.org: cleanup]\n[jdike@addtoit.com: build fix]\n[apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT]\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nCc: Andi Kleen \u003cak@suse.de\u003e\nCc: \"Luck, Tony\" \u003ctony.luck@intel.com\u003e\nCc: Kyle McMartin \u003ckyle@mcmartin.ca\u003e\nCc: Matthew Wilcox \u003cwilly@debian.org\u003e\nCc: James Bottomley \u003cJames.Bottomley@steeleye.com\u003e\nCc: Paul Mundt \u003clethal@linux-sh.org\u003e\nSigned-off-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Jeff Dike \u003cjdike@addtoit.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "05a0416be2b88d859efcbc4a4290555a04d169a1",
      "tree": "da7216a3a04625a45b952ea21f817d5cdb199530",
      "parents": [
        "9195481d2f869a2707a272057f3f8664fd277534"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Sat Feb 10 01:43:05 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Sun Feb 11 10:51:18 2007 -0800"
      },
      "message": "[PATCH] Drop __get_zone_counts()\n\nValues are readily available via ZVC per node and global sums.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "51ed4491271be8c56bdb2a03481ed34ea4984bc2",
      "tree": "580e03859b7c78a05a6ed479957cd3a1d846c5da",
      "parents": [
        "d23ad42324cc4378132e51f2fc5c9ba6cbe75182"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Sat Feb 10 01:43:02 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Sun Feb 11 10:51:17 2007 -0800"
      },
      "message": "[PATCH] Reorder ZVCs according to cacheline\n\nThe global and per zone counter sums are in arrays of longs.  Reorder the ZVCs\nso that the most frequently used ZVCs are put into the same cacheline.  That\nway calculations of the global, node and per zone vm state touches only a\nsingle cacheline.  This is mostly important for 64 bit systems were one 128\nbyte cacheline takes only 8 longs.\n\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d23ad42324cc4378132e51f2fc5c9ba6cbe75182",
      "tree": "6844416befb3988e432e8f422f3a369e2f760d39",
      "parents": [
        "c878538598d1e7ab41ecc0de8894e34e2fdef630"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Sat Feb 10 01:43:02 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Sun Feb 11 10:51:17 2007 -0800"
      },
      "message": "[PATCH] Use ZVC for free_pages\n\nThis is again simplifies some of the VM counter calculations through the use\nof the ZVC consolidated counters.\n\n[michal.k.k.piotrowski@gmail.com: build fix]\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Michal Piotrowski \u003cmichal.k.k.piotrowski@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "c878538598d1e7ab41ecc0de8894e34e2fdef630",
      "tree": "d22e73fddef75521e287c3e7754a1d3224c348d9",
      "parents": [
        "c3704ceb4ad055b489b143f4e37c57d128908012"
      ],
      "author": {
        "name": "Christoph Lameter",
        "email": "clameter@sgi.com",
        "time": "Sat Feb 10 01:43:01 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Sun Feb 11 10:51:17 2007 -0800"
      },
      "message": "[PATCH] Use ZVC for inactive and active counts\n\nThe determination of the dirty ratio to determine writeback behavior is\ncurrently based on the number of total pages on the system.\n\nHowever, not all pages in the system may be dirtied.  Thus the ratio is always\ntoo low and can never reach 100%.  The ratio may be particularly skewed if\nlarge hugepage allocations, slab allocations or device driver buffers make\nlarge sections of memory not available anymore.  In that case we may get into\na situation in which f.e.  the background writeback ratio of 40% cannot be\nreached anymore which leads to undesired writeback behavior.\n\nThis patchset fixes that issue by determining the ratio based on the actual\npages that may potentially be dirty.  These are the pages on the active and\nthe inactive list plus free pages.\n\nThe problem with those counts has so far been that it is expensive to\ncalculate these because counts from multiple nodes and multiple zones will\nhave to be summed up.  This patchset makes these counters ZVC counters.  This\nmeans that a current sum per zone, per node and for the whole system is always\navailable via global variables and not expensive anymore to calculate.\n\nThe patchset results in some other good side effects:\n\n- Removal of the various functions that sum up free, active and inactive\n  page counts\n\n- Cleanup of the functions that display information via the proc filesystem.\n\nThis patch:\n\nThe use of a ZVC for nr_inactive and nr_active allows a simplification of some\ncounter operations.  More ZVC functionality is used for sums etc in the\nfollowing patches.\n\n[akpm@osdl.org: UP build fix]\nSigned-off-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a2f3aa02576632cdb60bd3de1f4bf55e9ac65604",
      "tree": "2b9b73675de73866fbd219fab5bf2d804e6817b1",
      "parents": [
        "47a4d5be7c50b2e9b905abbe2b97dc87051c5a44"
      ],
      "author": {
        "name": "Dave Hansen",
        "email": "haveblue@us.ibm.com",
        "time": "Wed Jan 10 23:15:30 2007 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Jan 11 18:18:20 2007 -0800"
      },
      "message": "[PATCH] Fix sparsemem on Cell\n\nFix an oops experienced on the Cell architecture when init-time functions,\nearly_*(), are called at runtime.  It alters the call paths to make sure\nthat the callers explicitly say whether the call is being made on behalf of\na hotplug even, or happening at boot-time.\n\nIt has been compile tested on ppc64, ia64, s390, i386 and x86_64.\n\nAcked-by: Arnd Bergmann \u003carndb@de.ibm.com\u003e\nSigned-off-by: Dave Hansen \u003chaveblue@us.ibm.com\u003e\nCc: Yasunori Goto \u003cy-goto@jp.fujitsu.com\u003e\nAcked-by: Andy Whitcroft \u003capw@shadowen.org\u003e\nCc: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nCc: Martin Schwidefsky \u003cschwidefsky@de.ibm.com\u003e\nAcked-by: Heiko Carstens \u003cheiko.carstens@de.ibm.com\u003e\nCc: Benjamin Herrenschmidt \u003cbenh@kernel.crashing.org\u003e\nCc: Paul Mackerras \u003cpaulus@samba.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "15ad7cdcfd76450d4beebc789ec646664238184d",
      "tree": "279d05a76ae0906c23ee2de8c5684d95d9886ad3",
      "parents": [
        "4a08a9f68168e547c2baf100020e9b96cae5fbd1"
      ],
      "author": {
        "name": "Helge Deller",
        "email": "deller@gmx.de",
        "time": "Wed Dec 06 20:40:36 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Dec 07 08:39:46 2006 -0800"
      },
      "message": "[PATCH] struct seq_operations and struct file_operations constification\n\n - move some file_operations structs into the .rodata section\n\n - move static strings from policy_types[] array into the .rodata section\n\n - fix generic seq_operations usages, so that those structs may be defined\n   as \"const\" as well\n\n[akpm@osdl.org: couple of fixes]\nSigned-off-by: Helge Deller \u003cdeller@gmx.de\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "7253f4ef04b1cd138baf2b29a95473743ac0a307",
      "tree": "5883e6773a3cdad31992539ba3ad989d2566a041",
      "parents": [
        "9276b1bc96a132f4068fdee00983c532f43d3a26"
      ],
      "author": {
        "name": "Paul Jackson",
        "email": "pj@sgi.com",
        "time": "Wed Dec 06 20:31:49 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Dec 07 08:39:20 2006 -0800"
      },
      "message": "[PATCH] memory page_alloc zonelist caching reorder structure\n\nRearrange the struct members in the \u0027struct zonelist_cache\u0027 structure, so\nas to put the readonly (once initialized) z_to_n[] array first, where it\nwill come right after the zones[] array in struct zonelist.\n\nThis pretty much eliminates the chance that the two frequently written\nelements of \u0027struct zonelist_cache\u0027, the fullzones bitmap and last_full_zap\ntimes, will end up on the same cache line as the performance sensitive,\nfrequently read, never (after init) written zones[] array.\n\nKeeping frequently written data off frequently read cache lines is good for\nperformance.\n\nThanks to Rohit Seth for the suggestion.\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Rohit Seth \u003crohitseth@google.com\u003e\nCc: Paul Menage \u003cmenage@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "9276b1bc96a132f4068fdee00983c532f43d3a26",
      "tree": "04d64444cf6558632cfc7514b5437578b5e616af",
      "parents": [
        "89689ae7f95995723fbcd5c116c47933a3bb8b13"
      ],
      "author": {
        "name": "Paul Jackson",
        "email": "pj@sgi.com",
        "time": "Wed Dec 06 20:31:48 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.osdl.org",
        "time": "Thu Dec 07 08:39:20 2006 -0800"
      },
      "message": "[PATCH] memory page_alloc zonelist caching speedup\n\nOptimize the critical zonelist scanning for free pages in the kernel memory\nallocator by caching the zones that were found to be full recently, and\nskipping them.\n\nRemembers the zones in a zonelist that were short of free memory in the\nlast second.  And it stashes a zone-to-node table in the zonelist struct,\nto optimize that conversion (minimize its cache footprint.)\n\nRecent changes:\n\n    This differs in a significant way from a similar patch that I\n    posted a week ago.  Now, instead of having a nodemask_t of\n    recently full nodes, I have a bitmask of recently full zones.\n    This solves a problem that last weeks patch had, which on\n    systems with multiple zones per node (such as DMA zone) would\n    take seeing any of these zones full as meaning that all zones\n    on that node were full.\n\n    Also I changed names - from \"zonelist faster\" to \"zonelist cache\",\n    as that seemed to better convey what we\u0027re doing here - caching\n    some of the key zonelist state (for faster access.)\n\n    See below for some performance benchmark results.  After all that\n    discussion with David on why I didn\u0027t need them, I went and got\n    some ;).  I wanted to verify that I had not hurt the normal case\n    of memory allocation noticeably.  At least for my one little\n    microbenchmark, I found (1) the normal case wasn\u0027t affected, and\n    (2) workloads that forced scanning across multiple nodes for\n    memory improved up to 10% fewer System CPU cycles and lower\n    elapsed clock time (\u0027sys\u0027 and \u0027real\u0027).  Good.  See details, below.\n\n    I didn\u0027t have the logic in get_page_from_freelist() for various\n    full nodes and zone reclaim failures correct.  
That should be\n    fixed up now - notice the new goto labels zonelist_scan,\n    this_zone_full, and try_next_zone, in get_page_from_freelist().\n\nThere are two reasons I persued this alternative, over some earlier\nproposals that would have focused on optimizing the fake numa\nemulation case by caching the last useful zone:\n\n 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)\n    have seen real customer loads where the cost to scan the zonelist\n    was a problem, due to many nodes being full of memory before\n    we got to a node we could use.  Or at least, I think we have.\n    This was related to me by another engineer, based on experiences\n    from some time past.  So this is not guaranteed.  Most likely, though.\n\n    The following approach should help such real numa systems just as\n    much as it helps fake numa systems, or any combination thereof.\n\n 2) The effort to distinguish fake from real numa, using node_distance,\n    so that we could cache a fake numa node and optimize choosing\n    it over equivalent distance fake nodes, while continuing to\n    properly scan all real nodes in distance order, was going to\n    require a nasty blob of zonelist and node distance munging.\n\n    The following approach has no new dependency on node distances or\n    zone sorting.\n\nSee comment in the patch below for a description of what it actually does.\n\nTechnical details of note (or controversy):\n\n - See the use of \"zlc_active\" and \"did_zlc_setup\" below, to delay\n   adding any work for this new mechanism until we\u0027ve looked at the\n   first zone in zonelist.  I figured the odds of the first zone\n   having the memory we needed were high enough that we should just\n   look there, first, then get fancy only if we need to keep looking.\n\n - Some odd hackery was needed to add items to struct zonelist, while\n   not tripping up the custom zonelists built by the mm/mempolicy.c\n   code for MPOL_BIND.  
My usual wordy comments below explain this.\n   Search for \"MPOL_BIND\".\n\n - Some per-node data in the struct zonelist is now modified frequently,\n   with no locking.  Multiple CPU cores on a node could hit and mangle\n   this data.  The theory is that this is just performance hint data,\n   and the memory allocator will work just fine despite any such mangling.\n   The fields at risk are the struct \u0027zonelist_cache\u0027 fields \u0027fullzones\u0027\n   (a bitmask) and \u0027last_full_zap\u0027 (unsigned long jiffies).  It should\n   all be self correcting after at most a one second delay.\n\n - This still does a linear scan of the same lengths as before.  All\n   I\u0027ve optimized is making the scan faster, not algorithmically\n   shorter.  It is now able to scan a compact array of \u0027unsigned\n   short\u0027 in the case of many full nodes, so one cache line should\n   cover quite a few nodes, rather than each node hitting another\n   one or two new and distinct cache lines.\n\n - If both Andi and Nick don\u0027t find this too complicated, I will be\n   (pleasantly) flabbergasted.\n\n - I removed the comment claiming we only use one cachline\u0027s worth of\n   zonelist.  We seem, at least in the fake numa case, to have put the\n   lie to that claim.\n\n - I pay no attention to the various watermarks and such in this performance\n   hint.  A node could be marked full for one watermark, and then skipped\n   over when searching for a page using a different watermark.  
I think\n   that\u0027s actually quite ok, as it will tend to slightly increase the\n   spreading of memory over other nodes, away from a memory stressed node.\n\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nPerformance - some benchmark results and analysis:\n\nThis benchmark runs a memory hog program that uses multiple\nthreads to touch alot of memory as quickly as it can.\n\nMultiple runs were made, touching 12, 38, 64 or 90 GBytes out of\nthe total 96 GBytes on the system, and using 1, 19, 37, or 55\nthreads (on a 56 CPU system.)  System, user and real (elapsed)\ntimings were recorded for each run, shown in units of seconds,\nin the table below.\n\nTwo kernels were tested - 2.6.18-mm3 and the same kernel with\nthis zonelist caching patch added.  The table also shows the\npercentage improvement the zonelist caching sys time is over\n(lower than) the stock *-mm kernel.\n\n      number     2.6.18-mm3\t   zonelist-cache    delta (\u003c 0 good)\tpercent\n GBs    N  \t------------\t   --------------    ----------------\tsystime\n mem threads   sys user  real\t  sys  user  real     sys  user  real\t better\n  12\t 1     153   24   177\t  151\t 24   176      -2     0    -1\t   1%\n  12\t19\t99   22     8\t   99\t 22\t8\t0     0     0\t   0%\n  12\t37     111   25     6\t  112\t 25\t6\t1     0     0\t  -0%\n  12\t55     115   25     5\t  110\t 23\t5      -5    -2     0\t   4%\n  38\t 1     502   74   576\t  497\t 73   570      -5    -1    -6\t   0%\n  38\t19     426   78    48\t  373\t 76    39     -53    -2    -9\t  12%\n  38\t37     544   83    36\t  547\t 82    36\t3    -1     0\t  -0%\n  38\t55     501   77    23\t  511\t 80    24      10     3     1\t  -1%\n  64\t 1     917  125  1042\t  890\t124  1014     -27    -1   -28\t   2%\n  64\t19    1118  138   119\t  965\t141   103    -153     3   -16\t  13%\n  64\t37    1202  151    94\t 1136\t150    81     -66    -1   -13\t   5%\n  64\t55    1118  141    61\t 1072\t140    
58     -46    -1    -3\t   4%\n  90\t 1    1342  177  1519\t 1275\t174  1450     -67    -3   -69\t   4%\n  90\t19    2392  199   192\t 2116\t189   176    -276   -10   -16\t  11%\n  90\t37    3313  238   175\t 2972\t225   145    -341   -13   -30\t  10%\n  90\t55    1948  210   104\t 1843\t213   100    -105     3    -4\t   5%\n\nNotes:\n 1) This test ran a memory hog program that started a specified number N of\n    threads, and had each thread allocate and touch 1/N\u0027th of\n    the total memory to be used in the test run in a single loop,\n    writing a constant word to memory, one store every 4096 bytes.\n    Watching this test during some earlier trial runs, I would see\n    each of these threads sit down on one CPU and stay there, for\n    the remainder of the pass, a different CPU for each thread.\n\n 2) The \u0027real\u0027 column is not comparable to the \u0027sys\u0027 or \u0027user\u0027 columns.\n    The \u0027real\u0027 column is seconds wall clock time elapsed, from beginning\n    to end of that test pass.  The \u0027sys\u0027 and \u0027user\u0027 columns are total\n    CPU seconds spent on that test pass.  For a 19 thread test run,\n    for example, the sum of \u0027sys\u0027 and \u0027user\u0027 could be up to 19 times the\n    number of \u0027real\u0027 elapsed wall clock seconds.\n\n 3) Tests were run on a fresh, single-user boot, to minimize the amount\n    of memory already in use at the start of the test, and to minimize\n    the amount of background activity that might interfere.\n\n 4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM.\n\n 5) Notice that the \u0027real\u0027 time gets large for the single thread runs, even\n    though the measured \u0027sys\u0027 and \u0027user\u0027 times are modest.  I\u0027m not sure what\n    that means - probably something to do with it being slow for one thread to\n    be accessing memory along ways away.  
Perhaps the fake numa system, running\n    ostensibly the same workload, would not show this substantial degradation\n    of \u0027real\u0027 time for one thread on many nodes -- lets hope not.\n\n 6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs)\n    ran quite efficiently, as one might expect.  Each pair of threads needed\n    to allocate and touch the memory on the node the two threads shared, a\n    pleasantly parallizable workload.\n\n 7) The intermediate thread count passes, when asking for alot of memory forcing\n    them to go to a few neighboring nodes, improved the most with this zonelist\n    caching patch.\n\nConclusions:\n * This zonelist cache patch probably makes little difference one way or the\n   other for most workloads on real numa hardware, if those workloads avoid\n   heavy off node allocations.\n * For memory intensive workloads requiring substantial off-node allocations\n   on real numa hardware, this patch improves both kernel and elapsed timings\n   up to ten per-cent.\n * For fake numa systems, I\u0027m optimistic, but will have to leave that up to\n   Rohit Seth to actually test (once I get him a 2.6.18 backport.)\n\nSigned-off-by: Paul Jackson \u003cpj@sgi.com\u003e\nCc: Rohit Seth \u003crohitseth@google.com\u003e\nCc: Christoph Lameter \u003cclameter@engr.sgi.com\u003e\nCc: David Rientjes \u003crientjes@cs.washington.edu\u003e\nCc: Paul Menage \u003cmenage@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    }
  ],
  "next": "3bb1a852ab6c9cdf211a2f4a2f502340c8c38eca"
}
