)]}'
{
  "log": [
    {
      "commit": "208bca0860406d16398145ddd950036a737c3c9d",
      "tree": "7797a16c17d8bd155120126fa7976727fc6de013",
      "parents": [
        "6aad3738f6a79fd0ca480eaceefe064cc471f6eb",
        "0e175a1835ffc979e55787774e58ec79e41957d7"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Nov 06 19:02:23 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Nov 06 19:02:23 2011 -0800"
      },
      "message": "Merge branch \u0027writeback-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux\n\n* \u0027writeback-for-linus\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:\n  writeback: Add a \u0027reason\u0027 to wb_writeback_work\n  writeback: send work item to queue_io, move_expired_inodes\n  writeback: trace event balance_dirty_pages\n  writeback: trace event bdi_dirty_ratelimit\n  writeback: fix ppc compile warnings on do_div(long long, unsigned long)\n  writeback: per-bdi background threshold\n  writeback: dirty position control - bdi reserve area\n  writeback: control dirty pause time\n  writeback: limit max dirty pause time\n  writeback: IO-less balance_dirty_pages()\n  writeback: per task dirty rate limit\n  writeback: stabilize bdi-\u003edirty_ratelimit\n  writeback: dirty rate control\n  writeback: add bg_threshold parameter to __bdi_update_bandwidth()\n  writeback: dirty position control\n  writeback: account per-bdi accumulated dirtied pages\n"
    },
    {
      "commit": "9b272977e3b99a8699361d214b51f98c8a9e0e7b",
      "tree": "2113cee95a42ea893aa6eddb01b14e563153fabb",
      "parents": [
        "0a619e58703b86d53d07e938eade9a91a4a863c6"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "jweiner@redhat.com",
        "time": "Wed Nov 02 13:38:23 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Nov 02 16:07:00 2011 -0700"
      },
      "message": "memcg: skip scanning active lists based on individual size\n\nReclaim decides to skip scanning an active list when the corresponding\ninactive list is above a certain size in comparison to leave the assumed\nworking set alone while there are still enough reclaim candidates around.\n\nThe memcg implementation of comparing those lists instead reports whether\nthe whole memcg is low on the requested type of inactive pages,\nconsidering all nodes and zones.\n\nThis can lead to an oversized active list not being scanned because of the\nstate of the other lists in the memcg, as well as an active list being\nscanned while its corresponding inactive list has enough pages.\n\nNot only is this wrong, it\u0027s also a scalability hazard, because the global\nmemory state over all nodes and zones has to be gathered for each memcg\nand zone scanned.\n\nMake these calculations purely based on the size of the two LRU lists\nthat are actually affected by the outcome of the decision.\n\nSigned-off-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Balbir Singh \u003cbsingharora@gmail.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: Ying Han \u003cyinghan@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e0c23279c9f800c403f37511484d9014ac83adec",
      "tree": "9dcf058d3d1c691328ea5839dfe9c340e47ee3fa",
      "parents": [
        "e0887c19b2daa140f20ca8104bdc5740f39dbb86"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Mon Oct 31 17:09:33 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:50 2011 -0700"
      },
      "message": "vmscan: abort reclaim/compaction if compaction can proceed\n\nIf compaction can proceed, shrink_zones() stops doing any work but its\ncallers still call shrink_slab() which raises the priority and potentially\nsleeps.  This is unnecessary and wasteful so this patch aborts direct\nreclaim/compaction entirely if compaction can proceed.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Josh Boyer \u003cjwboyer@redhat.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e0887c19b2daa140f20ca8104bdc5740f39dbb86",
      "tree": "86330414eb04b5989e68661c205aa52d46ca7ebf",
      "parents": [
        "21ee9f398be209ccbb62929d35961ca1ed48eec3"
      ],
      "author": {
        "name": "Rik van Riel",
        "email": "riel@redhat.com",
        "time": "Mon Oct 31 17:09:31 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:50 2011 -0700"
      },
      "message": "vmscan: limit direct reclaim for higher order allocations\n\nWhen suffering from memory fragmentation due to unfreeable pages, THP page\nfaults will repeatedly try to compact memory.  Due to the unfreeable\npages, compaction fails.\n\nNeedless to say, at that point page reclaim also fails to create free\ncontiguous 2MB areas.  However, that doesn\u0027t stop the current code from\ntrying, over and over again, and freeing a minimum of 4MB (2UL \u003c\u003c\nsc-\u003eorder pages) at every single invocation.\n\nThis resulted in my 12GB system having 2-3GB free memory, a corresponding\namount of used swap and very sluggish response times.\n\nThis can be avoided by having the direct reclaim code not reclaim from\nzones that already have plenty of free memory available for compaction.\n\nIf compaction still fails due to unmovable memory, doing additional\nreclaim will only hurt the system, not help.\n\n[jweiner@redhat.com: change comment to explain the order check]\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "21ee9f398be209ccbb62929d35961ca1ed48eec3",
      "tree": "4127e14f4a07a2cb7bc6eb902e9c7b0baab8e84f",
      "parents": [
        "2f1da6421570d064a94e17190a4955c2df99794d"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Mon Oct 31 17:09:28 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:50 2011 -0700"
      },
      "message": "vmscan: add barrier to prevent evictable page in unevictable list\n\nWhen a race between putback_lru_page() and shmem_lock with lock\u003d0 happens,\nprogrom execution order is as follows, but clear_bit in processor #1 could\nbe reordered right before spin_unlock of processor #1.  Then, the page\nwould be stranded on the unevictable list.\n\nspin_lock\nSetPageLRU\nspin_unlock\n                                clear_bit(AS_UNEVICTABLE)\n                                spin_lock\n                                if PageLRU()\n                                        if !test_bit(AS_UNEVICTABLE)\n                                        \tmove evictable list\nsmp_mb\nif !test_bit(AS_UNEVICTABLE)\n        move evictable list\n                                spin_unlock\n\nBut, pagevec_lookup() in scan_mapping_unevictable_pages() has\nrcu_read_[un]lock() so it could protect reordering before reaching\ntest_bit(AS_UNEVICTABLE) on processor #1 so this problem never happens.\nBut it\u0027s a unexpected side effect and we should solve this problem\nproperly.\n\nThis patch adds a barrier after mapping_clear_unevictable.\n\nI didn\u0027t meet this problem but just found during review.\n\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nAcked-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "264e56d8247ef6e31ed4386926cae86c61ddcb18",
      "tree": "87e85ee670fb7ae4c0cd7bdeae700faff021bf48",
      "parents": [
        "3f380998aeb51b99d5d22cadb41162e1e9db70d2"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "jweiner@redhat.com",
        "time": "Mon Oct 31 17:09:13 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:49 2011 -0700"
      },
      "message": "mm: disable user interface to manually rescue unevictable pages\n\nAt one point, anonymous pages were supposed to go on the unevictable list\nwhen no swap space was configured, and the idea was to manually rescue\nthose pages after adding swap and making them evictable again.  But\nnowadays, swap-backed pages on the anon LRU list are not scanned without\navailable swap space anyway, so there is no point in moving them to a\nseparate list anymore.\n\nThe manual rescue could also be used in case pages were stranded on the\nunevictable list due to race conditions.  But the code has been around for\na while now and newly discovered bugs should be properly reported and\ndealt with instead of relying on such a manual fixup.\n\nIn addition to the lack of a usecase, the sysfs interface to rescue pages\nfrom a specific NUMA node has been broken since its introduction, so it\u0027s\nunlikely that anybody ever relied on that.\n\nThis patch removes the functionality behind the sysctl and the\nnode-interface and emits a one-time warning when somebody tries to access\neither of them.\n\nSigned-off-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nReported-by: Kautuk Consul \u003cconsul.kautuk@gmail.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3f380998aeb51b99d5d22cadb41162e1e9db70d2",
      "tree": "1d5bb20368b06c7b86e7257771209795bd764f00",
      "parents": [
        "4e9dc5df46001510ebd3b3e54faa650f474e51a3"
      ],
      "author": {
        "name": "Kautuk Consul",
        "email": "consul.kautuk@gmail.com",
        "time": "Mon Oct 31 17:09:11 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:49 2011 -0700"
      },
      "message": "vmscan.c: fix invalid strict_strtoul() check in write_scan_unevictable_node()\n\nwrite_scan_unevictable_node() checks the value req returned by\nstrict_strtoul() and returns 1 if req is 0.\n\nHowever, when strict_strtoul() returns 0, it means successful conversion\nof buf to unsigned long.\n\nDue to this, the function was not proceeding to scan the zones for\nunevictable pages even though we write a valid value to the\nscan_unevictable_pages sys file.\n\nChange this check slightly to check for invalid value in buf as well as 0\nvalue stored in res after successful conversion via strict_strtoul.  In\nboth cases, we do not perform the scanning of this node\u0027s zones.\n\nSigned-off-by: Kautuk Consul \u003cconsul.kautuk@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f0dfcde099453aa4c0dc42473828d15a6d492936",
      "tree": "50346f4e8d9a773f266194661edea903a855a110",
      "parents": [
        "d1f0ece6cdca973c01a46dff0eb062baafe78a85"
      ],
      "author": {
        "name": "Alex,Shi",
        "email": "alex.shi@intel.com",
        "time": "Mon Oct 31 17:08:45 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:48 2011 -0700"
      },
      "message": "kswapd: assign new_order and new_classzone_idx after wakeup in sleeping\n\nThere 2 places to read pgdat in kswapd.  One is return from a successful\nbalance, another is waked up from kswapd sleeping.  The new_order and\nnew_classzone_idx represent the balance input order and classzone_idx.\n\nBut current new_order and new_classzone_idx are not assigned after\nkswapd_try_to_sleep(), that will cause a bug in the following scenario.\n\n1: after a successful balance, kswapd goes to sleep, and new_order \u003d 0;\n   new_classzone_idx \u003d __MAX_NR_ZONES - 1;\n\n2: kswapd waked up with order \u003d 3 and classzone_idx \u003d ZONE_NORMAL\n\n3: in the balance_pgdat() running, a new balance wakeup happened with\n   order \u003d 5, and classzone_idx \u003d ZONE_NORMAL\n\n4: the first wakeup(order \u003d 3) finished successufly, return order \u003d 3\n   but, the new_order is still 0, so, this balancing will be treated as a\n   failed balance.  And then the second tighter balancing will be missed.\n\nSo, to avoid the above problem, the new_order and new_classzone_idx need\nto be assigned for later successful comparison.\n\nSigned-off-by: Alex Shi \u003calex.shi@intel.com\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nTested-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d2ebd0f6b89567eb93ead4e2ca0cbe03021f344b",
      "tree": "1f8fc3f7702a8b4d362f5537e135e96f86043f8d",
      "parents": [
        "64212ec569bfdd094f7a23d9b09862209a983559"
      ],
      "author": {
        "name": "Alex,Shi",
        "email": "alex.shi@intel.com",
        "time": "Mon Oct 31 17:08:39 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:48 2011 -0700"
      },
      "message": "kswapd: avoid unnecessary rebalance after an unsuccessful balancing\n\nIn commit 215ddd66 (\"mm: vmscan: only read new_classzone_idx from pgdat\nwhen reclaiming successfully\") , Mel Gorman said kswapd is better to sleep\nafter a unsuccessful balancing if there is tighter reclaim request pending\nin the balancing.  But in the following scenario, kswapd do something that\nis not matched our expectation.  The patch fixes this issue.\n\n1, Read pgdat request A (classzone_idx, order \u003d 3)\n2, balance_pgdat()\n3, During pgdat, a new pgdat request B (classzone_idx, order \u003d 5) is placed\n4, balance_pgdat() returns but failed since returned order \u003d 0\n5, pgdat of request A assigned to balance_pgdat(), and do balancing again.\n   While the expectation behavior of kswapd should try to sleep.\n\nSigned-off-by: Alex Shi \u003calex.shi@intel.com\u003e\nReviewed-by: Tim Chen \u003ctim.c.chen@linux.intel.com\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nTested-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "16fb951237c2b0b28037b992ee44e7ee401c30d1",
      "tree": "756ba52239d304d8f45deb102f17960f0a8517ec",
      "parents": [
        "49ea7eb65e7c5060807fb9312b1ad4c3eab82e2c"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Mon Oct 31 17:08:02 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:47 2011 -0700"
      },
      "message": "vmscan: count pages into balanced for zone with good watermark\n\nIt\u0027s possible a zone watermark is ok when entering the balance_pgdat()\nloop, while the zone is within the requested classzone_idx.  Count pages\nfrom this zone into `balanced\u0027.  In this way, we can skip shrinking zones\ntoo much for high order allocation.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "49ea7eb65e7c5060807fb9312b1ad4c3eab82e2c",
      "tree": "88eaa206cdcac1190817820a0eb56bca2585f9ea",
      "parents": [
        "92df3a723f84cdf8133560bbff950a7a99e92bc9"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Mon Oct 31 17:07:59 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:47 2011 -0700"
      },
      "message": "mm: vmscan: immediately reclaim end-of-LRU dirty pages when writeback completes\n\nWhen direct reclaim encounters a dirty page, it gets recycled around the\nLRU for another cycle.  This patch marks the page PageReclaim similar to\ndeactivate_page() so that the page gets reclaimed almost immediately after\nthe page gets cleaned.  This is to avoid reclaiming clean pages that are\nyounger than a dirty page encountered at the end of the LRU that might\nhave been something like a use-once page.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nAcked-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Jan Kara \u003cjack@suse.cz\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Alex Elder \u003caelder@sgi.com\u003e\nCc: Theodore Ts\u0027o \u003ctytso@mit.edu\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "92df3a723f84cdf8133560bbff950a7a99e92bc9",
      "tree": "503efc278236d877508da66ea7ec7cbb81203d64",
      "parents": [
        "f84f6e2b0868f198f97a32ba503d6f9f319a249a"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Mon Oct 31 17:07:56 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:46 2011 -0700"
      },
      "message": "mm: vmscan: throttle reclaim if encountering too many dirty pages under writeback\n\nWorkloads that are allocating frequently and writing files place a large\nnumber of dirty pages on the LRU.  With use-once logic, it is possible for\nthem to reach the end of the LRU quickly requiring the reclaimer to scan\nmore to find clean pages.  Ordinarily, processes that are dirtying memory\nwill get throttled by dirty balancing but this is a global heuristic and\ndoes not take into account that LRUs are maintained on a per-zone basis.\nThis can lead to a situation whereby reclaim is scanning heavily, skipping\nover a large number of pages under writeback and recycling them around the\nLRU consuming CPU.\n\nThis patch checks how many of the number of pages isolated from the LRU\nwere dirty and under writeback.  If a percentage of them under writeback,\nthe process will be throttled if a backing device or the zone is\ncongested.  Note that this applies whether it is anonymous or file-backed\npages that are under writeback meaning that swapping is potentially\nthrottled.  This is intentional due to the fact if the swap device is\ncongested, scanning more pages and dispatching more IO is not going to\nhelp matters.\n\nThe percentage that must be in writeback depends on the priority.  At\ndefault priority, all of them must be dirty.  At DEF_PRIORITY-1, 50% of\nthem must be, DEF_PRIORITY-2, 25% etc.  i.e.  as pressure increases the\ngreater the likelihood the process will get throttled to allow the flusher\nthreads to make some progress.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Jan Kara \u003cjack@suse.cz\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Alex Elder \u003caelder@sgi.com\u003e\nCc: Theodore Ts\u0027o \u003ctytso@mit.edu\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f84f6e2b0868f198f97a32ba503d6f9f319a249a",
      "tree": "2103afe0304afd0045dca4b92dfd35922cfc289b",
      "parents": [
        "966dbde2c208e07bab7a45a7855e1e693eabe661"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Mon Oct 31 17:07:51 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:46 2011 -0700"
      },
      "message": "mm: vmscan: do not writeback filesystem pages in kswapd except in high priority\n\nIt is preferable that no dirty pages are dispatched for cleaning from the\npage reclaim path.  At normal priorities, this patch prevents kswapd\nwriting pages.\n\nHowever, page reclaim does have a requirement that pages be freed in a\nparticular zone.  If it is failing to make sufficient progress (reclaiming\n\u003c SWAP_CLUSTER_MAX at any priority priority), the priority is raised to\nscan more pages.  A priority of DEF_PRIORITY - 3 is considered to be the\npoint where kswapd is getting into trouble reclaiming pages.  If this\npriority is reached, kswapd will dispatch pages for writing.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Jan Kara \u003cjack@suse.cz\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Alex Elder \u003caelder@sgi.com\u003e\nCc: Theodore Ts\u0027o \u003ctytso@mit.edu\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a18bba061c789f5815c3efc3c80e6ac269911964",
      "tree": "bec0234fb338f8e06b6e39df2cfa09acf2a968a3",
      "parents": [
        "ee72886d8ed5d9de3fa0ed3b99a7ca7702576a96"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Mon Oct 31 17:07:42 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:46 2011 -0700"
      },
      "message": "mm: vmscan: remove dead code related to lumpy reclaim waiting on pages under writeback\n\nLumpy reclaim worked with two passes - the first which queued pages for IO\nand the second which waited on writeback.  As direct reclaim can no longer\nwrite pages there is some dead code.  This patch removes it but direct\nreclaim will continue to wait on pages under writeback while in\nsynchronous reclaim mode.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Jan Kara \u003cjack@suse.cz\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Alex Elder \u003caelder@sgi.com\u003e\nCc: Theodore Ts\u0027o \u003ctytso@mit.edu\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ee72886d8ed5d9de3fa0ed3b99a7ca7702576a96",
      "tree": "d9596005d3ea38541c5dfe1c2a0b7d5a4d73488f",
      "parents": [
        "e10d59f2c3decaf22cc5d3de7040eba202bc2df3"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Mon Oct 31 17:07:38 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:46 2011 -0700"
      },
      "message": "mm: vmscan: do not writeback filesystem pages in direct reclaim\n\nTesting from the XFS folk revealed that there is still too much I/O from\nthe end of the LRU in kswapd.  Previously it was considered acceptable by\nVM people for a small number of pages to be written back from reclaim with\ntesting generally showing about 0.3% of pages reclaimed were written back\n(higher if memory was low).  That writing back a small number of pages is\nok has been heavily disputed for quite some time and Dave Chinner\nexplained it well;\n\n\tIt doesn\u0027t have to be a very high number to be a problem. IO\n\tis orders of magnitude slower than the CPU time it takes to\n\tflush a page, so the cost of making a bad flush decision is\n\tvery high. And single page writeback from the LRU is almost\n\talways a bad flush decision.\n\nTo complicate matters, filesystems respond very differently to requests\nfrom reclaim according to Christoph Hellwig;\n\n\txfs tries to write it back if the requester is kswapd\n\text4 ignores the request if it\u0027s a delayed allocation\n\tbtrfs ignores the request\n\nAs a result, each filesystem has different performance characteristics\nwhen under memory pressure and there are many pages being dirtied.  In\nsome cases, the request is ignored entirely so the VM cannot depend on the\nIO being dispatched.\n\nThe objective of this series is to reduce writing of filesystem-backed\npages from reclaim, play nicely with writeback that is already in progress\nand throttle reclaim appropriately when writeback pages are encountered.\nThe assumption is that the flushers will always write pages faster than if\nreclaim issues the IO.\n\nA secondary goal is to avoid the problem whereby direct reclaim splices\ntwo potentially deep call stacks together.\n\nThere is a potential new problem as reclaim has less control over how long\nbefore a page in a particularly zone or container is cleaned and direct\nreclaimers depend on kswapd or flusher threads to do the necessary work.\nHowever, as filesystems sometimes ignore direct reclaim requests already,\nit is not expected to be a serious issue.\n\nPatch 1 disables writeback of filesystem pages from direct reclaim\n\tentirely. Anonymous pages are still written.\n\nPatch 2 removes dead code in lumpy reclaim as it is no longer able\n\tto synchronously write pages. This hurts lumpy reclaim but\n\tthere is an expectation that compaction is used for hugepage\n\tallocations these days and lumpy reclaim\u0027s days are numbered.\n\nPatches 3-4 add warnings to XFS and ext4 if called from\n\tdirect reclaim. With patch 1, this \"never happens\" and is\n\tintended to catch regressions in this logic in the future.\n\nPatch 5 disables writeback of filesystem pages from kswapd unless\n\tthe priority is raised to the point where kswapd is considered\n\tto be in trouble.\n\nPatch 6 throttles reclaimers if too many dirty pages are being\n\tencountered and the zones or backing devices are congested.\n\nPatch 7 invalidates dirty pages found at the end of the LRU so they\n\tare reclaimed quickly after being written back rather than\n\twaiting for a reclaimer to find them\n\nI consider this series to be orthogonal to the writeback work but it is\nworth noting that the writeback work affects the viability of patch 8 in\nparticular.\n\nI tested this on ext4 and xfs using fs_mark, a simple writeback test based\non dd and a micro benchmark that does a streaming write to a large mapping\n(exercises use-once LRU logic) followed by streaming writes to a mix of\nanonymous and file-backed mappings.  The command line for fs_mark when\nbotted with 512M looked something like\n\n./fs_mark -d  /tmp/fsmark-2676  -D  100  -N  150  -n  150  -L  25  -t  1  -S0  -s  10485760\n\nThe number of files was adjusted depending on the amount of available\nmemory so that the files created was about 3xRAM.  For multiple threads,\nthe -d switch is specified multiple times.\n\nThe test machine is x86-64 with an older generation of AMD processor with\n4 cores.  The underlying storage was 4 disks configured as RAID-0 as this\nwas the best configuration of storage I had available.  Swap is on a\nseparate disk.  Dirty ratio was tuned to 40% instead of the default of\n20%.\n\nTesting was run with and without monitors to both verify that the patches\nwere operating as expected and that any performance gain was real and not\ndue to interference from monitors.\n\nHere is a summary of results based on testing XFS.\n\n512M1P-xfs           Files/s  mean                 32.69 ( 0.00%)     34.44 ( 5.08%)\n512M1P-xfs           Elapsed Time fsmark                    51.41     48.29\n512M1P-xfs           Elapsed Time simple-wb                114.09    108.61\n512M1P-xfs           Elapsed Time mmap-strm                113.46    109.34\n512M1P-xfs           Kswapd efficiency fsmark                 62%       63%\n512M1P-xfs           Kswapd efficiency simple-wb              56%       61%\n512M1P-xfs           Kswapd efficiency mmap-strm              44%       42%\n512M-xfs             Files/s  mean                 30.78 ( 0.00%)     35.94 (14.36%)\n512M-xfs             Elapsed Time fsmark                    56.08     48.90\n512M-xfs             Elapsed Time simple-wb                112.22     98.13\n512M-xfs             Elapsed Time mmap-strm                219.15    196.67\n512M-xfs             Kswapd efficiency fsmark                 54%       56%\n512M-xfs             Kswapd efficiency simple-wb              54%       55%\n512M-xfs             Kswapd efficiency mmap-strm              45%       44%\n512M-4X-xfs          Files/s  mean                 30.31 ( 0.00%)     33.33 ( 9.06%)\n512M-4X-xfs          Elapsed Time fsmark                    63.26     55.88\n512M-4X-xfs          Elapsed Time simple-wb                100.90     90.25\n512M-4X-xfs          Elapsed Time mmap-strm                261.73    255.38\n512M-4X-xfs          Kswapd efficiency fsmark                 49%       50%\n512M-4X-xfs          Kswapd efficiency simple-wb              54%       56%\n512M-4X-xfs          Kswapd efficiency mmap-strm              37%       36%\n512M-16X-xfs         Files/s  mean                 60.89 ( 0.00%)     65.22 ( 6.64%)\n512M-16X-xfs         Elapsed Time fsmark                    67.47     58.25\n512M-16X-xfs         Elapsed Time simple-wb                103.22     90.89\n512M-16X-xfs         Elapsed Time mmap-strm                237.09    198.82\n512M-16X-xfs         Kswapd efficiency fsmark                 45%       46%\n512M-16X-xfs         Kswapd efficiency simple-wb              53%       55%\n512M-16X-xfs         Kswapd efficiency mmap-strm              33%       33%\n\nUp until 512-4X, the FSmark improvements were statistically significant.\nFor the 4X and 16X tests the results were within standard deviations but\njust barely.  The time to completion for all tests is improved which is an\nimportant result.  In general, kswapd efficiency is not affected by\nskipping dirty pages.\n\n1024M1P-xfs          Files/s  mean                 39.09 ( 0.00%)     41.15 ( 5.01%)\n1024M1P-xfs          Elapsed Time fsmark                    84.14     80.41\n1024M1P-xfs          Elapsed Time simple-wb                210.77    184.78\n1024M1P-xfs          Elapsed Time mmap-strm                162.00    160.34\n1024M1P-xfs          Kswapd efficiency fsmark                 69%       75%\n1024M1P-xfs          Kswapd efficiency simple-wb              71%       77%\n1024M1P-xfs          Kswapd efficiency mmap-strm              43%       44%\n1024M-xfs            Files/s  mean                 35.45 ( 0.00%)     37.00 ( 4.19%)\n1024M-xfs            Elapsed Time fsmark                    94.59     91.00\n1024M-xfs            Elapsed Time simple-wb                229.84    195.08\n1024M-xfs            Elapsed Time mmap-strm                405.38    440.29\n1024M-xfs            Kswapd efficiency fsmark                 79%       71%\n1024M-xfs            Kswapd efficiency simple-wb              74%       74%\n1024M-xfs            Kswapd efficiency mmap-strm              39%       42%\n1024M-4X-xfs         Files/s  mean                 32.63 ( 0.00%)     35.05 ( 6.90%)\n1024M-4X-xfs         Elapsed Time fsmark                   103.33     97.74\n1024M-4X-xfs         Elapsed Time simple-wb                204.48    178.57\n1024M-4X-xfs         Elapsed Time mmap-strm                528.38    511.88\n1024M-4X-xfs         Kswapd efficiency fsmark                 81%       70%\n1024M-4X-xfs         Kswapd efficiency simple-wb              73%       72%\n1024M-4X-xfs         Kswapd efficiency mmap-strm              39%       38%\n1024M-16X-xfs        Files/s  mean                 42.65 ( 0.00%)     42.97 ( 0.74%)\n1024M-16X-xfs        Elapsed Time fsmark                   103.11     99.11\n1024M-16X-xfs        Elapsed Time simple-wb                200.83    178.24\n1024M-16X-xfs        Elapsed Time mmap-strm                397.35    459.82\n1024M-16X-xfs        Kswapd efficiency fsmark                 84%       69%\n1024M-16X-xfs        Kswapd efficiency simple-wb              74%       73%\n1024M-16X-xfs        Kswapd efficiency mmap-strm              39%       40%\n\nAll FSMark tests up to 16X had statistically significant improvements.\nFor the most part, tests are completing faster with the exception of the\nstreaming writes to a mixture of anonymous and file-backed mappings which\nwere slower in two cases\n\nIn the cases where the mmap-strm tests were slower, there was more\nswapping due to dirty pages being skipped.  The number of additional pages\nswapped is almost identical to the fewer number of pages written from\nreclaim.  In other words, roughly the same number of pages were reclaimed\nbut swapping was slower.  As the test is a bit unrealistic and stresses\nmemory heavily, the small shift is acceptable.\n\n4608M1P-xfs          Files/s  mean                 29.75 ( 0.00%)     30.96 ( 3.91%)\n4608M1P-xfs          Elapsed Time fsmark                   512.01    492.15\n4608M1P-xfs          Elapsed Time simple-wb                618.18    566.24\n4608M1P-xfs          Elapsed Time mmap-strm                488.05    465.07\n4608M1P-xfs          Kswapd efficiency fsmark                 93%       86%\n4608M1P-xfs          Kswapd efficiency simple-wb              88%       84%\n4608M1P-xfs          Kswapd efficiency mmap-strm              46%       45%\n4608M-xfs            Files/s  mean                 27.60 ( 0.00%)     28.85 ( 4.33%)\n4608M-xfs            Elapsed Time fsmark                   555.96    532.34\n4608M-xfs            Elapsed Time simple-wb                659.72    571.85\n4608M-xfs            Elapsed Time mmap-strm               1082.57   1146.38\n4608M-xfs            Kswapd efficiency fsmark                 89%       91%\n4608M-xfs            Kswapd efficiency simple-wb              88%       82%\n4608M-xfs            Kswapd efficiency mmap-strm              48%       46%\n4608M-4X-xfs         Files/s  mean                 26.00 ( 0.00%)     27.47 ( 5.35%)\n4608M-4X-xfs         Elapsed Time fsmark                   592.91    564.00\n4608M-4X-xfs         Elapsed Time simple-wb                616.65    575.07\n4608M-4X-xfs         Elapsed Time mmap-strm               1773.02   1631.53\n4608M-4X-xfs         Kswapd efficiency fsmark                 90%       94%\n4608M-4X-xfs         Kswapd efficiency simple-wb              87%       82%\n4608M-4X-xfs         Kswapd efficiency mmap-strm              43%       43%\n4608M-16X-xfs        Files/s  mean                 26.07 ( 0.00%)     26.42 ( 1.32%)\n4608M-16X-xfs        Elapsed Time fsmark                   602.69    585.78\n4608M-16X-xfs        Elapsed Time simple-wb                606.60    573.81\n4608M-16X-xfs        Elapsed Time mmap-strm               1549.75   1441.86\n4608M-16X-xfs        Kswapd efficiency fsmark                 98%       98%\n4608M-16X-xfs        Kswapd efficiency simple-wb              88%       82%\n4608M-16X-xfs        Kswapd efficiency mmap-strm              44%       42%\n\nUnlike the other tests, the fsmark results are not statistically\nsignificant but the min and max times are both improved and for the most\npart, tests completed faster.\n\nThere are other indications that this is an improvement as well.  For\nexample, in the vast majority of cases, there were fewer pages scanned by\ndirect reclaim implying in many cases that stalls due to direct reclaim\nare reduced.  KSwapd is scanning more due to skipping dirty pages which is\nunfortunate but the CPU usage is still acceptable\n\nIn an earlier set of tests, I used blktrace and in almost all cases\nthroughput throughout the entire test was higher.  However, I ended up\ndiscarding those results as recording blktrace data was too heavy for my\nliking.\n\nOn a laptop, I plugged in a USB stick and ran a similar tests of tests\nusing it as backing storage.  A desktop environment was running and for\nthe entire duration of the tests, firefox and gnome terminal were\nlaunching and exiting to vaguely simulate a user.\n\n1024M-xfs            Files/s  mean               0.41 ( 0.00%)        0.44 ( 6.82%)\n1024M-xfs            Elapsed Time fsmark               2053.52   1641.03\n1024M-xfs            Elapsed Time simple-wb            1229.53    768.05\n1024M-xfs            Elapsed Time mmap-strm            4126.44   4597.03\n1024M-xfs            Kswapd efficiency fsmark              84%       85%\n1024M-xfs            Kswapd efficiency simple-wb           92%       81%\n1024M-xfs            Kswapd efficiency mmap-strm           60%       51%\n1024M-xfs            Avg wait ms fsmark                5404.53     4473.87\n1024M-xfs            Avg wait ms simple-wb             2541.35     1453.54\n1024M-xfs            Avg wait ms mmap-strm             3400.25     3852.53\n\nThe mmap-strm results were hurt because firefox launching had a tendency\nto push the test out of memory.  On the postive side, firefox launched\nmarginally faster with the patches applied.  Time to completion for many\ntests was faster but more importantly - the \"Avg wait\" time as measured by\niostat was far lower implying the system would be more responsive.  It was\nalso the case that \"Avg wait ms\" on the root filesystem was lower.  I\ntested it manually and while the system felt slightly more responsive\nwhile copying data to a USB stick, it was marginal enough that it could be\nmy imagination.\n\nThis patch: do not writeback filesystem pages in direct reclaim.\n\nWhen kswapd is failing to keep zones above the min watermark, a process\nwill enter direct reclaim in the same manner kswapd does.  If a dirty page\nis encountered during the scan, this page is written to backing storage\nusing mapping-\u003ewritepage.\n\nThis causes two problems.  First, it can result in very deep call stacks,\nparticularly if the target storage or filesystem are complex.  Some\nfilesystems ignore write requests from direct reclaim as a result.  The\nsecond is that a single-page flush is inefficient in terms of IO.  While\nthere is an expectation that the elevator will merge requests, this does\nnot always happen.  Quoting Christoph Hellwig;\n\n\tThe elevator has a relatively small window it can operate on,\n\tand can never fix up a bad large scale writeback pattern.\n\nThis patch prevents direct reclaim writing back filesystem pages by\nchecking if current is kswapd.  Anonymous pages are still written to swap\nas there is not the equivalent of a flusher thread for anonymous pages.\nIf the dirty pages cannot be written back, they are placed back on the LRU\nlists.  There is now a direct dependency on dirty page balancing to\nprevent too many pages in the system being dirtied which would prevent\nreclaim making forward progress.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Dave Chinner \u003cdavid@fromorbit.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Jan Kara \u003cjack@suse.cz\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Alex Elder \u003caelder@sgi.com\u003e\nCc: Theodore Ts\u0027o \u003ctytso@mit.edu\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f11c0ca501af89fc07b0d9f17531ba3b68a4ef39",
      "tree": "c66a24b4ca2778b940c01a2af78eca6abc0b3421",
      "parents": [
        "4f31888c104687078f8d88c2f11eca1080c88464"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "jweiner@redhat.com",
        "time": "Mon Oct 31 17:07:27 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:46 2011 -0700"
      },
      "message": "mm: vmscan: drop nr_force_scan[] from get_scan_count\n\nThe nr_force_scan[] tuple holds the effective scan numbers for anon and\nfile pages in case the situation called for a forced scan and the\nregularly calculated scan numbers turned out zero.\n\nHowever, the effective scan number can always be assumed to be\nSWAP_CLUSTER_MAX right before the division into anon and file.  The\nnumerators and denominator are properly set up for all cases, be it force\nscan for just file, just anon, or both, to do the right thing.\n\nSigned-off-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Balbir Singh \u003cbsingharora@gmail.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3da367c3e5fca71d4e778fa565d9b098d5518f4a",
      "tree": "915dff1989bdffaed157b56f724631b5d8f2d328",
      "parents": [
        "3fa36acbced23c563345de3179dfe1775f15be5e"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Mon Oct 31 17:07:03 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:45 2011 -0700"
      },
      "message": "vmscan: add block plug for page reclaim\n\nper-task block plug can reduce block queue lock contention and increase\nrequest merge.  Currently page reclaim doesn\u0027t support it.  I originally\nthought page reclaim doesn\u0027t need it, because kswapd thread count is\nlimited and file cache write is done at flusher mostly.\n\nWhen I test a workload with heavy swap in a 4-node machine, each CPU is\ndoing direct page reclaim and swap.  This causes block queue lock\ncontention.  In my test, without below patch, the CPU utilization is about\n2% ~ 7%.  With the patch, the CPU utilization is about 1% ~ 3%.  Disk\nthroughput isn\u0027t changed.  This should improve normal kswapd write and\nfile cache write too (increase request merge for example), but might not\nbe so obvious as I explain above.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Jens Axboe \u003caxboe@kernel.dk\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f80c0673610e36ae29d63e3297175e22f70dde5f",
      "tree": "0a6aab3b637fa75961224e9261eb544156672c34",
      "parents": [
        "39deaf8585152f1a35c1676d3d7dc6ae0fb65967"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Mon Oct 31 17:06:55 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:44 2011 -0700"
      },
      "message": "mm: zone_reclaim: make isolate_lru_page() filter-aware\n\nIn __zone_reclaim case, we don\u0027t want to shrink mapped page.  Nonetheless,\nwe have isolated mapped page and re-add it into LRU\u0027s head.  It\u0027s\nunnecessary CPU overhead and makes LRU churning.\n\nOf course, when we isolate the page, the page might be mapped but when we\ntry to migrate the page, the page would be not mapped.  So it could be\nmigrated.  But race is rare and although it happens, it\u0027s no big deal.\n\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nReviewed-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "39deaf8585152f1a35c1676d3d7dc6ae0fb65967",
      "tree": "a7509ea61c2f1028ed7ef961aa1abd16d50905f9",
      "parents": [
        "4356f21d09283dc6d39a6f7287a65ddab61e2808"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Mon Oct 31 17:06:51 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:44 2011 -0700"
      },
      "message": "mm: compaction: make isolate_lru_page() filter-aware\n\nIn async mode, compaction doesn\u0027t migrate dirty or writeback pages.  So,\nit\u0027s meaningless to pick the page and re-add it to lru list.\n\nOf course, when we isolate the page in compaction, the page might be dirty\nor writeback but when we try to migrate the page, the page would be not\ndirty, writeback.  So it could be migrated.  But it\u0027s very unlikely as\nisolate and migration cycle is much faster than writeout.\n\nSo, this patch helps cpu overhead and prevent unnecessary LRU churning.\n\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4356f21d09283dc6d39a6f7287a65ddab61e2808",
      "tree": "34822a1662ea83291455834556a4fb5bf98ecd72",
      "parents": [
        "b9e84ac1536d35aee03b2601f19694949f0bd506"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Mon Oct 31 17:06:47 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Oct 31 17:30:44 2011 -0700"
      },
      "message": "mm: change isolate mode from #define to bitwise type\n\nChange ISOLATE_XXX macro with bitwise isolate_mode_t type.  Normally,\nmacro isn\u0027t recommended as it\u0027s type-unsafe and making debugging harder as\nsymbol cannot be passed throught to the debugger.\n\nQuote from Johannes\n\" Hmm, it would probably be cleaner to fully convert the isolation mode\ninto independent flags.  INACTIVE, ACTIVE, BOTH is currently a\ntri-state among flags, which is a bit ugly.\"\n\nThis patch moves isolate mode from swap.h to mmzone.h by memcontrol.h\n\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0e175a1835ffc979e55787774e58ec79e41957d7",
      "tree": "6ec4b65a8de4e9d1c12d26a1079079ed81d79450",
      "parents": [
        "ad4e38dd6a33bb3a4882c487d7abe621e583b982"
      ],
      "author": {
        "name": "Curt Wohlgemuth",
        "email": "curtw@google.com",
        "time": "Fri Oct 07 21:54:10 2011 -0600"
      },
      "committer": {
        "name": "Wu Fengguang",
        "email": "fengguang.wu@intel.com",
        "time": "Mon Oct 31 00:33:36 2011 +0800"
      },
      "message": "writeback: Add a \u0027reason\u0027 to wb_writeback_work\n\nThis creates a new \u0027reason\u0027 field in a wb_writeback_work\nstructure, which unambiguously identifies who initiates\nwriteback activity.  A \u0027wb_reason\u0027 enumeration has been\nadded to writeback.h, to enumerate the possible reasons.\n\nThe \u0027writeback_work_class\u0027 and tracepoint event class and\n\u0027writeback_queue_io\u0027 tracepoints are updated to include the\nsymbolic \u0027reason\u0027 in all trace events.\n\nAnd the \u0027writeback_inodes_sbXXX\u0027 family of routines has had\na wb_stats parameter added to them, so callers can specify\nwhy writeback is being started.\n\nAcked-by: Jan Kara \u003cjack@suse.cz\u003e\nSigned-off-by: Curt Wohlgemuth \u003ccurtw@google.com\u003e\nSigned-off-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\n"
    },
    {
      "commit": "e060c38434b2caa78efe7cedaff4191040b65a15",
      "tree": "407361230bf6733f63d8e788e4b5e6566ee04818",
      "parents": [
        "10e4ac572eeffe5317019bd7330b6058a400dfc2",
        "cc39c6a9bbdebfcf1a7dee64d83bf302bc38d941"
      ],
      "author": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Thu Sep 15 15:08:05 2011 +0200"
      },
      "committer": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Thu Sep 15 15:08:18 2011 +0200"
      },
      "message": "Merge branch \u0027master\u0027 into for-next\n\nFast-forward merge with Linus to be able to merge patches\nbased on more recent version of the tree.\n"
    },
    {
      "commit": "185efc0f9a1f2d6ad6d4782c5d9e529f3290567f",
      "tree": "9330dac6b7f17fad7d99e444b3544210109e2d99",
      "parents": [
        "a4d3e9e76337059406fcf3ead288c0df22a790e9"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "jweiner@redhat.com",
        "time": "Wed Sep 14 16:21:58 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Sep 14 18:09:38 2011 -0700"
      },
      "message": "memcg: Revert \"memcg: add memory.vmscan_stat\"\n\nRevert the post-3.0 commit 82f9d486e59f5 (\"memcg: add\nmemory.vmscan_stat\").\n\nThe implementation of per-memcg reclaim statistics violates how memcg\nhierarchies usually behave: hierarchically.\n\nThe reclaim statistics are accounted to child memcgs and the parent\nhitting the limit, but not to hierarchy levels in between.  Usually,\nhierarchical statistics are perfectly recursive, with each level\nrepresenting the sum of itself and all its children.\n\nSince this exports statistics to userspace, this may lead to confusion\nand problems with changing things after the release, so revert it now,\nwe can try again later.\n\nSigned-off-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Balbir Singh \u003cbsingharora@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a4d3e9e76337059406fcf3ead288c0df22a790e9",
      "tree": "a0213478caaa3845bf62a4cbe6b65979be5e34b4",
      "parents": [
        "d4c32f355cec2647efb65e4b24e630bd2386f787"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "jweiner@redhat.com",
        "time": "Wed Sep 14 16:21:52 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Sep 14 18:09:37 2011 -0700"
      },
      "message": "mm: vmscan: fix force-scanning small targets without swap\n\nWithout swap, anonymous pages are not scanned.  As such, they should not\ncount when considering force-scanning a small target if there is no swap.\n\nOtherwise, targets are not force-scanned even when their effective scan\nnumber is zero and the other conditions--kswapd/memcg--apply.\n\nThis fixes 246e87a93934 (\"memcg: fix get_scan_count() for small\ntargets\").\n\n[akpm@linux-foundation.org: fix comment]\nSigned-off-by: Johannes Weiner \u003cjweiner@redhat.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Balbir Singh \u003cbsingharora@gmail.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "439423f6894aa0dec22187526827456f5004baed",
      "tree": "1f22238a24b83e0beb1467a1f73b0db952c69894",
      "parents": [
        "4c30c6f566c0989ddaee3407da44751e340a63ed"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Thu Aug 25 15:59:12 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Aug 25 16:25:34 2011 -0700"
      },
      "message": "vmscan: clear ZONE_CONGESTED for zone with good watermark\n\nZONE_CONGESTED is only cleared in kswapd, but pages can be freed in any\ntask.  It\u0027s possible ZONE_CONGESTED isn\u0027t cleared in some cases:\n\n 1. the zone is already balanced just entering balance_pgdat() for\n    order-0 because concurrent tasks free memory.  In this case, later\n    check will skip the zone as it\u0027s balanced so the flag isn\u0027t cleared.\n\n 2. high order balance fallbacks to order-0.  quote from Mel: At the\n    end of balance_pgdat(), kswapd uses the following logic;\n\n\tIf reclaiming at high order {\n\t\tfor each zone {\n\t\t\tif all_unreclaimable\n\t\t\t\tskip\n\t\t\tif watermark is not met\n\t\t\t\torder \u003d 0\n\t\t\t\tloop again\n\n\t\t\t/* watermark is met */\n\t\t\tclear congested\n\t\t}\n\t}\n\n    i.e. it clears ZONE_CONGESTED if it the zone is balanced.  if not,\n    it restarts balancing at order-0.  However, if the higher zones are\n    balanced for order-0, kswapd will miss clearing ZONE_CONGESTED as\n    that only happens after a zone is shrunk.  This can mean that\n    wait_iff_congested() stalls unnecessarily.\n\nThis patch makes kswapd clear ZONE_CONGESTED during its initial\nhighmem-\u003edma scan for zones that are already balanced.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f51bdd2e97098a5cbb3cba7c3a56fa0e9ac3c444",
      "tree": "90419a74be50f313f7fefe99aa7776f8ba832233",
      "parents": [
        "7e8aa048989bf7e0604996a3e2068fb1a81f81bd"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Thu Aug 25 15:59:10 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Aug 25 16:25:34 2011 -0700"
      },
      "message": "mm: fix a vmscan warning\n\nI get the below warning:\n\n  BUG: using smp_processor_id() in preemptible [00000000] code: bash/746\n  caller is native_sched_clock+0x37/0x6e\n  Pid: 746, comm: bash Tainted: G        W   3.0.0+ #254\n  Call Trace:\n   [\u003cffffffff813435c6\u003e] debug_smp_processor_id+0xc2/0xdc\n   [\u003cffffffff8104158d\u003e] native_sched_clock+0x37/0x6e\n   [\u003cffffffff81116219\u003e] try_to_free_mem_cgroup_pages+0x7d/0x270\n   [\u003cffffffff8114f1f8\u003e] mem_cgroup_force_empty+0x24b/0x27a\n   [\u003cffffffff8114ff21\u003e] ? sys_close+0x38/0x138\n   [\u003cffffffff8114ff21\u003e] ? sys_close+0x38/0x138\n   [\u003cffffffff8114f257\u003e] mem_cgroup_force_empty_write+0x17/0x19\n   [\u003cffffffff810c72fb\u003e] cgroup_file_write+0xa8/0xba\n   [\u003cffffffff811522d2\u003e] vfs_write+0xb3/0x138\n   [\u003cffffffff8115241a\u003e] sys_write+0x4a/0x71\n   [\u003cffffffff8114ffd9\u003e] ? sys_close+0xf0/0x138\n   [\u003cffffffff8176deab\u003e] system_call_fastpath+0x16/0x1b\n\nsched_clock() can\u0027t be used with preempt enabled.  And we don\u0027t need\nfast approach to get clock here, so let\u0027s use ktime API.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nTested-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "81d66c70b546e7be5d7e1f1ca9676fd17c5973af",
      "tree": "73b65ca359b47c7952c99df9abebe5922c69ca5c",
      "parents": [
        "52288b6646ee4b1f3b97b5b64e4e84652c2a9bbe"
      ],
      "author": {
        "name": "Justin P. Mattock",
        "email": "justinmattock@gmail.com",
        "time": "Tue Aug 23 09:28:02 2011 -0700"
      },
      "committer": {
        "name": "Jiri Kosina",
        "email": "jkosina@suse.cz",
        "time": "Wed Aug 24 16:45:10 2011 +0200"
      },
      "message": "mm/vmscan.c: fix a typo in a comment \"relaimed\" to \"reclaimed\"\n\nSigned-off-by: Justin P. Mattock \u003cjustinmattock@gmail.com\u003e\nSigned-off-by: Jiri Kosina \u003cjkosina@suse.cz\u003e\n"
    },
    {
      "commit": "82f9d486e59f588c7d100865c36510644abda356",
      "tree": "266f3dcf4f57538196bddd77a129adfb2752335b",
      "parents": [
        "108b6a78463bb8c7163e4f9779f36ad8bbade334"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Jul 26 16:08:26 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jul 26 16:49:42 2011 -0700"
      },
      "message": "memcg: add memory.vmscan_stat\n\nThe commit log of 0ae5e89c60c9 (\"memcg: count the soft_limit reclaim\nin...\") says it adds scanning stats to memory.stat file.  But it doesn\u0027t\nbecause we considered we needed to make a concensus for such new APIs.\n\nThis patch is a trial to add memory.scan_stat. This shows\n  - the number of scanned pages(total, anon, file)\n  - the number of rotated pages(total, anon, file)\n  - the number of freed pages(total, anon, file)\n  - the number of elaplsed time (including sleep/pause time)\n\n  for both of direct/soft reclaim.\n\nThe biggest difference with oringinal Ying\u0027s one is that this file\ncan be reset by some write, as\n\n  # echo 0 ...../memory.scan_stat\n\nExample of output is here. This is a result after make -j 6 kernel\nunder 300M limit.\n\n  [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat\n  [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat\n  scanned_pages_by_limit 9471864\n  scanned_anon_pages_by_limit 6640629\n  scanned_file_pages_by_limit 2831235\n  rotated_pages_by_limit 4243974\n  rotated_anon_pages_by_limit 3971968\n  rotated_file_pages_by_limit 272006\n  freed_pages_by_limit 2318492\n  freed_anon_pages_by_limit 962052\n  freed_file_pages_by_limit 1356440\n  elapsed_ns_by_limit 351386416101\n  scanned_pages_by_system 0\n  scanned_anon_pages_by_system 0\n  scanned_file_pages_by_system 0\n  rotated_pages_by_system 0\n  rotated_anon_pages_by_system 0\n  rotated_file_pages_by_system 0\n  freed_pages_by_system 0\n  freed_anon_pages_by_system 0\n  freed_file_pages_by_system 0\n  elapsed_ns_by_system 0\n  scanned_pages_by_limit_under_hierarchy 9471864\n  scanned_anon_pages_by_limit_under_hierarchy 6640629\n  scanned_file_pages_by_limit_under_hierarchy 2831235\n  rotated_pages_by_limit_under_hierarchy 4243974\n  rotated_anon_pages_by_limit_under_hierarchy 3971968\n  rotated_file_pages_by_limit_under_hierarchy 272006\n  freed_pages_by_limit_under_hierarchy 2318492\n  freed_anon_pages_by_limit_under_hierarchy 962052\n  freed_file_pages_by_limit_under_hierarchy 1356440\n  elapsed_ns_by_limit_under_hierarchy 351386416101\n  scanned_pages_by_system_under_hierarchy 0\n  scanned_anon_pages_by_system_under_hierarchy 0\n  scanned_file_pages_by_system_under_hierarchy 0\n  rotated_pages_by_system_under_hierarchy 0\n  rotated_anon_pages_by_system_under_hierarchy 0\n  rotated_file_pages_by_system_under_hierarchy 0\n  freed_pages_by_system_under_hierarchy 0\n  freed_anon_pages_by_system_under_hierarchy 0\n  freed_file_pages_by_system_under_hierarchy 0\n  elapsed_ns_by_system_under_hierarchy 0\n\ntotal_xxxx is for hierarchy management.\n\nThis will be useful for further memcg developments and need to be\ndevelopped before we do some complicated rework on LRU/softlimit\nmanagement.\n\nThis patch adds a new struct memcg_scanrecord into scan_control struct.\nsc-\u003enr_scanned at el is not designed for exporting information.  For\nexample, nr_scanned is reset frequentrly and incremented +2 at scanning\nmapped pages.\n\nTo avoid complexity, I added a new param in scan_control which is for\nexporting scanning score.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Andrew Bresticker \u003cabrestic@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4508378b9523e22a2a0175d8bf64d932fb10a67d",
      "tree": "f7ebd57322edc18ff91ed7d71236fbbd5633c21c",
      "parents": [
        "1af8efe965676ab30d6c8a5b1fccc9229f339a3b"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Jul 26 16:08:24 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jul 26 16:49:42 2011 -0700"
      },
      "message": "memcg: fix vmscan count in small memcgs\n\nCommit 246e87a93934 (\"memcg: fix get_scan_count() for small targets\")\nfixes the memcg/kswapd behavior against small targets and prevent vmscan\npriority too high.\n\nBut the implementation is too naive and adds another problem to small\nmemcg.  It always force scan to 32 pages of file/anon and doesn\u0027t handle\nswappiness and other rotate_info.  It makes vmscan to scan anon LRU\nregardless of swappiness and make reclaim bad.  This patch fixes it by\nadjusting scanning count with regard to swappiness at el.\n\nAt a test \"cat 1G file under 300M limit.\" (swappiness\u003d20)\n before patch\n        scanned_pages_by_limit 360919\n        scanned_anon_pages_by_limit 180469\n        scanned_file_pages_by_limit 180450\n        rotated_pages_by_limit 31\n        rotated_anon_pages_by_limit 25\n        rotated_file_pages_by_limit 6\n        freed_pages_by_limit 180458\n        freed_anon_pages_by_limit 19\n        freed_file_pages_by_limit 180439\n        elapsed_ns_by_limit 429758872\n after patch\n        scanned_pages_by_limit 180674\n        scanned_anon_pages_by_limit 24\n        scanned_file_pages_by_limit 180650\n        rotated_pages_by_limit 35\n        rotated_anon_pages_by_limit 24\n        rotated_file_pages_by_limit 11\n        freed_pages_by_limit 180634\n        freed_anon_pages_by_limit 0\n        freed_file_pages_by_limit 180634\n        elapsed_ns_by_limit 367119089\n        scanned_pages_by_system 0\n\nthe numbers of scanning anon are decreased(as expected), and elapsed time\nreduced. By this patch, small memcgs will work better.\n(*) Because the amount of file-cache is much bigger than anon,\n    recalaim_stat\u0027s rotate-scan counter make scanning files more.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "bb2a0de92c891b8feeedc0178acb3ae009d899a8",
      "tree": "c2c0b3ad66c8da0e48c021927b2d747fb08b7ef3",
      "parents": [
        "1f4c025b5a5520fd2571244196b1b01ad96d18f6"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Jul 26 16:08:22 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jul 26 16:49:42 2011 -0700"
      },
      "message": "memcg: consolidate memory cgroup lru stat functions\n\nIn mm/memcontrol.c, there are many lru stat functions as..\n\n  mem_cgroup_zone_nr_lru_pages\n  mem_cgroup_node_nr_file_lru_pages\n  mem_cgroup_nr_file_lru_pages\n  mem_cgroup_node_nr_anon_lru_pages\n  mem_cgroup_nr_anon_lru_pages\n  mem_cgroup_node_nr_unevictable_lru_pages\n  mem_cgroup_nr_unevictable_lru_pages\n  mem_cgroup_node_nr_lru_pages\n  mem_cgroup_nr_lru_pages\n  mem_cgroup_get_local_zonestat\n\nSome of them are under #ifdef MAX_NUMNODES \u003e1 and others are not.\nThis seems bad. This patch consolidates all functions into\n\n  mem_cgroup_zone_nr_lru_pages()\n  mem_cgroup_node_nr_lru_pages()\n  mem_cgroup_nr_lru_pages()\n\nFor these functions, \"which LRU?\" information is passed by a mask.\n\nexample:\n  mem_cgroup_nr_lru_pages(mem, BIT(LRU_ACTIVE_ANON))\n\nAnd I added some macro as ALL_LRU, ALL_LRU_FILE, ALL_LRU_ANON.\n\nexample:\n  mem_cgroup_nr_lru_pages(mem, ALL_LRU)\n\nBTW, considering layout of NUMA memory placement of counters, this patch seems\nto be better.\n\nNow, when we gather all LRU information, we scan in following orer\n    for_each_lru -\u003e for_each_node -\u003e for_each_zone.\n\nThis means we\u0027ll touch cache lines in different node in turn.\n\nAfter patch, we\u0027ll scan\n    for_each_node -\u003e for_each_zone -\u003e for_each_lru(mask)\n\nThen, we\u0027ll gather information in the same cacheline at once.\n\n[akpm@linux-foundation.org: fix warnigns, build error]\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Balbir Singh \u003cbsingharora@gmail.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1f4c025b5a5520fd2571244196b1b01ad96d18f6",
      "tree": "f2123a44c9e838eac7d08fe13443cfa04cc23f33",
      "parents": [
        "b830ac1d9a2262093bb0f3f6a2fd2a1c8278daf5"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue Jul 26 16:08:21 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jul 26 16:49:42 2011 -0700"
      },
      "message": "memcg: export memory cgroup\u0027s swappiness with mem_cgroup_swappiness()\n\nEach memory cgroup has a \u0027swappiness\u0027 value which can be accessed by\nget_swappiness(memcg).  The major user is try_to_free_mem_cgroup_pages()\nand swappiness is passed by argument.  It\u0027s propagated by scan_control.\n\nget_swappiness() is a static function but some planned updates will need\nto get swappiness from files other than memcontrol.c This patch exports\nget_swappiness() as mem_cgroup_swappiness().  With this, we can remove the\nargument of swapiness from try_to_free...  and drop swappiness from\nscan_control.  only memcg uses it.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Balbir Singh \u003cbsingharora@gmail.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e9299f5058595a655c3b207cda9635e28b9197e6",
      "tree": "b31a4dc5cab98ee1701313f45e92e583c2d76f63",
      "parents": [
        "3567b59aa80ac4417002bf58e35dce5c777d4164"
      ],
      "author": {
        "name": "Dave Chinner",
        "email": "dchinner@redhat.com",
        "time": "Fri Jul 08 14:14:37 2011 +1000"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Jul 20 01:44:32 2011 -0400"
      },
      "message": "vmscan: add customisable shrinker batch size\n\nFor shrinkers that have their own cond_resched* calls, having\nshrink_slab break the work down into small batches is not\npaticularly efficient. Add a custom batchsize field to the struct\nshrinker so that shrinkers can use a larger batch size if they\ndesire.\n\nA value of zero (uninitialised) means \"use the default\", so\nbehaviour is unchanged by this patch.\n\nSigned-off-by: Dave Chinner \u003cdchinner@redhat.com\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "3567b59aa80ac4417002bf58e35dce5c777d4164",
      "tree": "0d450f4cd1fae3c01cbc5ccaefd4cd519029b0f6",
      "parents": [
        "acf92b485cccf028177f46918e045c0c4e80ee10"
      ],
      "author": {
        "name": "Dave Chinner",
        "email": "dchinner@redhat.com",
        "time": "Fri Jul 08 14:14:36 2011 +1000"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Jul 20 01:44:31 2011 -0400"
      },
      "message": "vmscan: reduce wind up shrinker-\u003enr when shrinker can\u0027t do work\n\nWhen a shrinker returns -1 to shrink_slab() to indicate it cannot do\nany work given the current memory reclaim requirements, it adds the\nentire total_scan count to shrinker-\u003enr. The idea ehind this is that\nwhenteh shrinker is next called and can do work, it will do the work\nof the previously aborted shrinker call as well.\n\nHowever, if a filesystem is doing lots of allocation with GFP_NOFS\nset, then we get many, many more aborts from the shrinkers than we\ndo successful calls. The result is that shrinker-\u003enr winds up to\nit\u0027s maximum permissible value (twice the current cache size) and\nthen when the next shrinker call that can do work is issued, it\nhas enough scan count built up to free the entire cache twice over.\n\nThis manifests itself in the cache going from full to empty in a\nmatter of seconds, even when only a small part of the cache is\nneeded to be emptied to free sufficient memory.\n\nUnder metadata intensive workloads on ext4 and XFS, I\u0027m seeing the\nVFS caches increase memory consumption up to 75% of memory (no page\ncache pressure) over a period of 30-60s, and then the shrinker\nempties them down to zero in the space of 2-3s. This cycle repeats\nover and over again, with the shrinker completely trashing the inode\nand dentry caches every minute or so the workload continues.\n\nThis behaviour was made obvious by the shrink_slab tracepoints added\nearlier in the series, and made worse by the patch that corrected\nthe concurrent accounting of shrinker-\u003enr.\n\nTo avoid this problem, stop repeated small increments of the total\nscan value from winding shrinker-\u003enr up to a value that can cause\nthe entire cache to be freed. We still need to allow it to wind up,\nso use the delta as the \"large scan\" threshold check - if the delta\nis more than a quarter of the entire cache size, then it is a large\nscan and allowed to cause lots of windup because we are clearly\nneeding to free lots of memory.\n\nIf it isn\u0027t a large scan then limit the total scan to half the size\nof the cache so that windup never increases to consume the whole\ncache. Reducing the total scan limit further does not allow enough\nwind-up to maintain the current levels of performance, whilst a\nhigher threshold does not prevent the windup from freeing the entire\ncache under sustained workloads.\n\nSigned-off-by: Dave Chinner \u003cdchinner@redhat.com\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "acf92b485cccf028177f46918e045c0c4e80ee10",
      "tree": "7da97439b5388d143cf414874572f1b2da7f3fe3",
      "parents": [
        "095760730c1047c69159ce88021a7fa3833502c8"
      ],
      "author": {
        "name": "Dave Chinner",
        "email": "dchinner@redhat.com",
        "time": "Fri Jul 08 14:14:35 2011 +1000"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Jul 20 01:44:29 2011 -0400"
      },
      "message": "vmscan: shrinker-\u003enr updates race and go wrong\n\nshrink_slab() allows shrinkers to be called in parallel so the\nstruct shrinker can be updated concurrently. It does not provide any\nexclusio for such updates, so we can get the shrinker-\u003enr value\nincreasing or decreasing incorrectly.\n\nAs a result, when a shrinker repeatedly returns a value of -1 (e.g.\na VFS shrinker called w/ GFP_NOFS), the shrinker-\u003enr goes haywire,\nsometimes updating with the scan count that wasn\u0027t used, sometimes\nlosing it altogether. Worse is when a shrinker does work and that\nupdate is lost due to racy updates, which means the shrinker will do\nthe work again!\n\nFix this by making the total_scan calculations independent of\nshrinker-\u003enr, and making the shrinker-\u003enr updates atomic w.r.t. to\nother updates via cmpxchg loops.\n\nSigned-off-by: Dave Chinner \u003cdchinner@redhat.com\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "095760730c1047c69159ce88021a7fa3833502c8",
      "tree": "7cee34087a3e0d3d14691075b0ea614846c87d7c",
      "parents": [
        "a9049376ee05bf966bfe2b081b5071326856890a"
      ],
      "author": {
        "name": "Dave Chinner",
        "email": "dchinner@redhat.com",
        "time": "Fri Jul 08 14:14:34 2011 +1000"
      },
      "committer": {
        "name": "Al Viro",
        "email": "viro@zeniv.linux.org.uk",
        "time": "Wed Jul 20 01:44:27 2011 -0400"
      },
      "message": "vmscan: add shrink_slab tracepoints\n\nIt is impossible to understand what the shrinkers are actually doing\nwithout instrumenting the code, so add a some tracepoints to allow\ninsight to be gained.\n\nSigned-off-by: Dave Chinner \u003cdchinner@redhat.com\u003e\nSigned-off-by: Al Viro \u003cviro@zeniv.linux.org.uk\u003e\n"
    },
    {
      "commit": "4746efded84d7c5a9c8d64d4c6e814ff0cf9fb42",
      "tree": "174f400db27c1a1d9a66407931199aabfdce6bba",
      "parents": [
        "f7b88631a89757d70192044c9d9f2e8d2fc02f2c"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Tue Jul 19 08:49:26 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Jul 19 22:09:31 2011 -0700"
      },
      "message": "vmscan: fix a livelock in kswapd\n\nI\u0027m running a workload which triggers a lot of swap in a machine with 4\nnodes.  After I kill the workload, I found a kswapd livelock.  Sometimes\nkswapd3 or kswapd2 are keeping running and I can\u0027t access filesystem,\nbut most memory is free.\n\nThis looks like a regression since commit 08951e545918c159 (\"mm: vmscan:\ncorrect check for kswapd sleeping in sleeping_prematurely\").\n\nNode 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0\nfor classzone_idx.  The reason is end_zone in balance_pgdat() is 0 by\ndefault, if all zones have watermark ok, end_zone will keep 0.\n\nLater sleeping_prematurely() always returns true.  Because this is an\norder 3 wakeup, and if classzone_idx is 0, both balanced_pages and\npresent_pages in pgdat_balanced() are 0.  We add a special case here.\nIf a zone has no page, we think it\u0027s balanced.  This fixes the livelock.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nAcked-by: Mel Gorman \u003cmgorman@suse.de\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "215ddd6664ced067afca7eebd2d1eb83f064ff5a",
      "tree": "b0e01235355d9c77b3bf63e0a57a6721fc8e3793",
      "parents": [
        "da175d06b437093f93109ba9e5efbe44dfdf9409"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Fri Jul 08 15:39:40 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Jul 08 21:14:43 2011 -0700"
      },
      "message": "mm: vmscan: only read new_classzone_idx from pgdat when reclaiming successfully\n\nDuring allocator-intensive workloads, kswapd will be woken frequently\ncausing free memory to oscillate between the high and min watermark.  This\nis expected behaviour.  Unfortunately, if the highest zone is small, a\nproblem occurs.\n\nWhen balance_pgdat() returns, it may be at a lower classzone_idx than it\nstarted because the highest zone was unreclaimable.  Before checking if it\nshould go to sleep though, it checks pgdat-\u003eclasszone_idx which when there\nis no other activity will be MAX_NR_ZONES-1.  It interprets this as it has\nbeen woken up while reclaiming, skips scheduling and reclaims again.  As\nthere is no useful reclaim work to do, it enters into a loop of shrinking\nslab consuming loads of CPU until the highest zone becomes reclaimable for\na long period of time.\n\nThere are two problems here.  1) If the returned classzone or order is\nlower, it\u0027ll continue reclaiming without scheduling.  2) if the highest\nzone was marked unreclaimable but balance_pgdat() returns immediately at\nDEF_PRIORITY, the new lower classzone is not communicated back to kswapd()\nfor sleeping.\n\nThis patch does two things that are related.  If the end_zone is\nunreclaimable, this information is communicated back.  Second, if the\nclasszone or order was reduced due to failing to reclaim, new information\nis not read from pgdat and instead an attempt is made to go to sleep.  Due\nto this, it is also necessary that pgdat-\u003eclasszone_idx be initialised\neach time to pgdat-\u003enr_zones - 1 to avoid re-reads being interpreted as\nwakeups.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReported-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Andrew Lutomirski \u003cluto@mit.edu\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "da175d06b437093f93109ba9e5efbe44dfdf9409",
      "tree": "04bbed7c41347e3614690947c66f87e8e38a6051",
      "parents": [
        "d7868dae893c83c50c7824bc2bc75f93d114669f"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Fri Jul 08 15:39:39 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Jul 08 21:14:43 2011 -0700"
      },
      "message": "mm: vmscan: evaluate the watermarks against the correct classzone\n\nWhen deciding if kswapd is sleeping prematurely, the classzone is taken\ninto account but this is different to what balance_pgdat() and the\nallocator are doing.  Specifically, the DMA zone will be checked based on\nthe classzone used when waking kswapd which could be for a GFP_KERNEL or\nGFP_HIGHMEM request.  The lowmem reserve limit kicks in, the watermark is\nnot met and kswapd thinks it\u0027s sleeping prematurely keeping kswapd awake in\nerror.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReported-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Andrew Lutomirski \u003cluto@mit.edu\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d7868dae893c83c50c7824bc2bc75f93d114669f",
      "tree": "7c9e56513ecbbf086c81ebff77310f80e0232ecc",
      "parents": [
        "08951e545918c1594434d000d88a7793e2452a9b"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Fri Jul 08 15:39:38 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Jul 08 21:14:43 2011 -0700"
      },
      "message": "mm: vmscan: do not apply pressure to slab if we are not applying pressure to zone\n\nDuring allocator-intensive workloads, kswapd will be woken frequently\ncausing free memory to oscillate between the high and min watermark.  This\nis expected behaviour.\n\nWhen kswapd applies pressure to zones during node balancing, it checks if\nthe zone is above a high+balance_gap threshold.  If it is, it does not\napply pressure but it unconditionally shrinks slab on a global basis which\nis excessive.  In the event kswapd is being kept awake due to a high small\nunreclaimable zone, it skips zone shrinking but still calls shrink_slab().\n\nOnce pressure has been applied, the check for zone being unreclaimable is\nbeing made before the check is made if all_unreclaimable should be set.\nThis miss of unreclaimable can cause has_under_min_watermark_zone to be\nset due to an unreclaimable zone preventing kswapd backing off on\ncongestion_wait().\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReported-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Andrew Lutomirski \u003cluto@mit.edu\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "08951e545918c1594434d000d88a7793e2452a9b",
      "tree": "56f91aba454751e2b6bcd67945ed2d4ebbeb2025",
      "parents": [
        "902daf6580cffe04721250fb71b5527a98718b11"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mgorman@suse.de",
        "time": "Fri Jul 08 15:39:36 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Jul 08 21:14:42 2011 -0700"
      },
      "message": "mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely\n\nDuring allocator-intensive workloads, kswapd will be woken frequently\ncausing free memory to oscillate between the high and min watermark.  This\nis expected behaviour.  Unfortunately, if the highest zone is small, a\nproblem occurs.\n\nThis seems to happen most with recent sandybridge laptops but it\u0027s\nprobably a co-incidence as some of these laptops just happen to have a\nsmall Normal zone.  The reproduction case is almost always during copying\nlarge files that kswapd pegs at 100% CPU until the file is deleted or\ncache is dropped.\n\nThe problem is mostly down to sleeping_prematurely() keeping kswapd awake\nwhen the highest zone is small and unreclaimable and compounded by the\nfact we shrink slabs even when not shrinking zones causing a lot of time\nto be spent in shrinkers and a lot of memory to be reclaimed.\n\nPatch 1 corrects sleeping_prematurely to check the zones matching\n\tthe classzone_idx instead of all zones.\n\nPatch 2 avoids shrinking slab when we are not shrinking a zone.\n\nPatch 3 notes that sleeping_prematurely is checking lower zones against\n\ta high classzone which is not what allocators or balance_pgdat()\n\tis doing leading to an artifical belief that kswapd should be\n\tstill awake.\n\nPatch 4 notes that when balance_pgdat() gives up on a high zone that the\n\tdecision is not communicated to sleeping_prematurely()\n\nThis problem affects 2.6.38.8 for certain and is expected to affect 2.6.39\nand 3.0-rc4 as well.  If accepted, they need to go to -stable to be picked\nup by distros and this series is against 3.0-rc4.  I\u0027ve cc\u0027d people that\nreported similar problems recently to see if they still suffer from the\nproblem and if this fixes it.\n\nThis patch: correct the check for kswapd sleeping in sleeping_prematurely()\n\nDuring allocator-intensive workloads, kswapd will be woken frequently\ncausing free memory to oscillate between the high and min watermark.  This\nis expected behaviour.\n\nA problem occurs if the highest zone is small.  balance_pgdat() only\nconsiders unreclaimable zones when priority is DEF_PRIORITY but\nsleeping_prematurely considers all zones.  It\u0027s possible for this sequence\nto occur\n\n  1. kswapd wakes up and enters balance_pgdat()\n  2. At DEF_PRIORITY, marks highest zone unreclaimable\n  3. At DEF_PRIORITY-1, ignores highest zone setting end_zone\n  4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from\n        highest zone, clearing all_unreclaimable. Highest zone\n        is still unbalanced\n  5. kswapd returns and calls sleeping_prematurely\n  6. sleeping_prematurely looks at *all* zones, not just the ones\n     being considered by balance_pgdat. The highest small zone\n     has all_unreclaimable cleared but the zone is not\n     balanced. all_zones_ok is false so kswapd stays awake\n\nThis patch corrects the behaviour of sleeping_prematurely to check the\nzones balance_pgdat() checked.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReported-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Pádraig Brady \u003cP@draigBrady.com\u003e\nTested-by: Andrew Lutomirski \u003cluto@mit.edu\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ac34a1a3c39da0a1b9188d12a9ce85506364ed2a",
      "tree": "f74f34047c6bc516e29196685cc8671aff4a02d2",
      "parents": [
        "26c4caea9d697043cc5a458b96411b86d7f6babd"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Mon Jun 27 16:18:12 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jun 27 18:00:13 2011 -0700"
      },
      "message": "memcg: fix direct softlimit reclaim to be called in limit path\n\nCommit d149e3b25d7c (\"memcg: add the soft_limit reclaim in global direct\nreclaim\") adds a softlimit hook to shrink_zones().  By this, soft limit\nis called as\n\n   try_to_free_pages()\n       do_try_to_free_pages()\n           shrink_zones()\n               mem_cgroup_soft_limit_reclaim()\n\nThen, direct reclaim is memcg softlimit hint aware, now.\n\nBut, the memory cgroup\u0027s \"limit\" path can call softlimit shrinker.\n\n   try_to_free_mem_cgroup_pages()\n       do_try_to_free_pages()\n           shrink_zones()\n               mem_cgroup_soft_limit_reclaim()\n\nThis will cause a global reclaim when a memcg hits limit.\n\nThis is bug. soft_limit_reclaim() should be called when\nscanning_global_lru(sc) \u003d\u003d true.\n\nAnd the commit adds a variable \"total_scanned\" for counting softlimit\nscanned pages....it\u0027s not \"total\".  This patch removes the variable and\nupdate sc-\u003enr_scanned instead of it.  This will affect shrink_slab()\u0027s\nscan condition but, global LRU is scanned by softlimit and I think this\nchange makes sense.\n\nTODO: avoid too much scanning of a zone when softlimit did enough work.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Ying Han \u003cyinghan@google.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d179e84ba5da1d0024087d1759a2938817a00f3f",
      "tree": "169c9cc4030a793df1bc29613eff85ee3acef9a9",
      "parents": [
        "7454f4ba40b419eb999a3c61a99da662bf1a2bb8"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Wed Jun 15 15:08:51 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Jun 15 20:04:02 2011 -0700"
      },
      "message": "mm: vmscan: do not use page_count without a page pin\n\nIt is unsafe to run page_count during the physical pfn scan because\ncompound_head could trip on a dangling pointer when reading\npage-\u003efirst_page if the compound page is being freed by another CPU.\n\n[mgorman@suse.de: split out patch]\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\n\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a433658c30974fc87ba3ff52d7e4e6299762aa3d",
      "tree": "8df65e22af520ca5c020281763e6874d0bb51bc5",
      "parents": [
        "e1bbd19bc4afef7adb80cca163800391c4f5773d"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Wed Jun 15 15:08:13 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Jun 15 20:03:59 2011 -0700"
      },
      "message": "vmscan,memcg: memcg aware swap token\n\nCurrently, memcg reclaim can disable swap token even if the swap token mm\ndoesn\u0027t belong in its memory cgroup.  It\u0027s slightly risky.  If an admin\ncreates very small mem-cgroup and silly guy runs contentious heavy memory\npressure workload, every tasks are going to lose swap token and then\nsystem may become unresponsive.  That\u0027s bad.\n\nThis patch adds \u0027memcg\u0027 parameter into disable_swap_token().  and if the\nparameter doesn\u0027t match swap token, VM doesn\u0027t disable it.\n\n[akpm@linux-foundation.org: coding-style fixes]\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Rik van Riel\u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1bac180bd29e03989f50054af97b53b8d37a364a",
      "tree": "6797cb73a27c1e8b7d1ea79764356dc69486dad4",
      "parents": [
        "4fd14ebf6e3b66423dfac2bc9defda7b83ee07b3"
      ],
      "author": {
        "name": "Ying Han",
        "email": "yinghan@google.com",
        "time": "Thu May 26 16:25:36 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 17:12:35 2011 -0700"
      },
      "message": "memcg: rename mem_cgroup_zone_nr_pages() to mem_cgroup_zone_nr_lru_pages()\n\nThe caller of the function has been renamed to zone_nr_lru_pages(), and\nthis is just fixing up in the memcg code.  The current name is easily to\nbe mis-read as zone\u0027s total number of pages.\n\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "246e87a9393448c20873bc5dee64be68ed559e24",
      "tree": "a17016142b267fcba2e3be9908f8138c8dcb3f3a",
      "parents": [
        "889976dbcb1218119fdd950fb7819084e37d7d37"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Thu May 26 16:25:34 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 17:12:35 2011 -0700"
      },
      "message": "memcg: fix get_scan_count() for small targets\n\nDuring memory reclaim we determine the number of pages to be scanned per\nzone as\n\n\t(anon + file) \u003e\u003e priority.\nAssume\n\tscan \u003d (anon + file) \u003e\u003e priority.\n\nIf scan \u003c SWAP_CLUSTER_MAX, the scan will be skipped for this time and\npriority gets higher.  This has some problems.\n\n  1. This increases priority as 1 without any scan.\n     To do scan in this priority, amount of pages should be larger than 512M.\n     If pages\u003e\u003epriority \u003c SWAP_CLUSTER_MAX, it\u0027s recorded and scan will be\n     batched, later. (But we lose 1 priority.)\n     If memory size is below 16M, pages \u003e\u003e priority is 0 and no scan in\n     DEF_PRIORITY forever.\n\n  2. If zone-\u003eall_unreclaimabe\u003d\u003dtrue, it\u0027s scanned only when priority\u003d\u003d0.\n     So, x86\u0027s ZONE_DMA will never be recoverred until the user of pages\n     frees memory by itself.\n\n  3. With memcg, the limit of memory can be small. When using small memcg,\n     it gets priority \u003c DEF_PRIORITY-2 very easily and need to call\n     wait_iff_congested().\n     For doing scan before priorty\u003d9, 64MB of memory should be used.\n\nThen, this patch tries to scan SWAP_CLUSTER_MAX of pages in force...when\n\n  1. the target is enough small.\n  2. it\u0027s kswapd or memcg reclaim.\n\nThen we can avoid rapid priority drop and may be able to recover\nall_unreclaimable in a small zones.  And this patch removes nr_saved_scan.\n This will allow scanning in this priority even when pages \u003e\u003e priority is\nvery small.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: Ying Han \u003cyinghan@google.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "889976dbcb1218119fdd950fb7819084e37d7d37",
      "tree": "7508706ddb6bcbe0f673aca3744f30f281b17734",
      "parents": [
        "4e4c941c108eff10844d2b441d96dab44f32f424"
      ],
      "author": {
        "name": "Ying Han",
        "email": "yinghan@google.com",
        "time": "Thu May 26 16:25:33 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 17:12:35 2011 -0700"
      },
      "message": "memcg: reclaim memory from nodes in round-robin order\n\nPresently, memory cgroup\u0027s direct reclaim frees memory from the current\nnode.  But this has some troubles.  Usually when a set of threads works in\na cooperative way, they tend to operate on the same node.  So if they hit\nlimits under memcg they will reclaim memory from themselves, damaging the\nactive working set.\n\nFor example, assume 2 node system which has Node 0 and Node 1 and a memcg\nwhich has 1G limit.  After some work, file cache remains and the usages\nare\n\n   Node 0:  1M\n   Node 1:  998M.\n\nand run an application on Node 0, it will eat its foot before freeing\nunnecessary file caches.\n\nThis patch adds round-robin for NUMA and adds equal pressure to each node.\nWhen using cpuset\u0027s spread memory feature, this will work very well.\n\nBut yes, a better algorithm is needed.\n\n[akpm@linux-foundation.org: comment editing]\n[kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons]\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d149e3b25d7c5f33de9aa866303926fa53535aa7",
      "tree": "160c8c3136246921458c96ab8257381d702208aa",
      "parents": [
        "0ae5e89c60c9eb87da36a2614836bc434b0ec2ad"
      ],
      "author": {
        "name": "Ying Han",
        "email": "yinghan@google.com",
        "time": "Thu May 26 16:25:27 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 17:12:35 2011 -0700"
      },
      "message": "memcg: add the soft_limit reclaim in global direct reclaim.\n\nWe recently added the change in global background reclaim which counts the\nreturn value of soft_limit reclaim.  Now this patch adds the similar logic\non global direct reclaim.\n\nWe should skip scanning global LRU on shrink_zone if soft_limit reclaim\ndoes enough work.  This is the first step where we start with counting the\nnr_scanned and nr_reclaimed from soft_limit reclaim into global\nscan_control.\n\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nCc: Balbir Singh \u003cbalbir@linux.vnet.ibm.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0ae5e89c60c9eb87da36a2614836bc434b0ec2ad",
      "tree": "0d509fd83ac7e7d2f52dfcbba769c43aeeb68b5f",
      "parents": [
        "f042e707ee671e4beb5389abeb9a1819a2cf5532"
      ],
      "author": {
        "name": "Ying Han",
        "email": "yinghan@google.com",
        "time": "Thu May 26 16:25:25 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu May 26 17:12:35 2011 -0700"
      },
      "message": "memcg: count the soft_limit reclaim in global background reclaim\n\nThe global kswapd scans per-zone LRU and reclaims pages regardless of the\ncgroup. It breaks memory isolation since one cgroup can end up reclaiming\npages from another cgroup. Instead we should rely on memcg-aware target\nreclaim including per-memcg kswapd and soft_limit hierarchical reclaim under\nmemory pressure.\n\nIn the global background reclaim, we do soft reclaim before scanning the\nper-zone LRU. However, the return value is ignored. This patch is the first\nstep to skip shrink_zone() if soft_limit reclaim does enough work.\n\nThis is part of the effort which tries to reduce reclaiming pages in global\nLRU in memcg. The per-memcg background reclaim patchset further enhances the\nper-cgroup targetting reclaim, which I should have V4 posted shortly.\n\nTry running multiple memory intensive workloads within seperate memcgs. Watch\nthe counters of soft_steal in memory.stat.\n\n  $ cat /dev/cgroup/A/memory.stat | grep \u0027soft\u0027\n  soft_steal 240000\n  soft_scan 240000\n  total_soft_steal 240000\n  total_soft_scan 240000\n\nThis patch:\n\nIn the global background reclaim, we do soft reclaim before scanning the\nper-zone LRU.  However, the return value is ignored.\n\nWe would like to skip shrink_zone() if soft_limit reclaim does enough\nwork.  Also, we need to make the memory pressure balanced across per-memcg\nzones, like the logic vm-core.  This patch is the first step where we\nstart with counting the nr_scanned and nr_reclaimed from soft_limit\nreclaim into the global scan_control.\n\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nAcked-by: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1495f230fa7750479c79e3656286b9183d662077",
      "tree": "e5e233bb9fe1916ccc7281e7dcc71b1572fb22c5",
      "parents": [
        "a09ed5e00084448453c8bada4dcd31e5fbfc2f21"
      ],
      "author": {
        "name": "Ying Han",
        "email": "yinghan@google.com",
        "time": "Tue May 24 17:12:27 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:26 2011 -0700"
      },
      "message": "vmscan: change shrinker API by passing shrink_control struct\n\nChange each shrinker\u0027s API by consolidating the existing parameters into\nshrink_control struct.  This will simplify any further features added w/o\ntouching each file of shrinker.\n\n[akpm@linux-foundation.org: fix build]\n[akpm@linux-foundation.org: fix warning]\n[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]\n[akpm@linux-foundation.org: fix xfs warning]\n[akpm@linux-foundation.org: update gfs2]\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Pavel Emelyanov \u003cxemul@openvz.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Steven Whitehouse \u003cswhiteho@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "a09ed5e00084448453c8bada4dcd31e5fbfc2f21",
      "tree": "493f5f2a93efb080cdcc28e793cbcfc7999e66eb",
      "parents": [
        "7b1de5868b124d8f399d8791ed30a9b679d64d4d"
      ],
      "author": {
        "name": "Ying Han",
        "email": "yinghan@google.com",
        "time": "Tue May 24 17:12:26 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:25 2011 -0700"
      },
      "message": "vmscan: change shrink_slab() interfaces by passing shrink_control\n\nConsolidate the existing parameters to shrink_slab() into a new\nshrink_control struct.  This is needed later to pass the same struct to\nshrinkers.\n\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Pavel Emelyanov \u003cxemul@openvz.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0c917313a8d84fcc0c376db3f7edb7c06f06f920",
      "tree": "de95ed4a300d1034b1abae2eb8c7e509c9dfb341",
      "parents": [
        "bd486285f24ac2fd1ff64688fb0729712c5712c4"
      ],
      "author": {
        "name": "Konstantin Khlebnikov",
        "email": "khlebnikov@openvz.org",
        "time": "Tue May 24 17:12:21 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:23 2011 -0700"
      },
      "message": "mm: strictly require elevated page refcount in isolate_lru_page()\n\nisolate_lru_page() must be called only with stable reference to the page,\nthis is what is written in the comment above it, this is reasonable.\n\ncurrent isolate_lru_page() users and its page extra reference sources:\n\n mm/huge_memory.c:\n  __collapse_huge_page_isolate()\t- reference from pte\n\n mm/memcontrol.c:\n  mem_cgroup_move_parent()\t\t- get_page_unless_zero()\n  mem_cgroup_move_charge_pte_range()\t- reference from pte\n\n mm/memory-failure.c:\n  soft_offline_page()\t\t\t- fixed, reference from get_any_page()\n  delete_from_lru_cache() - reference from caller or get_page_unless_zero()\n\t[ seems like there bug, because __memory_failure() can call\n\t  page_action() for hpages tail, but it is ok for\n\t  isolate_lru_page(), tail getted and not in lru]\n\n mm/memory_hotplug.c:\n  do_migrate_range()\t\t\t- fixed, get_page_unless_zero()\n\n mm/mempolicy.c:\n  migrate_page_add()\t\t\t- reference from pte\n\n mm/migrate.c:\n  do_move_page_to_node_array()\t\t- reference from follow_page()\n\n mlock.c:\t\t\t\t- various external references\n\n mm/vmscan.c:\n  putback_lru_page()\t\t\t- reference from isolate_lru_page()\n\nIt seems that all isolate_lru_page() users are ready now for this\nrestriction.  So, let\u0027s replace redundant get_page_unless_zero() with\nget_page() and add page initial reference count check with VM_BUG_ON()\n\nSigned-off-by: Konstantin Khlebnikov \u003ckhlebnikov@openvz.org\u003e\nCc: Andi Kleen \u003candi@firstfloor.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f06590bd718ed950c98828e30ef93204028f3210",
      "tree": "60d1c52a538618a16ebcd82a4d949446fd2036c7",
      "parents": [
        "afc7e326a3f5bafc41324d7926c324414e343ee5"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Tue May 24 17:11:11 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:01 2011 -0700"
      },
      "message": "mm: vmscan: correctly check if reclaimer should schedule during shrink_slab\n\nIt has been reported on some laptops that kswapd is consuming large\namounts of CPU and not being scheduled when SLUB is enabled during large\namounts of file copying.  It is expected that this is due to kswapd\nmissing every cond_resched() point because;\n\nshrink_page_list() calls cond_resched() if inactive pages were isolated\n        which in turn may not happen if all_unreclaimable is set in\n        shrink_zones(). If for whatver reason, all_unreclaimable is\n        set on all zones, we can miss calling cond_resched().\n\nbalance_pgdat() only calls cond_resched if the zones are not\n        balanced. For a high-order allocation that is balanced, it\n        checks order-0 again. During that window, order-0 might have\n        become unbalanced so it loops again for order-0 and returns\n        that it was reclaiming for order-0 to kswapd(). It can then\n        find that a caller has rewoken kswapd for a high-order and\n        re-enters balance_pgdat() without ever calling cond_resched().\n\nshrink_slab only calls cond_resched() if we are reclaiming slab\n\tpages. If there are a large number of direct reclaimers, the\n\tshrinker_rwsem can be contended and prevent kswapd calling\n\tcond_resched().\n\nThis patch modifies the shrink_slab() case.  If the semaphore is\ncontended, the caller will still check cond_resched().  After each\nsuccessful call into a shrinker, the check for cond_resched() remains in\ncase one shrinker is particularly slow.\n\n[mgorman@suse.de: preserve call to cond_resched after each call into shrinker]\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: James Bottomley \u003cJames.Bottomley@HansenPartnership.com\u003e\nTested-by: Colin King \u003ccolin.king@canonical.com\u003e\nCc: Raghavendra D Prabhu \u003craghu.prabhu13@gmail.com\u003e\nCc: Jan Kara \u003cjack@suse.cz\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Christoph Lameter \u003ccl@linux.com\u003e\nCc: Pekka Enberg \u003cpenberg@kernel.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.38+]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "afc7e326a3f5bafc41324d7926c324414e343ee5",
      "tree": "c7b06a424b7a35840a41dca008291ca5dcad9b42",
      "parents": [
        "a71ae47a2cbfa542c69f695809124da4e4dd9e8f"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "hannes@cmpxchg.org",
        "time": "Tue May 24 17:11:09 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 25 08:39:01 2011 -0700"
      },
      "message": "mm: vmscan: correct use of pgdat_balanced in sleeping_prematurely\n\nThere are a few reports of people experiencing hangs when copying large\namounts of data with kswapd using a large amount of CPU which appear to be\ndue to recent reclaim changes.  SLUB using high orders is the trigger but\nnot the root cause as SLUB has been using high orders for a while.  The\nroot cause was bugs introduced into reclaim which are addressed by the\nfollowing two patches.\n\nPatch 1 corrects logic introduced by commit 1741c877 (\"mm: kswapd:\n        keep kswapd awake for high-order allocations until a percentage of\n        the node is balanced\") to allow kswapd to go to sleep when\n        balanced for high orders.\n\nPatch 2 notes that it is possible for kswapd to miss every\n        cond_resched() and updates shrink_slab() so it\u0027ll at least reach\n        that scheduling point.\n\nChris Wood reports that these two patches in isolation are sufficient to\nprevent the system hanging.  AFAIK, they should also resolve similar hangs\nexperienced by James Bottomley.\n\nThis patch:\n\nJohannes Weiner poined out that the logic in commit 1741c877 (\"mm: kswapd:\nkeep kswapd awake for high-order allocations until a percentage of the\nnode is balanced\") is backwards.  Instead of allowing kswapd to go to\nsleep when balancing for high order allocations, it keeps it kswapd\nrunning uselessly.\n\nSigned-off-by: Mel Gorman \u003cmgorman@suse.de\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: James Bottomley \u003cJames.Bottomley@HansenPartnership.com\u003e\nTested-by: Colin King \u003ccolin.king@canonical.com\u003e\nCc: Raghavendra D Prabhu \u003craghu.prabhu13@gmail.com\u003e\nCc: Jan Kara \u003cjack@suse.cz\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Christoph Lameter \u003ccl@linux.com\u003e\nCc: Pekka Enberg \u003cpenberg@kernel.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.38+]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "268bb0ce3e87872cb9290c322b0d35bce230d88f",
      "tree": "c8331ade4a3e24fc589c4eb62731bc2312d35333",
      "parents": [
        "257313b2a87795e07a0bdf58d0fffbdba8b31051"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri May 20 12:50:29 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri May 20 12:50:29 2011 -0700"
      },
      "message": "sanitize \u003clinux/prefetch.h\u003e usage\n\nCommit e66eed651fd1 (\"list: remove prefetching from regular list\niterators\") removed the include of prefetch.h from list.h, which\nuncovered several cases that had apparently relied on that rather\nobscure header file dependency.\n\nSo this fixes things up a bit, using\n\n   grep -L linux/prefetch.h $(git grep -l \u0027[^a-z_]prefetchw*(\u0027 -- \u0027*.[ch]\u0027)\n   grep -L \u0027prefetchw*(\u0027 $(git grep -l \u0027linux/prefetch.h\u0027 -- \u0027*.[ch]\u0027)\n\nto guide us in finding files that either need \u003clinux/prefetch.h\u003e\ninclusion, or have it despite not needing it.\n\nThere are more of them around (mostly network drivers), but this gets\nmany core ones.\n\nReported-by: Stephen Rothwell \u003csfr@canb.auug.org.au\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d6c438b6cd733834a3cec55af8577a8fc3548016",
      "tree": "95c7719415aecf9f2a09b1e885d23ba54eb9b5b8",
      "parents": [
        "d5f33d45e4c0e306e8d16b4573891a65d9ad544f"
      ],
      "author": {
        "name": "KAMEZAWA Hiroyuki",
        "email": "kamezawa.hiroyu@jp.fujitsu.com",
        "time": "Tue May 17 15:44:10 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed May 18 02:55:23 2011 -0700"
      },
      "message": "memcg: fix zone congestion\n\nZONE_CONGESTED should be a state of global memory reclaim.  If not, a busy\nmemcg sets this and give unnecessary throttoling in wait_iff_congested()\nagainst memory recalim in other contexts.  This makes system performance\nbad.\n\nI\u0027ll think about \"memcg is congested!\" flag is required or not, later.\nBut this fix is required first.\n\nSigned-off-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Daisuke Nishimura \u003cnishimura@mxp.nes.nec.co.jp\u003e\nAcked-by: Ying Han \u003cyinghan@google.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: Johannes Weiner \u003cjweiner@redhat.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "929bea7c714220fc76ce3f75bef9056477c28e74",
      "tree": "d41b4592b658173e00c7b8bad2bce048f02e0ead",
      "parents": [
        "fe936dfc23fed3475b11067e8d9b70553eafcd9e"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Thu Apr 14 15:22:12 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Apr 14 16:06:56 2011 -0700"
      },
      "message": "vmscan: all_unreclaimable() use zone-\u003eall_unreclaimable as a name\n\nall_unreclaimable check in direct reclaim has been introduced at 2.6.19\nby following commit.\n\n\t2006 Sep 25; commit 408d8544; oom: use unreclaimable info\n\nAnd it went through strange history. firstly, following commit broke\nthe logic unintentionally.\n\n\t2008 Apr 29; commit a41f24ea; page allocator: smarter retry of\n\t\t\t\t      costly-order allocations\n\nTwo years later, I\u0027ve found obvious meaningless code fragment and\nrestored original intention by following commit.\n\n\t2010 Jun 04; commit bb21c7ce; vmscan: fix do_try_to_free_pages()\n\t\t\t\t      return value when priority\u003d\u003d0\n\nBut, the logic didn\u0027t works when 32bit highmem system goes hibernation\nand Minchan slightly changed the algorithm and fixed it .\n\n\t2010 Sep 22: commit d1908362: vmscan: check all_unreclaimable\n\t\t\t\t      in direct reclaim path\n\nBut, recently, Andrey Vagin found the new corner case. Look,\n\n\tstruct zone {\n\t  ..\n\t        int                     all_unreclaimable;\n\t  ..\n\t        unsigned long           pages_scanned;\n\t  ..\n\t}\n\nzone-\u003eall_unreclaimable and zone-\u003epages_scanned are neigher atomic\nvariables nor protected by lock.  Therefore zones can become a state of\nzone-\u003epage_scanned\u003d0 and zone-\u003eall_unreclaimable\u003d1.  In this case, current\nall_unreclaimable() return false even though zone-\u003eall_unreclaimabe\u003d1.\n\nThis resulted in the kernel hanging up when executing a loop of the form\n\n1. fork\n2. mmap\n3. touch memory\n4. read memory\n5. munmmap\n\nas described in\nhttp://www.gossamer-threads.com/lists/linux/kernel/1348725#1348725\n\nIs this ignorable minor issue?  No.  Unfortunately, x86 has very small dma\nzone and it become zone-\u003eall_unreclamble\u003d1 easily.  and if it become\nall_unreclaimable\u003d1, it never restore all_unreclaimable\u003d0.  Why?  if\nall_unreclaimable\u003d1, vmscan only try DEF_PRIORITY reclaim and\na-few-lru-pages\u003e\u003eDEF_PRIORITY always makes 0.  that mean no page scan at\nall!\n\nEventually, oom-killer never works on such systems.  That said, we can\u0027t\nuse zone-\u003epages_scanned for this purpose.  This patch restore\nall_unreclaimable() use zone-\u003eall_unreclaimable as old.  and in addition,\nto add oom_killer_disabled check to avoid reintroduce the issue of commit\nd1908362 (\"vmscan: check all_unreclaimable in direct reclaim path\").\n\nReported-by: Andrey Vagin \u003cavagin@openvz.org\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Nick Piggin \u003cnpiggin@kernel.dk\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nAcked-by: David Rientjes \u003crientjes@google.com\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "25985edcedea6396277003854657b5f3cb31a628",
      "tree": "f026e810210a2ee7290caeb737c23cb6472b7c38",
      "parents": [
        "6aba74f2791287ec407e0f92487a725a25908067"
      ],
      "author": {
        "name": "Lucas De Marchi",
        "email": "lucas.demarchi@profusion.mobi",
        "time": "Wed Mar 30 22:57:33 2011 -0300"
      },
      "committer": {
        "name": "Lucas De Marchi",
        "email": "lucas.demarchi@profusion.mobi",
        "time": "Thu Mar 31 11:26:23 2011 -0300"
      },
      "message": "Fix common misspellings\n\nFixes generated by \u0027codespell\u0027 and manually reviewed.\n\nSigned-off-by: Lucas De Marchi \u003clucas.demarchi@profusion.mobi\u003e\n"
    },
    {
      "commit": "6c5103890057b1bb781b26b7aae38d33e4c517d8",
      "tree": "e6e57961dcddcb5841acb34956e70b9dc696a880",
      "parents": [
        "3dab04e6978e358ad2307bca563fabd6c5d2c58b",
        "9d2e157d970a73b3f270b631828e03eb452d525e"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Mar 24 10:16:26 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Mar 24 10:16:26 2011 -0700"
      },
      "message": "Merge branch \u0027for-2.6.39/core\u0027 of git://git.kernel.dk/linux-2.6-block\n\n* \u0027for-2.6.39/core\u0027 of git://git.kernel.dk/linux-2.6-block: (65 commits)\n  Documentation/iostats.txt: bit-size reference etc.\n  cfq-iosched: removing unnecessary think time checking\n  cfq-iosched: Don\u0027t clear queue stats when preempt.\n  blk-throttle: Reset group slice when limits are changed\n  blk-cgroup: Only give unaccounted_time under debug\n  cfq-iosched: Don\u0027t set active queue in preempt\n  block: fix non-atomic access to genhd inflight structures\n  block: attempt to merge with existing requests on plug flush\n  block: NULL dereference on error path in __blkdev_get()\n  cfq-iosched: Don\u0027t update group weights when on service tree\n  fs: assign sb-\u003es_bdi to default_backing_dev_info if the bdi is going away\n  block: Require subsystems to explicitly allocate bio_set integrity mempool\n  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging\n  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging\n  fs: make fsync_buffers_list() plug\n  mm: make generic_writepages() use plugging\n  blk-cgroup: Add unaccounted time to timeslice_used.\n  block: fixup plugging stubs for !CONFIG_BLOCK\n  block: remove obsolete comments for blkdev_issue_zeroout.\n  blktrace: Use rq-\u003ecmd_flags directly in blk_add_trace_rq.\n  ...\n\nFix up conflicts in fs/{aio.c,super.c}\n"
    },
    {
      "commit": "8afdcece4911e51cfff2b50a269418914cab8a3f",
      "tree": "fcfb966822f0f6c128c754f3876a80106c9cc654",
      "parents": [
        "7571966189e54adf0a8bc1384d6f13f44052ba63"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Mar 22 16:33:04 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Mar 22 17:44:04 2011 -0700"
      },
      "message": "mm: vmscan: kswapd should not free an excessive number of pages when balancing small zones\n\nWhen reclaiming for order-0 pages, kswapd requires that all zones be\nbalanced.  Each cycle through balance_pgdat() does background ageing on\nall zones if necessary and applies equal pressure on the inactive zone\nunless a lot of pages are free already.\n\nA \"lot of free pages\" is defined as a \"balance gap\" above the high\nwatermark which is currently 7*high_watermark.  Historically this was\nreasonable as min_free_kbytes was small.  However, on systems using huge\npages, it is recommended that min_free_kbytes is higher and it is tuned\nwith hugeadm --set-recommended-min_free_kbytes.  With the introduction of\ntransparent huge page support, this recommended value is also applied.  On\nX86-64 with 4G of memory, min_free_kbytes becomes 67584 so one would\nexpect around 68M of memory to be free.  The Normal zone is approximately\n35000 pages so under even normal memory pressure such as copying a large\nfile, it gets exhausted quickly.  As it is getting exhausted, kswapd\napplies pressure equally to all zones, including the DMA32 zone.  DMA32 is\napproximately 700,000 pages with a high watermark of around 23,000 pages.\nIn this situation, kswapd will reclaim around (23000*8 where 8 is the high\nwatermark + balance gap of 7 * high watermark) pages or 718M of pages\nbefore the zone is ignored.  What the user sees is that free memory far\nhigher than it should be.\n\nTo avoid an excessive number of pages being reclaimed from the larger\nzones, explicitely defines the \"balance gap\" to be either 1% of the zone\nor the low watermark for the zone, whichever is smaller.  While kswapd\nwill check all zones to apply pressure, it\u0027ll ignore zones that meets the\n(high_wmark + balance_gap) watermark.\n\nTo test this, 80G were copied from a partition and the amount of memory\nbeing used was recorded.  A comparison of a patch and unpatched kernel can\nbe seen at\nhttp://www.csn.ul.ie/~mel/postings/minfree-20110222/memory-usage-hydra.ps\nand shows that kswapd is not reclaiming as much memory with the patch\napplied.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: \"Chen, Tim C\" \u003ctim.c.chen@intel.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e64a782fec684c29a8204c51b3cb554dce588592",
      "tree": "5ff0beb21b973f1ad0edc1e31b6a1c2ee4406bdc",
      "parents": [
        "702cfbf93aaf3a091b0c64c8766c1ade0a820c38"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Tue Mar 22 16:32:44 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Mar 22 17:44:02 2011 -0700"
      },
      "message": "mm: change __remove_from_page_cache()\n\nNow we renamed remove_from_page_cache with delete_from_page_cache.  As\nconsistency of __remove_from_swap_cache and remove_from_swap_cache, we\nchange internal page cache handling function name, too.\n\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nAcked-by: Hugh Dickins \u003chughd@google.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "d527caf22e48480b102c7c6ee5b9ba12170148f7",
      "tree": "7d53a2c430f8c020b6fa8390396dd2d1ce480b9a",
      "parents": [
        "89699605fe7cfd8611900346f61cb6cbf179b10a"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Tue Mar 22 16:30:38 2011 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Mar 22 17:44:00 2011 -0700"
      },
      "message": "mm: compaction: prevent kswapd compacting memory to reduce CPU usage\n\nThis patch reverts 5a03b051 (\"thp: use compaction in kswapd for GFP_ATOMIC\norder \u003e 0\") due to reports stating that kswapd CPU usage was higher and\nIRQs were being disabled more frequently.  This was reported at\nhttp://www.spinics.net/linux/fedora/alsa-user/msg09885.html.\n\nWithout this patch applied, CPU usage by kswapd hovers around the 20% mark\naccording to the tester (Arthur Marsh:\nhttp://www.spinics.net/linux/fedora/alsa-user/msg09899.html).  With this\npatch applied, it\u0027s around 2%.\n\nThe problem is not related to THP which specifies __GFP_NO_KSWAPD but is\ntriggered by high-order allocations hitting the low watermark for their\norder and waking kswapd on kernels with CONFIG_COMPACTION set.  The most\ncommon trigger for this is network cards configured for jumbo frames but\nit\u0027s also possible it\u0027ll be triggered by fork-heavy workloads (order-1)\nand some wireless cards which depend on order-1 allocations.\n\nThe symptoms for the user will be high CPU usage by kswapd in low-memory\nsituations which could be confused with another writeback problem.  While\na patch like 5a03b051 may be reintroduced in the future, this patch plays\nit safe for now and reverts it.\n\n[mel@csn.ul.ie: Beefed up the changelog]\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReported-by: Arthur Marsh \u003carthur.marsh@internode.on.net\u003e\nTested-by: Arthur Marsh \u003carthur.marsh@internode.on.net\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.38.1]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4c63f5646e405b5010cc9499419060bf2e838f5b",
      "tree": "df91ba315032c8ec4aafeb3ab96fdfa7c6c656e1",
      "parents": [
        "cafb0bfca1a73efd6d8a4a6a6a716e6134b96c24",
        "69d60eb96ae8a73cf9b79cf28051caf973006011"
      ],
      "author": {
        "name": "Jens Axboe",
        "email": "jaxboe@fusionio.com",
        "time": "Thu Mar 10 08:58:35 2011 +0100"
      },
      "committer": {
        "name": "Jens Axboe",
        "email": "jaxboe@fusionio.com",
        "time": "Thu Mar 10 08:58:35 2011 +0100"
      },
      "message": "Merge branch \u0027for-2.6.39/stack-plug\u0027 into for-2.6.39/core\n\nConflicts:\n\tblock/blk-core.c\n\tblock/blk-flush.c\n\tdrivers/md/raid1.c\n\tdrivers/md/raid10.c\n\tdrivers/md/raid5.c\n\tfs/nilfs2/btnode.c\n\tfs/nilfs2/mdt.c\n\nSigned-off-by: Jens Axboe \u003cjaxboe@fusionio.com\u003e\n"
    },
    {
      "commit": "7eaceaccab5f40bbfda044629a6298616aeaed50",
      "tree": "33954d12f63e25a47eb6d86ef3d3d0a5e62bf752",
      "parents": [
        "73c101011926c5832e6e141682180c4debe2cf45"
      ],
      "author": {
        "name": "Jens Axboe",
        "email": "jaxboe@fusionio.com",
        "time": "Thu Mar 10 08:52:07 2011 +0100"
      },
      "committer": {
        "name": "Jens Axboe",
        "email": "jaxboe@fusionio.com",
        "time": "Thu Mar 10 08:52:07 2011 +0100"
      },
      "message": "block: remove per-queue plugging\n\nCode has been converted over to the new explicit on-stack plugging,\nand delay users have been converted to use the new API for that.\nSo lets kill off the old plugging along with aops-\u003esync_page().\n\nSigned-off-by: Jens Axboe \u003cjaxboe@fusionio.com\u003e\n"
    },
    {
      "commit": "2876592f231d436c295b67726313f6f3cfb6e243",
      "tree": "e53c6db2aed6e672481c31083287d79f32ad45f4",
      "parents": [
        "ac3c8304190ed0daaa2fb01ce2a069be5e2a52a7"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Fri Feb 25 14:44:20 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Feb 25 15:07:36 2011 -0800"
      },
      "message": "mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT\n\nshould_continue_reclaim() for reclaim/compaction allows scanning to\ncontinue even if pages are not being reclaimed until the full list is\nscanned.  In terms of allocation success, this makes sense but potentially\nit introduces unwanted latency for high-order allocations such as\ntransparent hugepages and network jumbo frames that would prefer to fail\nthe allocation attempt and fallback to order-0 pages.  Worse, there is a\npotential that the full LRU scan will clear all the young bits, distort\npage aging information and potentially push pages into swap that would\nhave otherwise remained resident.\n\nThis patch will stop reclaim/compaction if no pages were reclaimed in the\nlast SWAP_CLUSTER_MAX pages that were considered.  For allocations such as\nhugetlbfs that use __GFP_REPEAT and have fewer fallback options, the full\nLRU list may still be scanned.\n\nOrder-0 allocation should not be affected because RECLAIM_MODE_COMPACTION\nis not set so the following avoids the gfp_mask being examined:\n\n        if (!(sc-\u003ereclaim_mode \u0026 RECLAIM_MODE_COMPACTION))\n                return false;\n\nA tool was developed based on ftrace that tracked the latency of\nhigh-order allocations while transparent hugepage support was enabled and\nthree benchmarks were run.  The \"fix-infinite\" figures are 2.6.38-rc4 with\nJohannes\u0027s patch \"vmscan: fix zone shrinking exit when scan work is done\"\napplied.\n\n  STREAM Highorder Allocation Latency Statistics\n                 fix-infinite     break-early\n  1 :: Count            10298           10229\n  1 :: Min             0.4560          0.4640\n  1 :: Mean            1.0589          1.0183\n  1 :: Max            14.5990         11.7510\n  1 :: Stddev          0.5208          0.4719\n  2 :: Count                2               1\n  2 :: Min             1.8610          3.7240\n  2 :: Mean            3.4325          3.7240\n  2 :: Max             5.0040          3.7240\n  2 :: Stddev          1.5715          0.0000\n  9 :: Count           111696          111694\n  9 :: Min             0.5230          0.4110\n  9 :: Mean           10.5831         10.5718\n  9 :: Max            38.4480         43.2900\n  9 :: Stddev          1.1147          1.1325\n\nMean time for order-1 allocations is reduced.  order-2 looks increased but\nwith so few allocations, it\u0027s not particularly significant.  THP mean\nallocation latency is also reduced.  That said, allocation time varies so\nsignificantly that the reductions are within noise.\n\nMax allocation time is reduced by a significant amount for low-order\nallocations but reduced for THP allocations which presumably are now\nbreaking before reclaim has done enough work.\n\n  SysBench Highorder Allocation Latency Statistics\n                 fix-infinite     break-early\n  1 :: Count            15745           15677\n  1 :: Min             0.4250          0.4550\n  1 :: Mean            1.1023          1.0810\n  1 :: Max            14.4590         10.8220\n  1 :: Stddev          0.5117          0.5100\n  2 :: Count                1               1\n  2 :: Min             3.0040          2.1530\n  2 :: Mean            3.0040          2.1530\n  2 :: Max             3.0040          2.1530\n  2 :: Stddev          0.0000          0.0000\n  9 :: Count             2017            1931\n  9 :: Min             0.4980          0.7480\n  9 :: Mean           10.4717         10.3840\n  9 :: Max            24.9460         26.2500\n  9 :: Stddev          1.1726          1.1966\n\nAgain, mean time for order-1 allocations is reduced while order-2\nallocations are too few to draw conclusions from.  The mean time for THP\nallocations is also slightly reduced albeit the reductions are within\nvarianes.\n\nOnce again, our maximum allocation time is significantly reduced for\nlow-order allocations and slightly increased for THP allocations.\n\n  Anon stream mmap reference Highorder Allocation Latency Statistics\n  1 :: Count             1376            1790\n  1 :: Min             0.4940          0.5010\n  1 :: Mean            1.0289          0.9732\n  1 :: Max             6.2670          4.2540\n  1 :: Stddev          0.4142          0.2785\n  2 :: Count                1               -\n  2 :: Min             1.9060               -\n  2 :: Mean            1.9060               -\n  2 :: Max             1.9060               -\n  2 :: Stddev          0.0000               -\n  9 :: Count            11266           11257\n  9 :: Min             0.4990          0.4940\n  9 :: Mean        27250.4669      24256.1919\n  9 :: Max      11439211.0000    6008885.0000\n  9 :: Stddev     226427.4624     186298.1430\n\nThis benchmark creates one thread per CPU which references an amount of\nanonymous memory 1.5 times the size of physical RAM.  This pounds swap\nquite heavily and is intended to exercise THP a bit.\n\nMean allocation time for order-1 is reduced as before.  It\u0027s also reduced\nfor THP allocations but the variations here are pretty massive due to\nswap.  As before, maximum allocation times are significantly reduced.\n\nOverall, the patch reduces the mean and maximum allocation latencies for\nthe smaller high-order allocations.  This was with Slab configured so it\nwould be expected to be more significant with Slub which uses these size\nallocations more aggressively.\n\nThe mean allocation times for THP allocations are also slightly reduced.\nThe maximum latency was slightly increased as predicted by the comments\ndue to reclaim/compaction breaking early.  However, workloads care more\nabout the latency of lower-order allocations than THP so it\u0027s an\nacceptable trade-off.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nAcked-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Michal Hocko \u003cmhocko@suse.cz\u003e\nCc: Kent Overstreet \u003ckent.overstreet@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f0fdc5e8e6f579310458aef43d1610a0bb5e81a4",
      "tree": "c3f9de434bf9065d2f1d7a5f4214f28efb4920c4",
      "parents": [
        "419d8c96dbfa558f00e623023917d0a5afc46129"
      ],
      "author": {
        "name": "Johannes Weiner",
        "email": "hannes@cmpxchg.org",
        "time": "Thu Feb 10 15:01:34 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Feb 11 16:12:20 2011 -0800"
      },
      "message": "vmscan: fix zone shrinking exit when scan work is done\n\nCommit 3e7d34497067 (\"mm: vmscan: reclaim order-0 and use compaction\ninstead of lumpy reclaim\") introduced an indefinite loop in\nshrink_zone().\n\nIt meant to break out of this loop when no pages had been reclaimed and\nnot a single page was even scanned.  The way it would detect the latter\nis by taking a snapshot of sc-\u003enr_scanned at the beginning of the\nfunction and comparing it against the new sc-\u003enr_scanned after the scan\nloop.  But it would re-iterate without updating that snapshot, looping\nforever if sc-\u003enr_scanned changed at least once since shrink_zone() was\ninvoked.\n\nThis is not the sole condition that would exit that loop, but it\nrequires other processes to change the zone state, as the reclaimer that\nis stuck obviously can not anymore.\n\nThis is only happening for higher-order allocations, where reclaim is\nrun back to back with compaction.\n\nSigned-off-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReported-by: Michal Hocko \u003cmhocko@suse.cz\u003e\nTested-by: Kent Overstreet\u003ckent.overstreet@gmail.com\u003e\nReported-by: Kent Overstreet \u003ckent.overstreet@gmail.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f33261d75b88f55a08e6a9648cef73509979bfba",
      "tree": "f3d8b4f41c860e9f6d054173870319a75c14c155",
      "parents": [
        "4f542e3dd90a96ee0f8fcb8173cb4104f5f753e6"
      ],
      "author": {
        "name": "David Rientjes",
        "email": "rientjes@google.com",
        "time": "Tue Jan 25 15:07:20 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Jan 26 10:50:00 2011 +1000"
      },
      "message": "mm: fix deferred congestion timeout if preferred zone is not allowed\n\nBefore 0e093d99763e (\"writeback: do not sleep on the congestion queue if\nthere are no congested BDIs or if significant congestion is not being\nencountered in the current zone\"), preferred_zone was only used for NUMA\nstatistics, to determine the zoneidx from which to allocate from given\nthe type requested, and whether to utilize memory compaction.\n\nwait_iff_congested(), though, uses preferred_zone to determine if the\ncongestion wait should be deferred because its dirty pages are backed by\na congested bdi.  This incorrectly defers the timeout and busy loops in\nthe page allocator with various cond_resched() calls if preferred_zone\nis not allowed in the current context, usually consuming 100% of a cpu.\n\nThis patch ensures preferred_zone is an allowed zone in the fastpath\ndepending on whether current is constrained by its cpuset or nodes in\nits mempolicy (when the nodemask passed is non-NULL).  This is correct\nsince the fastpath allocation always passes ALLOC_CPUSET when trying to\nallocate memory.  In the slowpath, this patch resets preferred_zone to\nthe first zone of the allowed type when the allocation is not\nconstrained by current\u0027s cpuset, i.e.  it does not pass ALLOC_CPUSET.\n\nThis patch also ensures preferred_zone is from the set of allowed nodes\nwhen called from within direct reclaim since allocations are always\nconstrained by cpusets in this context (it is blockable).\n\nBoth of these uses of cpuset_current_mems_allowed are protected by\nget_mems_allowed().\n\nSigned-off-by: David Rientjes \u003crientjes@google.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3305de51bf612603c9a4e4dc98ceb839ef933749",
      "tree": "83c1f03790d6a658d736f6a5e4067aefb604b293",
      "parents": [
        "abb65272a190660790096628859e752172d822fd"
      ],
      "author": {
        "name": "Jesper Juhl",
        "email": "jj@chaosbits.net",
        "time": "Thu Jan 20 14:44:20 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 20 17:02:05 2011 -0800"
      },
      "message": "mm/vmscan.c: remove duplicate include of compaction.h\n\nSigned-off-by: Jesper Juhl \u003cjj@chaosbits.net\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "7a608572a282a74978e10fd6cd63090aebe29f5c",
      "tree": "03e52f73d7c35ffcea8f46e14ec569da818a7631",
      "parents": [
        "9e8a462a0141b12e22c4a2f0c12e0542770401f0"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jan 17 14:42:19 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jan 17 14:42:19 2011 -0800"
      },
      "message": "Revert \"mm: batch activate_page() to reduce lock contention\"\n\nThis reverts commit 744ed1442757767ffede5008bb13e0805085902e.\n\nChris Mason ended up chasing down some page allocation errors and pages\nstuck waiting on the IO scheduler, and was able to narrow it down to two\ncommits: commit 744ed1442757 (\"mm: batch activate_page() to reduce lock\ncontention\") and d8505dee1a87 (\"mm: simplify code of swap.c\").\n\nThis reverts the first of them.\n\nReported-and-debugged-by: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nCc: Jens Axboe \u003cjaxboe@fusionio.com\u003e\nCc: linux-mm \u003clinux-mm@kvack.org\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "744ed1442757767ffede5008bb13e0805085902e",
      "tree": "75af93524570b40056f2367059dfa84ba7d90186",
      "parents": [
        "d8505dee1a87b8d41b9c4ee1325cd72258226fbc"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Thu Jan 13 15:47:34 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:50 2011 -0800"
      },
      "message": "mm: batch activate_page() to reduce lock contention\n\nThe zone-\u003elru_lock is heavily contented in workload where activate_page()\nis frequently used.  We could do batch activate_page() to reduce the lock\ncontention.  The batched pages will be added into zone list when the pool\nis full or page reclaim is trying to drain them.\n\nFor example, in a 4 socket 64 CPU system, create a sparse file and 64\nprocesses, processes shared map to the file.  Each process read access the\nwhole file and then exit.  The process exit will do unmap_vmas() and cause\na lot of activate_page() call.  In such workload, we saw about 58% total\ntime reduction with below patch.  Other workloads with a lot of\nactivate_page also benefits a lot too.\n\nI tested some microbenchmarks:\ncase-anon-cow-rand-mt\t\t0.58%\ncase-anon-cow-rand\t\t-3.30%\ncase-anon-cow-seq-mt\t\t-0.51%\ncase-anon-cow-seq\t\t-5.68%\ncase-anon-r-rand-mt\t\t0.23%\ncase-anon-r-rand\t\t0.81%\ncase-anon-r-seq-mt\t\t-0.71%\ncase-anon-r-seq\t\t\t-1.99%\ncase-anon-rx-rand-mt\t\t2.11%\ncase-anon-rx-seq-mt\t\t3.46%\ncase-anon-w-rand-mt\t\t-0.03%\ncase-anon-w-rand\t\t-0.50%\ncase-anon-w-seq-mt\t\t-1.08%\ncase-anon-w-seq\t\t\t-0.12%\ncase-anon-wx-rand-mt\t\t-5.02%\ncase-anon-wx-seq-mt\t\t-1.43%\ncase-fork\t\t\t1.65%\ncase-fork-sleep\t\t\t-0.07%\ncase-fork-withmem\t\t1.39%\ncase-hugetlb\t\t\t-0.59%\ncase-lru-file-mmap-read-mt\t-0.54%\ncase-lru-file-mmap-read\t\t0.61%\ncase-lru-file-mmap-read-rand\t-2.24%\ncase-lru-file-readonce\t\t-0.64%\ncase-lru-file-readtwice\t\t-11.69%\ncase-lru-memcg\t\t\t-1.35%\ncase-mmap-pread-rand-mt\t\t1.88%\ncase-mmap-pread-rand\t\t-15.26%\ncase-mmap-pread-seq-mt\t\t0.89%\ncase-mmap-pread-seq\t\t-69.72%\ncase-mmap-xread-rand-mt\t\t0.71%\ncase-mmap-xread-seq-mt\t\t0.38%\n\nThe most significent are:\ncase-lru-file-readtwice\t\t-11.69%\ncase-mmap-pread-rand\t\t-15.26%\ncase-mmap-pread-seq\t\t-69.72%\n\nwhich use activate_page a lot.  others are basically variations because\neach run has slightly difference.\n\n[akpm@linux-foundation.org: coding-style fixes]\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Andi Kleen \u003candi@firstfloor.org\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "9992af102974f3f8a02a1f2729c3461881539e26",
      "tree": "40958e1a8bd7efc7c9a4d28e2b77d86bb8688734",
      "parents": [
        "2c888cfbc1b45508a44763d85ba2e8ac43faff5f"
      ],
      "author": {
        "name": "Rik van Riel",
        "email": "riel@redhat.com",
        "time": "Thu Jan 13 15:47:13 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:46 2011 -0800"
      },
      "message": "thp: scale nr_rotated to balance memory pressure\n\nMake sure we scale up nr_rotated when we encounter a referenced\ntransparent huge page.  This ensures pageout scanning balance is not\ndistorted when there are huge pages on the LRU.\n\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "2c888cfbc1b45508a44763d85ba2e8ac43faff5f",
      "tree": "9a7f2214e5d6a01d5724ae63d4d50cddeb2293ff",
      "parents": [
        "97562cd243298acf573620c764a1037bd545c9bc"
      ],
      "author": {
        "name": "Rik van Riel",
        "email": "riel@redhat.com",
        "time": "Thu Jan 13 15:47:13 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:46 2011 -0800"
      },
      "message": "thp: fix anon memory statistics with transparent hugepages\n\nCount each transparent hugepage as HPAGE_PMD_NR pages in the LRU\nstatistics, so the Active(anon) and Inactive(anon) statistics in\n/proc/meminfo are correct.\n\nSigned-off-by: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "5a03b051ed87e72b959f32a86054e1142ac4cf55",
      "tree": "31f0e8efb86d48b0292f8a7ea4bd9cf7c930a0ab",
      "parents": [
        "878aee7d6b5504e01b9caffce080e792b6b8d090"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "aarcange@redhat.com",
        "time": "Thu Jan 13 15:47:11 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:46 2011 -0800"
      },
      "message": "thp: use compaction in kswapd for GFP_ATOMIC order \u003e 0\n\nThis takes advantage of memory compaction to properly generate pages of\norder \u003e 0 if regular page reclaim fails and priority level becomes more\nsevere and we don\u0027t reach the proper watermarks.\n\nSigned-off-by: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "dc83edd941f412e938841b4989be24aa288a1aa6",
      "tree": "07dbc04d544f3200b3b13be1af6c57f44ffa63c8",
      "parents": [
        "355b09c47a0cbb73b3e65a57c03f157f2e7ddb0b"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:26 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:37 2011 -0800"
      },
      "message": "mm: kswapd: use the classzone idx that kswapd was using for sleeping_prematurely()\n\nWhen kswapd is woken up for a high-order allocation, it takes account of\nthe highest usable zone by the caller (the classzone idx).  During\nallocation, this index is used to select the lowmem_reserve[] that should\nbe applied to the watermark calculation in zone_watermark_ok().\n\nWhen balancing a node, kswapd considers the highest unbalanced zone to be\nthe classzone index.  This will always be at least be the callers\nclasszone_idx and can be higher.  However, sleeping_prematurely() always\nconsiders the lowest zone (e.g.  ZONE_DMA) to be the classzone index.\nThis means that sleeping_prematurely() can consider a zone to be balanced\nthat is unusable by the allocation request that originally woke kswapd.\nThis patch changes sleeping_prematurely() to use a classzone_idx matching\nthe value it used in balance_pgdat().\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: Eric B Munson \u003cemunson@mgebm.net\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Simon Kirby \u003csim@hostway.ca\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "355b09c47a0cbb73b3e65a57c03f157f2e7ddb0b",
      "tree": "26be6f89cac5b6f5b321cf74103444ae8775c3eb",
      "parents": [
        "4d40502ea580c35414a1466d86f96484910ebaec"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:24 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:37 2011 -0800"
      },
      "message": "mm: kswapd: treat zone-\u003eall_unreclaimable in sleeping_prematurely similar to balance_pgdat()\n\nAfter DEF_PRIORITY, balance_pgdat() considers all_unreclaimable zones to\nbe balanced but sleeping_prematurely does not.  This can force kswapd to\nstay awake longer than it should.  This patch fixes it.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Eric B Munson \u003cemunson@mgebm.net\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Simon Kirby \u003csim@hostway.ca\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "4d40502ea580c35414a1466d86f96484910ebaec",
      "tree": "ed03d2b5a100be1c3371d304421af221fa893129",
      "parents": [
        "0abdee2bd4118366c62349a304f81537be69af33"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:23 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:37 2011 -0800"
      },
      "message": "mm: kswapd: reset kswapd_max_order and classzone_idx after reading\n\nWhen kswapd wakes up, it reads its order and classzone from pgdat and\ncalls balance_pgdat.  While its awake, it potentially reclaimes at a high\norder and a low classzone index.  This might have been a once-off that was\nnot required by subsequent callers.  However, because the pgdat values\nwere not reset, they remain artifically high while balance_pgdat() is\nrunning and potentially kswapd enters a second unnecessary reclaim cycle.\nReset the pgdat order and classzone index after reading.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Eric B Munson \u003cemunson@mgebm.net\u003e\nCc: Simon Kirby \u003csim@hostway.ca\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0abdee2bd4118366c62349a304f81537be69af33",
      "tree": "c013abd2dd49b3837d033eb4d32dfb57984d273e",
      "parents": [
        "1741c87757448cedd03224f01586504f9256415d"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:22 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:37 2011 -0800"
      },
      "message": "mm: kswapd: use the order that kswapd was reclaiming at for sleeping_prematurely()\n\nBefore kswapd goes to sleep, it uses sleeping_prematurely() to check if\nthere was a race pushing a zone below its watermark.  If the race\nhappened, it stays awake.  However, balance_pgdat() can decide to reclaim\nat order-0 if it decides that high-order reclaim is not working as\nexpected.  This information is not passed back to sleeping_prematurely().\nThe impact is that kswapd remains awake reclaiming pages long after it\nshould have gone to sleep.  This patch passes the adjusted order to\nsleeping_prematurely and uses the same logic as balance_pgdat to decide if\nit\u0027s ok to go to sleep.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Eric B Munson \u003cemunson@mgebm.net\u003e\nCc: Simon Kirby \u003csim@hostway.ca\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1741c87757448cedd03224f01586504f9256415d",
      "tree": "e8f3bace5f0cd1652a3a2a682189b19f7b3af875",
      "parents": [
        "9950474883e027e6e728cbcff25f7f2bf0c96530"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:21 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:37 2011 -0800"
      },
      "message": "mm: kswapd: keep kswapd awake for high-order allocations until a percentage of the node is balanced\n\nWhen reclaiming for high-orders, kswapd is responsible for balancing a\nnode but it should not reclaim excessively.  It avoids excessive reclaim\nby considering if any zone in a node is balanced then the node is\nbalanced.  In the cases where there are imbalanced zone sizes (e.g.\nZONE_DMA with both ZONE_DMA32 and ZONE_NORMAL), kswapd can go to sleep\nprematurely as just one small zone was balanced.\n\nThis alters the sleep logic of kswapd slightly.  It counts the number of\npages that make up the balanced zones.  If the total number of balanced\npages is more than a quarter of the zone, kswapd will go back to sleep.\nThis should keep a node balanced without reclaiming an excessive number of\npages.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Eric B Munson \u003cemunson@mgebm.net\u003e\nCc: Simon Kirby \u003csim@hostway.ca\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "9950474883e027e6e728cbcff25f7f2bf0c96530",
      "tree": "ecfdd3e68a25f1ef7822428c44f8375efbe9bc0c",
      "parents": [
        "c585a2678d83ba8fb02fa6b197de0ac7d67377f1"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:20 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:37 2011 -0800"
      },
      "message": "mm: kswapd: stop high-order balancing when any suitable zone is balanced\n\nSimon Kirby reported the following problem\n\n   We\u0027re seeing cases on a number of servers where cache never fully\n   grows to use all available memory.  Sometimes we see servers with 4 GB\n   of memory that never seem to have less than 1.5 GB free, even with a\n   constantly-active VM.  In some cases, these servers also swap out while\n   this happens, even though they are constantly reading the working set\n   into memory.  We have been seeing this happening for a long time; I\n   don\u0027t think it\u0027s anything recent, and it still happens on 2.6.36.\n\nAfter some debugging work by Simon, Dave Hansen and others, the prevaling\ntheory became that kswapd is reclaiming order-3 pages requested by SLUB\ntoo aggressive about it.\n\nThere are two apparent problems here.  On the target machine, there is a\nsmall Normal zone in comparison to DMA32.  As kswapd tries to balance all\nzones, it would continually try reclaiming for Normal even though DMA32\nwas balanced enough for callers.  The second problem is that\nsleeping_prematurely() does not use the same logic as balance_pgdat() when\ndeciding whether to sleep or not.  This keeps kswapd artifically awake.\n\nA number of tests were run and the figures from previous postings will\nlook very different for a few reasons.  One, the old figures were forcing\nmy network card to use GFP_ATOMIC in attempt to replicate Simon\u0027s problem.\n Second, I previous specified slub_min_order\u003d3 again in an attempt to\nreproduce Simon\u0027s problem.  In this posting, I\u0027m depending on Simon to say\nwhether his problem is fixed or not and these figures are to show the\nimpact to the ordinary cases.  Finally, the \"vmscan\" figures are taken\nfrom /proc/vmstat instead of the tracepoints.  There is less information\nbut recording is less disruptive.\n\nThe first test of relevance was postmark with a process running in the\nbackground reading a large amount of anonymous memory in blocks.  The\nobjective was to vaguely simulate what was happening on Simon\u0027s machine\nand it\u0027s memory intensive enough to have kswapd awake.\n\nPOSTMARK\n                                            traceonly          kanyzone\nTransactions per second:              156.00 ( 0.00%)   153.00 (-1.96%)\nData megabytes read per second:        21.51 ( 0.00%)    21.52 ( 0.05%)\nData megabytes written per second:     29.28 ( 0.00%)    29.11 (-0.58%)\nFiles created alone per second:       250.00 ( 0.00%)   416.00 (39.90%)\nFiles create/transact per second:      79.00 ( 0.00%)    76.00 (-3.95%)\nFiles deleted alone per second:       520.00 ( 0.00%)   420.00 (-23.81%)\nFiles delete/transact per second:      79.00 ( 0.00%)    76.00 (-3.95%)\n\nMMTests Statistics: duration\nUser/Sys Time Running Test (seconds)         16.58      17.4\nTotal Elapsed Time (seconds)                218.48    222.47\n\nVMstat Reclaim Statistics: vmscan\nDirect reclaims                                  0          4\nDirect reclaim pages scanned                     0        203\nDirect reclaim pages reclaimed                   0        184\nKswapd pages scanned                        326631     322018\nKswapd pages reclaimed                      312632     309784\nKswapd low wmark quickly                         1          4\nKswapd high wmark quickly                      122        475\nKswapd skip congestion_wait                      1          0\nPages activated                             700040     705317\nPages deactivated                           212113     203922\nPages written                                 9875       6363\n\nTotal pages scanned                         326631    322221\nTotal pages reclaimed                       312632    309968\n%age total pages scanned/reclaimed          95.71%    96.20%\n%age total pages scanned/written             3.02%     1.97%\n\nproc vmstat: Faults\nMajor Faults                                   300       254\nMinor Faults                                645183    660284\nPage ins                                    493588    486704\nPage outs                                  4960088   4986704\nSwap ins                                      1230       661\nSwap outs                                     9869      6355\n\nPerformance is mildly affected because kswapd is no longer doing as much\nwork and the background memory consumer process is getting in the way.\nNote that kswapd scanned and reclaimed fewer pages as it\u0027s less aggressive\nand overall fewer pages were scanned and reclaimed.  Swap in/out is\nparticularly reduced again reflecting kswapd throwing out fewer pages.\n\nThe slight performance impact is unfortunate here but it looks like a\ndirect result of kswapd being less aggressive.  As the bug report is about\ntoo many pages being freed by kswapd, it may have to be accepted for now.\n\nThe second test is a streaming IO benchmark that was previously used by\nJohannes to show regressions in page reclaim.\n\nMICRO\n\t\t\t\t\t traceonly  kanyzone\nUser/Sys Time Running Test (seconds)         29.29     28.87\nTotal Elapsed Time (seconds)                492.18    488.79\n\nVMstat Reclaim Statistics: vmscan\nDirect reclaims                               2128       1460\nDirect reclaim pages scanned               2284822    1496067\nDirect reclaim pages reclaimed              148919     110937\nKswapd pages scanned                      15450014   16202876\nKswapd pages reclaimed                     8503697    8537897\nKswapd low wmark quickly                      3100       3397\nKswapd high wmark quickly                     1860       7243\nKswapd skip congestion_wait                    708        801\nPages activated                               9635       9573\nPages deactivated                             1432       1271\nPages written                                  223       1130\n\nTotal pages scanned                       17734836  17698943\nTotal pages reclaimed                      8652616   8648834\n%age total pages scanned/reclaimed          48.79%    48.87%\n%age total pages scanned/written             0.00%     0.01%\n\nproc vmstat: Faults\nMajor Faults                                   165       221\nMinor Faults                               9655785   9656506\nPage ins                                      3880      7228\nPage outs                                 37692940  37480076\nSwap ins                                         0        69\nSwap outs                                       19        15\n\nAgain fewer pages are scanned and reclaimed as expected and this time the\ntest completed faster.  Note that kswapd is hitting its watermarks faster\n(low and high wmark quickly) which I expect is due to kswapd reclaiming\nfewer pages.\n\nI also ran fs-mark, iozone and sysbench but there is nothing interesting\nto report in the figures.  Performance is not significantly changed and\nthe reclaim statistics look reasonable.\n\nTgis patch:\n\nWhen the allocator enters its slow path, kswapd is woken up to balance the\nnode.  It continues working until all zones within the node are balanced.\nFor order-0 allocations, this makes perfect sense but for higher orders it\ncan have unintended side-effects.  If the zone sizes are imbalanced,\nkswapd may reclaim heavily within a smaller zone discarding an excessive\nnumber of pages.  The user-visible behaviour is that kswapd is awake and\nreclaiming even though plenty of pages are free from a suitable zone.\n\nThis patch alters the \"balance\" logic for high-order reclaim allowing\nkswapd to stop if any suitable zone becomes balanced to reduce the number\nof pages it reclaims from other zones.  kswapd still tries to ensure that\norder-0 watermarks for all zones are met before sleeping.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nReviewed-by: Eric B Munson \u003cemunson@mgebm.net\u003e\nCc: Simon Kirby \u003csim@hostway.ca\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Shaohua Li \u003cshaohua.li@intel.com\u003e\nCc: Dave Hansen \u003cdave@linux.vnet.ibm.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f3a310bc4e5ce7e55e1c8e25c31e63af017f3e50",
      "tree": "0c78777bd505f44edeb9bbcc50fb3154896574aa",
      "parents": [
        "9927af740b1b9b1e769310bd0b91425e8047b803"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:46:00 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:34 2011 -0800"
      },
      "message": "mm: vmscan: rename lumpy_mode to reclaim_mode\n\nWith compaction being used instead of lumpy reclaim, the name lumpy_mode\nand associated variables is a bit misleading.  Rename lumpy_mode to\nreclaim_mode which is a better fit.  There is no functional change.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "77f1fe6b08b13a87391549c8a820ddc817b6f50e",
      "tree": "720865bd0994da3787b6f37d33b2ee4c26a2de6c",
      "parents": [
        "3e7d344970673c5334cf7b5bb27c8c0942b06126"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:45:57 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:34 2011 -0800"
      },
      "message": "mm: migration: allow migration to operate asynchronously and avoid synchronous compaction in the faster path\n\nMigration synchronously waits for writeback if the initial passes fails.\nCallers of memory compaction do not necessarily want this behaviour if the\ncaller is latency sensitive or expects that synchronous migration is not\ngoing to have a significantly better success rate.\n\nThis patch adds a sync parameter to migrate_pages() allowing the caller to\nindicate if wait_on_page_writeback() is allowed within migration or not.\nFor reclaim/compaction, try_to_compact_pages() is first called\nasynchronously, direct reclaim runs and then try_to_compact_pages() is\ncalled synchronously as there is a greater expectation that it\u0027ll succeed.\n\n[akpm@linux-foundation.org: build/merge fix]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "3e7d344970673c5334cf7b5bb27c8c0942b06126",
      "tree": "832ecb4da5fd27efa5a503df5b96bfdee2a52ffd",
      "parents": [
        "ee64fc9354e515a79c7232cfde65c88ec627308b"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:45:56 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:33 2011 -0800"
      },
      "message": "mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim\n\nLumpy reclaim is disruptive.  It reclaims a large number of pages and\nignores the age of the pages it reclaims.  This can incur significant\nstalls and potentially increase the number of major faults.\n\nCompaction has reached the point where it is considered reasonably stable\n(meaning it has passed a lot of testing) and is a potential candidate for\ndisplacing lumpy reclaim.  This patch introduces an alternative to lumpy\nreclaim whe compaction is available called reclaim/compaction.  The basic\noperation is very simple - instead of selecting a contiguous range of\npages to reclaim, a number of order-0 pages are reclaimed and then\ncompaction is later by either kswapd (compact_zone_order()) or direct\ncompaction (__alloc_pages_direct_compact()).\n\n[akpm@linux-foundation.org: fix build]\n[akpm@linux-foundation.org: use conventional task_struct naming]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "ee64fc9354e515a79c7232cfde65c88ec627308b",
      "tree": "fb5fb6c0045ff5467ed5870d5f64806784deba2d",
      "parents": [
        "b7aba6984dc048503b69c2a885098cdd430832bf"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:45:55 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:33 2011 -0800"
      },
      "message": "mm: vmscan: convert lumpy_mode into a bitmask\n\nCurrently lumpy_mode is an enum and determines if lumpy reclaim is off,\nsyncronous or asyncronous.  In preparation for using compaction instead of\nlumpy reclaim, this patch converts the flags into a bitmap.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Andy Whitcroft \u003capw@shadowen.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "f0bc0a60b13f209df16062f94e9fb4b90dc08708",
      "tree": "c876603d2cb17a3e2ed159ca5e08e4665cc09fe2",
      "parents": [
        "c3f0da631539b3b8e17f6dda567af9958d49d14f"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Thu Jan 13 15:45:50 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:32 2011 -0800"
      },
      "message": "vmscan: factor out kswapd sleeping logic from kswapd()\n\nCurrently, kswapd() has deep nesting and is slightly hard to read.  Clean\nthis up.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "b44129b30652c8771db2265939bb8b463724043d",
      "tree": "d5b669ff4faea020b03e894706f49d5d1ae56907",
      "parents": [
        "88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:45:43 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:31 2011 -0800"
      },
      "message": "mm: vmstat: use a single setter function and callback for adjusting percpu thresholds\n\nreduce_pgdat_percpu_threshold() and restore_pgdat_percpu_threshold() exist\nto adjust the per-cpu vmstat thresholds while kswapd is awake to avoid\nerrors due to counter drift.  The functions duplicate some code so this\npatch replaces them with a single set_pgdat_percpu_threshold() that takes\na callback function to calculate the desired threshold as a parameter.\n\n[akpm@linux-foundation.org: readability tweak]\n[kosaki.motohiro@jp.fujitsu.com: set_pgdat_percpu_threshold(): don\u0027t use for_each_online_cpu]\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97",
      "tree": "6f39beef8cf918eb2ca9f64ae1bcd1ea79ca487a",
      "parents": [
        "43bb40c9e3aa51a3b038c9df2c9afb4d4685614d"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Thu Jan 13 15:45:41 2011 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Thu Jan 13 17:32:31 2011 -0800"
      },
      "message": "mm: page allocator: adjust the per-cpu counter threshold when memory is low\n\nCommit aa45484 (\"calculate a better estimate of NR_FREE_PAGES when memory\nis low\") noted that watermarks were based on the vmstat NR_FREE_PAGES.  To\navoid synchronization overhead, these counters are maintained on a per-cpu\nbasis and drained both periodically and when a threshold is above a\nthreshold.  On large CPU systems, the difference between the estimate and\nreal value of NR_FREE_PAGES can be very high.  The system can get into a\ncase where pages are allocated far below the min watermark potentially\ncausing livelock issues.  The commit solved the problem by taking a better\nreading of NR_FREE_PAGES when memory was low.\n\nUnfortately, as reported by Shaohua Li this accurate reading can consume a\nlarge amount of CPU time on systems with many sockets due to cache line\nbouncing.  This patch takes a different approach.  For large machines\nwhere counter drift might be unsafe and while kswapd is awake, the per-cpu\nthresholds for the target pgdat are reduced to limit the level of drift to\nwhat should be a safe level.  This incurs a performance penalty in heavy\nmemory pressure by a factor that depends on the workload and the machine\nbut the machine should function correctly without accidentally exhausting\nall memory on a node.  There is an additional cost when kswapd wakes and\nsleeps but the event is not expected to be frequent - in Shaohua\u0027s test\ncase, there was one recorded sleep and wake event at least.\n\nTo ensure that kswapd wakes up, a safe version of zone_watermark_ok() is\nintroduced that takes a more accurate reading of NR_FREE_PAGES when called\nfrom wakeup_kswapd, when deciding whether it is really safe to go back to\nsleep in sleeping_prematurely() and when deciding if a zone is really\nbalanced or not in balance_pgdat().  We are still using an expensive\nfunction but limiting how often it is called.\n\nWhen the test case is reproduced, the time spent in the watermark\nfunctions is reduced.  The following report is on the percentage of time\nspent cumulatively spent in the functions zone_nr_free_pages(),\nzone_watermark_ok(), __zone_watermark_ok(), zone_watermark_ok_safe(),\nzone_page_state_snapshot(), zone_page_state().\n\nvanilla                      11.6615%\ndisable-threshold            0.2584%\n\nDavid said:\n\n: We had to pull aa454840 \"mm: page allocator: calculate a better estimate\n: of NR_FREE_PAGES when memory is low and kswapd is awake\" from 2.6.36\n: internally because tests showed that it would cause the machine to stall\n: as the result of heavy kswapd activity.  I merged it back with this fix as\n: it is pending in the -mm tree and it solves the issue we were seeing, so I\n: definitely think this should be pushed to -stable (and I would seriously\n: consider it for 2.6.37 inclusion even at this late date).\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReported-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nReviewed-by: Christoph Lameter \u003ccl@linux.com\u003e\nTested-by: Nicolas Bareil \u003cnico@chdir.org\u003e\nCc: David Rientjes \u003crientjes@google.com\u003e\nCc: Kyle McMartin \u003ckyle@mcmartin.ca\u003e\nCc: \u003cstable@kernel.org\u003e\t\t[2.6.37.1, 2.6.36.x]\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "6072d13c429373c5d63b69dadbbef40a9b035552",
      "tree": "a2bf745efaa4092f2a8d7d9a9b160c2a7a3b303f",
      "parents": [
        "0aded708d125a3ff7e5abaea9c2d9c6d7ebbfdcd"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Dec 01 13:35:19 2010 -0500"
      },
      "committer": {
        "name": "Trond Myklebust",
        "email": "Trond.Myklebust@netapp.com",
        "time": "Thu Dec 02 09:55:21 2010 -0500"
      },
      "message": "Call the filesystem back whenever a page is removed from the page cache\n\nNFS needs to be able to release objects that are stored in the page\ncache once the page itself is no longer visible from the page cache.\n\nThis patch adds a callback to the address space operations that allows\nfilesystems to perform page cleanups once the page has been removed\nfrom the page cache.\n\nOriginal patch by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n[trondmy: cover the cases of invalidate_inode_pages2() and\n          truncate_inode_pages()]\nSigned-off-by: Trond Myklebust \u003cTrond.Myklebust@netapp.com\u003e\n"
    },
    {
      "commit": "1dce071e18b7264457d17c0dec4c7e430bfaee7d",
      "tree": "ced52f7f8e4177f9ea37f891f4d33d0a5109e651",
      "parents": [
        "38715258aa2e8cd94bd4aafadc544e5104efd551"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Thu Nov 11 14:05:17 2010 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Fri Nov 12 07:55:31 2010 -0800"
      },
      "message": "vmscan: avoid setting zone congested if no page dirty\n\nnr_dirty and nr_congested are increased only when the page is dirty.  So\nif all pages are clean, both them will be zero.  In this case, we should\nnot mark the zone congested.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nReviewed-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nAcked-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "2e30244a7cc1ff09013a1238d415b4076406388e",
      "tree": "f555a78df877bbbd300ba3ebce6e31b5609e965f",
      "parents": [
        "4cbec4c8b9fda9ec784086fe7f74cd32a8adda95"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Oct 26 14:21:46 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:08 2010 -0700"
      },
      "message": "vmscan,tmpfs: treat used once pages on tmpfs as used once\n\nWhen a page has PG_referenced, shrink_page_list() discards it only if it\nis not dirty.  This rule works fine if the backing filesystem is a regular\none.  PG_dirty is a good signal that the page was used recently because\nthe flusher threads clean pages periodically.  In addition, page writeback\nis costlier than simple page discard.\n\nHowever, when a page is on tmpfs this heuristic doesn\u0027t work because\nflusher threads don\u0027t write back tmpfs pages.  Consequently tmpfs pages\nalways rotate around the lru twice at least and adds unnecessary lru\nchurn.  Simple tmpfs streaming io shouldn\u0027t cause large anonymous page\nswap-out.\n\nRemove this unncessary reclaim bonus of tmpfs pages.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Hugh Dickins \u003chughd@google.com\u003e\nReviewed-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0e093d99763eb4cea09f8ca4f1d01f34e121d10b",
      "tree": "fad38f9c3651c81db298521141a79d9468f71986",
      "parents": [
        "08fc468f4eaf6683bae5bdb94743a09d8630cb80"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 26 14:21:45 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone\n\nIf congestion_wait() is called with no BDI congested, the caller will\nsleep for the full timeout and this may be an unnecessary sleep.  This\npatch adds a wait_iff_congested() that checks congestion and only sleeps\nif a BDI is congested else, it calls cond_resched() to ensure the caller\nis not hogging the CPU longer than its quota but otherwise will not sleep.\n\nThis is aimed at reducing some of the major desktop stalls reported during\nIO.  For example, while kswapd is operating, it calls congestion_wait()\nbut it could just have been reclaiming clean page cache pages with no\ncongestion.  Without this patch, it would sleep for a full timeout but\nafter this patch, it\u0027ll just call schedule() if it has been on the CPU too\nlong.  Similar logic applies to direct reclaimers that are not making\nenough progress.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Jens Axboe \u003caxboe@kernel.dk\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "08fc468f4eaf6683bae5bdb94743a09d8630cb80",
      "tree": "a2225421eb8e01a8e9df588f5064be81059af91a",
      "parents": [
        "47185052165a4c5de0a461018238375dd982c2ec"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Oct 26 14:21:44 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "vmscan: isolate_lru_pages(): stop neighbour search if neighbour cannot be isolated\n\nisolate_lru_pages() does not just isolate LRU tail pages, but also\nisolates neighbour pages of the eviction page.  The neighbour search does\nnot stop even if neighbours cannot be isolated which is excessive as the\nlumpy reclaim will no longer result in a successful higher order\nallocation.  This patch stops the PFN neighbour pages if an isolation\nfails and moves on to the next block.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "47185052165a4c5de0a461018238375dd982c2ec",
      "tree": "4425005eb24efdd66d0d9d4f95a11fdad04843b1",
      "parents": [
        "7d3579e8e61937cbba268ea9b218d006b6d64221"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Oct 26 14:21:43 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "vmscan: remove dead code in shrink_inactive_list()\n\nAfter synchrounous lumpy reclaim, the page_list is guaranteed to not have\nactive pages as page activation in shrink_page_list() disables lumpy\nreclaim.  Remove the dead code.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "7d3579e8e61937cbba268ea9b218d006b6d64221",
      "tree": "4fa1863641343eee551681d60a823a84a2611289",
      "parents": [
        "bc57e00f5e0b2480ef222c775c49552d3a930db7"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Oct 26 14:21:42 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "vmscan: narrow the scenarios in whcih lumpy reclaim uses synchrounous reclaim\n\nshrink_page_list() can decide to give up reclaiming a page under a\nnumber of conditions such as\n\n  1. trylock_page() failure\n  2. page is unevictable\n  3. zone reclaim and page is mapped\n  4. PageWriteback() is true\n  5. page is swapbacked and swap is full\n  6. add_to_swap() failure\n  7. page is dirty and gfpmask don\u0027t have GFP_IO, GFP_FS\n  8. page is pinned\n  9. IO queue is congested\n 10. pageout() start IO, but not finished\n\nWith lumpy reclaim, failures result in entering synchronous lumpy reclaim\nbut this can be unnecessary.  In cases (2), (3), (5), (6), (7) and (8),\nthere is no point retrying.  This patch causes lumpy reclaim to abort when\nit is known it will fail.\n\nCase (9) is more interesting. current behavior is,\n  1. start shrink_page_list(async)\n  2. found queue_congested()\n  3. skip pageout write\n  4. still start shrink_page_list(sync)\n  5. wait on a lot of pages\n  6. again, found queue_congested()\n  7. give up pageout write again\n\nSo, it\u0027s useless time wasting.  However, just skipping page reclaim is\nalso notgood as x86 allocating a huge page needs 512 pages for example.\nIt can have more dirty pages than queue congestion threshold (~\u003d128).\n\nAfter this patch, pageout() behaves as follows;\n\n - If order \u003e PAGE_ALLOC_COSTLY_ORDER\n\tIgnore queue congestion always.\n - If order \u003c\u003d PAGE_ALLOC_COSTLY_ORDER\n\tskip write page and disable lumpy reclaim.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "bc57e00f5e0b2480ef222c775c49552d3a930db7",
      "tree": "51aa33378602a41fb73b9b2fbee2ca04706aa9d6",
      "parents": [
        "52bb9198668968506f9d12bf35d7f5d3f094921e"
      ],
      "author": {
        "name": "KOSAKI Motohiro",
        "email": "kosaki.motohiro@jp.fujitsu.com",
        "time": "Tue Oct 26 14:21:41 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "vmscan: synchronous lumpy reclaim should not call congestion_wait()\n\ncongestion_wait() means \"wait until queue congestion is cleared\".\nHowever, synchronous lumpy reclaim does not need this congestion_wait() as\nshrink_page_list(PAGEOUT_IO_SYNC) uses wait_on_page_writeback() and it\nprovides the necessary waiting.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nReviewed-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nReviewed-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nReviewed-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e11da5b4fdf01d71d73c21cb92b00595b917d7fd",
      "tree": "30da286bac7533fba5c119396491ab05a92471fd",
      "parents": [
        "66d9a986cddbbc2ea5db013e7999c621a956cc47"
      ],
      "author": {
        "name": "Mel Gorman",
        "email": "mel@csn.ul.ie",
        "time": "Tue Oct 26 14:21:40 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "tracing, vmscan: add trace events for LRU list shrinking\n\nThere have been numerous reports of stalls that pointed at the problem\nbeing somewhere in the VM.  There are multiple roots to the problems which\nmeans dealing with any of the root problems in isolation is tricky to\njustify on their own and they would still need integration testing.  This\npatch series puts together two different patch sets which in combination\nshould tackle some of the root causes of latency problems being reported.\n\nPatch 1 adds a tracepoint for shrink_inactive_list.  For this series, the\nmost important results is being able to calculate the scanning/reclaim\nratio as a measure of the amount of work being done by page reclaim.\n\nPatch 2 accounts for time spent in congestion_wait.\n\nPatches 3-6 were originally developed by Kosaki Motohiro but reworked for\nthis series.  It has been noted that lumpy reclaim is far too aggressive\nand trashes the system somewhat.  As SLUB uses high-order allocations, a\nlarge cost incurred by lumpy reclaim will be noticeable.  It was also\nreported during transparent hugepage support testing that lumpy reclaim\nwas trashing the system and these patches should mitigate that problem\nwithout disabling lumpy reclaim.\n\nPatch 7 adds wait_iff_congested() and replaces some callers of\ncongestion_wait().  wait_iff_congested() only sleeps if there is a BDI\nthat is currently congested.  Patch 8 notes that any BDI being congested\nis not necessarily a problem because there could be multiple BDIs of\nvarying speeds and numberous zones.  It attempts to track when a zone\nbeing reclaimed contains many pages backed by a congested BDI and if so,\nreclaimers wait on the congestion queue.\n\nI ran a number of tests with monitoring on X86, X86-64 and PPC64. Each\nmachine had 3G of RAM and the CPUs were\n\nX86:    Intel P4 2-core\nX86-64: AMD Phenom 4-core\nPPC64:  PPC970MP\n\nEach used a single disk and the onboard IO controller.  Dirty ratio was\nleft at 20.  I\u0027m just going to report for X86-64 and PPC64 in a vague\nattempt to keep this report short.  Four kernels were tested each based on\nv2.6.36-rc4\n\ntraceonly-v2r2:     Patches 1 and 2 to instrument vmscan reclaims and congestion_wait\nlowlumpy-v2r3:      Patches 1-6 to test if lumpy reclaim is better\nwaitcongest-v2r3:   Patches 1-7 to only wait on congestion\nwaitwriteback-v2r4: Patches 1-8 to detect when a zone is congested\n\nnocongest-v1r5: Patches 1-3 for testing wait_iff_congestion\nnodirect-v1r5:  Patches 1-10 to disable filesystem writeback for better IO\n\nThe tests run were as follows\n\nkernbench\n\tcompile-based benchmark. Smoke test performance\n\nsysbench\n\tOLTP read-only benchmark. Will be re-run in the future as read-write\n\nmicro-mapped-file-stream\n\tThis is a micro-benchmark from Johannes Weiner that accesses a\n\tlarge sparse-file through mmap(). It was configured to run in only\n\tsingle-CPU mode but can be indicative of how well page reclaim\n\tidentifies suitable pages.\n\nstress-highalloc\n\tTries to allocate huge pages under heavy load.\n\nkernbench, iozone and sysbench did not report any performance regression\non any machine.  sysbench did pressure the system lightly and there was\nreclaim activity but there were no difference of major interest between\nthe kernels.\n\nX86-64 micro-mapped-file-stream\n\n                                      traceonly-v2r2           lowlumpy-v2r3        waitcongest-v2r3     waitwriteback-v2r4\npgalloc_dma                       1639.00 (   0.00%)       667.00 (-145.73%)      1167.00 ( -40.45%)       578.00 (-183.56%)\npgalloc_dma32                  2842410.00 (   0.00%)   2842626.00 (   0.01%)   2843043.00 (   0.02%)   2843014.00 (   0.02%)\npgalloc_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npgsteal_dma                        729.00 (   0.00%)        85.00 (-757.65%)       609.00 ( -19.70%)       125.00 (-483.20%)\npgsteal_dma32                  2338721.00 (   0.00%)   2447354.00 (   4.44%)   2429536.00 (   3.74%)   2436772.00 (   4.02%)\npgsteal_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npgscan_kswapd_dma                 1469.00 (   0.00%)       532.00 (-176.13%)      1078.00 ( -36.27%)       220.00 (-567.73%)\npgscan_kswapd_dma32            4597713.00 (   0.00%)   4503597.00 (  -2.09%)   4295673.00 (  -7.03%)   3891686.00 ( -18.14%)\npgscan_kswapd_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npgscan_direct_dma                   71.00 (   0.00%)       134.00 (  47.01%)       243.00 (  70.78%)       352.00 (  79.83%)\npgscan_direct_dma32             305820.00 (   0.00%)    280204.00 (  -9.14%)    600518.00 (  49.07%)    957485.00 (  68.06%)\npgscan_direct_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npageoutrun                       16296.00 (   0.00%)     21254.00 (  23.33%)     18447.00 (  11.66%)     20067.00 (  18.79%)\nallocstall                         443.00 (   0.00%)       273.00 ( -62.27%)       513.00 (  13.65%)      1568.00 (  71.75%)\n\nThese are based on the raw figures taken from /proc/vmstat.  It\u0027s a rough\nmeasure of reclaim activity.  Note that allocstall counts are higher\nbecause we are entering direct reclaim more often as a result of not\nsleeping in congestion.  In itself, it\u0027s not necessarily a bad thing.\nIt\u0027s easier to get a view of what happened from the vmscan tracepoint\nreport.\n\nFTrace Reclaim Statistics: vmscan\n\n                                traceonly-v2r2   lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4\nDirect reclaims                                443        273        513       1568\nDirect reclaim pages scanned                305968     280402     600825     957933\nDirect reclaim pages reclaimed               43503      19005      30327     117191\nDirect reclaim write file async I/O              0          0          0          0\nDirect reclaim write anon async I/O              0          3          4         12\nDirect reclaim write file sync I/O               0          0          0          0\nDirect reclaim write anon sync I/O               0          0          0          0\nWake kswapd requests                        187649     132338     191695     267701\nKswapd wakeups                                   3          1          4          1\nKswapd pages scanned                       4599269    4454162    4296815    3891906\nKswapd pages reclaimed                     2295947    2428434    2399818    2319706\nKswapd reclaim write file async I/O              1          0          1          1\nKswapd reclaim write anon async I/O             59        187         41        222\nKswapd reclaim write file sync I/O               0          0          0          0\nKswapd reclaim write anon sync I/O               0          0          0          0\nTime stalled direct reclaim (seconds)         4.34       2.52       6.63       2.96\nTime kswapd awake (seconds)                  11.15      10.25      11.01      10.19\n\nTotal pages scanned                        4905237   4734564   4897640   4849839\nTotal pages reclaimed                      2339450   2447439   2430145   2436897\n%age total pages scanned/reclaimed          47.69%    51.69%    49.62%    50.25%\n%age total pages scanned/written             0.00%     0.00%     0.00%     0.00%\n%age  file pages scanned/written             0.00%     0.00%     0.00%     0.00%\nPercentage Time Spent Direct Reclaim        29.23%    19.02%    38.48%    20.25%\nPercentage Time kswapd Awake                78.58%    78.85%    76.83%    79.86%\n\nWhat is interesting here for nocongest in particular is that while direct\nreclaim scans more pages, the overall number of pages scanned remains the\nsame and the ratio of pages scanned to pages reclaimed is more or less the\nsame.  In other words, while we are sleeping less, reclaim is not doing\nmore work and as direct reclaim and kswapd is awake for less time, it\nwould appear to be doing less work.\n\nFTrace Reclaim Statistics: congestion_wait\nDirect number congest     waited                87        196         64          0\nDirect time   congest     waited            4604ms     4732ms     5420ms        0ms\nDirect full   congest     waited                72        145         53          0\nDirect number conditional waited                 0          0        324       1315\nDirect time   conditional waited               0ms        0ms        0ms        0ms\nDirect full   conditional waited                 0          0          0          0\nKSwapd number congest     waited                20         10         15          7\nKSwapd time   congest     waited            1264ms      536ms      884ms      284ms\nKSwapd full   congest     waited                10          4          6          2\nKSwapd number conditional waited                 0          0          0          0\nKSwapd time   conditional waited               0ms        0ms        0ms        0ms\nKSwapd full   conditional waited                 0          0          0          0\n\nThe vanilla kernel spent 8 seconds asleep in direct reclaim and no time at\nall asleep with the patches.\n\nMMTests Statistics: duration\nUser/Sys Time Running Test (seconds)         10.51     10.73      10.6     11.66\nTotal Elapsed Time (seconds)                 14.19     13.00     14.33     12.76\n\nOverall, the tests completed faster. It is interesting to note that backing off further\nwhen a zone is congested and not just a BDI was more efficient overall.\n\nPPC64 micro-mapped-file-stream\npgalloc_dma                    3024660.00 (   0.00%)   3027185.00 (   0.08%)   3025845.00 (   0.04%)   3026281.00 (   0.05%)\npgalloc_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npgsteal_dma                    2508073.00 (   0.00%)   2565351.00 (   2.23%)   2463577.00 (  -1.81%)   2532263.00 (   0.96%)\npgsteal_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npgscan_kswapd_dma              4601307.00 (   0.00%)   4128076.00 ( -11.46%)   3912317.00 ( -17.61%)   3377165.00 ( -36.25%)\npgscan_kswapd_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npgscan_direct_dma               629825.00 (   0.00%)    971622.00 (  35.18%)   1063938.00 (  40.80%)   1711935.00 (  63.21%)\npgscan_direct_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)\npageoutrun                       27776.00 (   0.00%)     20458.00 ( -35.77%)     18763.00 ( -48.04%)     18157.00 ( -52.98%)\nallocstall                         977.00 (   0.00%)      2751.00 (  64.49%)      2098.00 (  53.43%)      5136.00 (  80.98%)\n\nSimilar trends to x86-64. allocstalls are up but it\u0027s not necessarily bad.\n\nFTrace Reclaim Statistics: vmscan\nDirect reclaims                                977       2709       2098       5136\nDirect reclaim pages scanned                629825     963814    1063938    1711935\nDirect reclaim pages reclaimed               75550     242538     150904     387647\nDirect reclaim write file async I/O              0          0          0          2\nDirect reclaim write anon async I/O              0         10          0          4\nDirect reclaim write file sync I/O               0          0          0          0\nDirect reclaim write anon sync I/O               0          0          0          0\nWake kswapd requests                        392119    1201712     571935     571921\nKswapd wakeups                                   3          2          3          3\nKswapd pages scanned                       4601307    4128076    3912317    3377165\nKswapd pages reclaimed                     2432523    2318797    2312673    2144616\nKswapd reclaim write file async I/O             20          1          1          1\nKswapd reclaim write anon async I/O             57        132         11        121\nKswapd reclaim write file sync I/O               0          0          0          0\nKswapd reclaim write anon sync I/O               0          0          0          0\nTime stalled direct reclaim (seconds)         6.19       7.30      13.04      10.88\nTime kswapd awake (seconds)                  21.73      26.51      25.55      23.90\n\nTotal pages scanned                        5231132   5091890   4976255   5089100\nTotal pages reclaimed                      2508073   2561335   2463577   2532263\n%age total pages scanned/reclaimed          47.95%    50.30%    49.51%    49.76%\n%age total pages scanned/written             0.00%     0.00%     0.00%     0.00%\n%age  file pages scanned/written             0.00%     0.00%     0.00%     0.00%\nPercentage Time Spent Direct Reclaim        18.89%    20.65%    32.65%    27.65%\nPercentage Time kswapd Awake                72.39%    80.68%    78.21%    77.40%\n\nAgain, a similar trend that the congestion_wait changes mean that direct\nreclaim scans more pages but the overall number of pages scanned while\nslightly reduced, are very similar.  The ratio of scanning/reclaimed\nremains roughly similar.  The downside is that kswapd and direct reclaim\nwas awake longer and for a larger percentage of the overall workload.\nIt\u0027s possible there were big differences in the amount of time spent\nreclaiming slab pages between the different kernels which is plausible\nconsidering that the micro tests runs after fsmark and sysbench.\n\nTrace Reclaim Statistics: congestion_wait\nDirect number congest     waited               845       1312        104          0\nDirect time   congest     waited           19416ms    26560ms     7544ms        0ms\nDirect full   congest     waited               745       1105         72          0\nDirect number conditional waited                 0          0       1322       2935\nDirect time   conditional waited               0ms        0ms       12ms      312ms\nDirect full   conditional waited                 0          0          0          3\nKSwapd number congest     waited                39        102         75         63\nKSwapd time   congest     waited            2484ms     6760ms     5756ms     3716ms\nKSwapd full   congest     waited                20         48         46         25\nKSwapd number conditional waited                 0          0          0          0\nKSwapd time   conditional waited               0ms        0ms        0ms        0ms\nKSwapd full   conditional waited                 0          0          0          0\n\nThe vanilla kernel spent 20 seconds asleep in direct reclaim and only\n312ms asleep with the patches.  The time kswapd spent congest waited was\nalso reduced by a large factor.\n\nMMTests Statistics: duration\nser/Sys Time Running Test (seconds)         26.58     28.05      26.9     28.47\nTotal Elapsed Time (seconds)                 30.02     32.86     32.67     30.88\n\nWith all patches applies, the completion times are very similar.\n\nX86-64 STRESS-HIGHALLOC\n                traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3waitwriteback-v2r4\nPass 1          82.00 ( 0.00%)    84.00 ( 2.00%)    85.00 ( 3.00%)    85.00 ( 3.00%)\nPass 2          90.00 ( 0.00%)    87.00 (-3.00%)    88.00 (-2.00%)    89.00 (-1.00%)\nAt Rest         92.00 ( 0.00%)    90.00 (-2.00%)    90.00 (-2.00%)    91.00 (-1.00%)\n\nSuccess figures across the board are broadly similar.\n\n                traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3waitwriteback-v2r4\nDirect reclaims                               1045        944        886        887\nDirect reclaim pages scanned                135091     119604     109382     101019\nDirect reclaim pages reclaimed               88599      47535      47863      46671\nDirect reclaim write file async I/O            494        283        465        280\nDirect reclaim write anon async I/O          29357      13710      16656      13462\nDirect reclaim write file sync I/O             154          2          2          3\nDirect reclaim write anon sync I/O           14594        571        509        561\nWake kswapd requests                          7491        933        872        892\nKswapd wakeups                                 814        778        731        780\nKswapd pages scanned                       7290822   15341158   11916436   13703442\nKswapd pages reclaimed                     3587336    3142496    3094392    3187151\nKswapd reclaim write file async I/O          91975      32317      28022      29628\nKswapd reclaim write anon async I/O        1992022     789307     829745     849769\nKswapd reclaim write file sync I/O               0          0          0          0\nKswapd reclaim write anon sync I/O               0          0          0          0\nTime stalled direct reclaim (seconds)      4588.93    2467.16    2495.41    2547.07\nTime kswapd awake (seconds)                2497.66    1020.16    1098.06    1176.82\n\nTotal pages scanned                        7425913  15460762  12025818  13804461\nTotal pages reclaimed                      3675935   3190031   3142255   3233822\n%age total pages scanned/reclaimed          49.50%    20.63%    26.13%    23.43%\n%age total pages scanned/written            28.66%     5.41%     7.28%     6.47%\n%age  file pages scanned/written             1.25%     0.21%     0.24%     0.22%\nPercentage Time Spent Direct Reclaim        57.33%    42.15%    42.41%    42.99%\nPercentage Time kswapd Awake                43.56%    27.87%    29.76%    31.25%\n\nScanned/reclaimed ratios again look good with big improvements in\nefficiency.  The Scanned/written ratios also look much improved.  With a\nbetter scanned/written ration, there is an expectation that IO would be\nmore efficient and indeed, the time spent in direct reclaim is much\nreduced by the full series and kswapd spends a little less time awake.\n\nOverall, indications here are that allocations were happening much faster\nand this can be seen with a graph of the latency figures as the\nallocations were taking place\nhttp://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-hydra-mean.ps\n\nFTrace Reclaim Statistics: congestion_wait\nDirect number congest     waited              1333        204        169          4\nDirect time   congest     waited           78896ms     8288ms     7260ms      200ms\nDirect full   congest     waited               756         92         69          2\nDirect number conditional waited                 0          0         26        186\nDirect time   conditional waited               0ms        0ms        0ms     2504ms\nDirect full   conditional waited                 0          0          0         25\nKSwapd number congest     waited                 4        395        227        282\nKSwapd time   congest     waited             384ms    25136ms    10508ms    18380ms\nKSwapd full   congest     waited                 3        232         98        176\nKSwapd number conditional waited                 0          0          0          0\nKSwapd time   conditional waited               0ms        0ms        0ms        0ms\nKSwapd full   conditional waited                 0          0          0          0\nKSwapd full   conditional waited               318          0        312          9\n\nOverall, the time spent speeping is reduced.  kswapd is still hitting\ncongestion_wait() but that is because there are callers remaining where it\nwasn\u0027t clear in advance if they should be changed to wait_iff_congested()\nor not.  Overall the sleep imes are reduced though - from 79ish seconds to\nabout 19.\n\nMMTests Statistics: duration\nUser/Sys Time Running Test (seconds)       3415.43   3386.65   3388.39    3377.5\nTotal Elapsed Time (seconds)               5733.48   3660.33   3689.41   3765.39\n\nWith the full series, the time to complete the tests are reduced by 30%\n\nPPC64 STRESS-HIGHALLOC\n                traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3waitwriteback-v2r4\nPass 1          17.00 ( 0.00%)    34.00 (17.00%)    38.00 (21.00%)    43.00 (26.00%)\nPass 2          25.00 ( 0.00%)    37.00 (12.00%)    42.00 (17.00%)    46.00 (21.00%)\nAt Rest         49.00 ( 0.00%)    43.00 (-6.00%)    45.00 (-4.00%)    51.00 ( 2.00%)\n\nSuccess rates there are *way* up particularly considering that the 16MB\nhuge pages on PPC64 mean that it\u0027s always much harder to allocate them.\n\nFTrace Reclaim Statistics: vmscan\n              stress-highalloc  stress-highalloc  stress-highalloc  stress-highalloc\n                traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3waitwriteback-v2r4\nDirect reclaims                                499        505        564        509\nDirect reclaim pages scanned                223478      41898      51818      45605\nDirect reclaim pages reclaimed              137730      21148      27161      23455\nDirect reclaim write file async I/O            399        136        162        136\nDirect reclaim write anon async I/O          46977       2865       4686       3998\nDirect reclaim write file sync I/O              29          0          1          3\nDirect reclaim write anon sync I/O           31023        159        237        239\nWake kswapd requests                           420        351        360        326\nKswapd wakeups                                 185        294        249        277\nKswapd pages scanned                      15703488   16392500   17821724   17598737\nKswapd pages reclaimed                     5808466    2908858    3139386    3145435\nKswapd reclaim write file async I/O         159938      18400      18717      13473\nKswapd reclaim write anon async I/O        3467554     228957     322799     234278\nKswapd reclaim write file sync I/O               0          0          0          0\nKswapd reclaim write anon sync I/O               0          0          0          0\nTime stalled direct reclaim (seconds)      9665.35    1707.81    2374.32    1871.23\nTime kswapd awake (seconds)                9401.21    1367.86    1951.75    1328.88\n\nTotal pages scanned                       15926966  16434398  17873542  17644342\nTotal pages reclaimed                      5946196   2930006   3166547   3168890\n%age total pages scanned/reclaimed          37.33%    17.83%    17.72%    17.96%\n%age total pages scanned/written            23.27%     1.52%     1.94%     1.43%\n%age  file pages scanned/written             1.01%     0.11%     0.11%     0.08%\nPercentage Time Spent Direct Reclaim        44.55%    35.10%    41.42%    36.91%\nPercentage Time kswapd Awake                86.71%    43.58%    52.67%    41.14%\n\nWhile the scanning rates are slightly up, the scanned/reclaimed and\nscanned/written figures are much improved.  The time spent in direct\nreclaim and with kswapd are massively reduced, mostly by the lowlumpy\npatches.\n\nFTrace Reclaim Statistics: congestion_wait\nDirect number congest     waited               725        303        126          3\nDirect time   congest     waited           45524ms     9180ms     5936ms      300ms\nDirect full   congest     waited               487        190         52          3\nDirect number conditional waited                 0          0        200        301\nDirect time   conditional waited               0ms        0ms        0ms     1904ms\nDirect full   conditional waited                 0          0          0         19\nKSwapd number congest     waited                 0          2         23          4\nKSwapd time   congest     waited               0ms      200ms      420ms      404ms\nKSwapd full   congest     waited                 0          2          2          4\nKSwapd number conditional waited                 0          0          0          0\nKSwapd time   conditional waited               0ms        0ms        0ms        0ms\nKSwapd full   conditional waited                 0          0          0          0\n\nNot as dramatic a story here but the time spent asleep is reduced and we\ncan still see what wait_iff_congested is going to sleep when necessary.\n\nMMTests Statistics: duration\nUser/Sys Time Running Test (seconds)      12028.09   3157.17   3357.79   3199.16\nTotal Elapsed Time (seconds)              10842.07   3138.72   3705.54   3229.85\n\nThe time to complete this test goes way down.  With the full series, we\nare allocating over twice the number of huge pages in 30% of the time and\nthere is a corresponding impact on the allocation latency graph available\nat.\n\nhttp://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-powyah-mean.ps\n\nThis patch:\n\nAdd a trace event for shrink_inactive_list() and updates the sample\npostprocessing script appropriately.  It can be used to determine how many\npages were reclaimed and for non-lumpy reclaim where exactly the pages\nwere reclaimed from.\n\nSigned-off-by: Mel Gorman \u003cmel@csn.ul.ie\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nCc: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "66d9a986cddbbc2ea5db013e7999c621a956cc47",
      "tree": "bfe0d223e9b07c2300f445f9694525e956657d1e",
      "parents": [
        "bce54bbfde07e8b300f39dae14756c12a6ceca65"
      ],
      "author": {
        "name": "Shaohua Li",
        "email": "shaohua.li@intel.com",
        "time": "Tue Oct 26 14:21:37 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:07 2010 -0700"
      },
      "message": "vmscan: delete dead code\n\n`priority\u0027 cannot be negative here.  And the comment is obsolete.\n\nSigned-off-by: Shaohua Li \u003cshaohua.li@intel.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "74e3f3c3391d81a959f58a1191a560703a4415b4",
      "tree": "b4688926ebe2c40b422bd6df0989ec09ea0f7046",
      "parents": [
        "49ac825587f33afec8841b7fab2eb4db775014e6"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Tue Oct 26 14:21:31 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:06 2010 -0700"
      },
      "message": "vmscan: prevent background aging of anon page in no swap system\n\nYing Han reported that backing aging of anon pages in no swap system\ncauses unnecessary TLB flush.\n\nWhen I sent a patch(69c8548175), I wanted this patch but Rik pointed out\nand allowed aging of anon pages to give a chance to promote from inactive\nto active LRU.\n\nIt has a two problem.\n\n1) non-swap system\n\nNever make sense to age anon pages.\n\n2) swap configured but still doesn\u0027t swapon\n\nIt doesn\u0027t make sense to age anon pages until swap-on time.  But it\u0027s\narguable.  If we have aged anon pages by swapon, VM have moved anon pages\nfrom active to inactive.  And in the time swapon by admin, the VM can\u0027t\nreclaim hot pages so we can protect hot pages swapout.\n\nBut let\u0027s think about it.  When does swap-on happen?  It depends on admin.\n we can\u0027t expect it.  Nonetheless, we have done aging of anon pages to\nprotect hot pages swapout.  It means we lost run time overhead when below\nhigh watermark but gain hot page swap-[in/out] overhead when VM decide\nswapout.  Is it true?  Let\u0027s think more detail.  We don\u0027t promote anon\npages in case of non-swap system.  So even though VM does aging of anon\npages, the pages would be in inactive LRU for a long time.  It means many\nof pages in there would mark access bit again.  So access bit hot/code\nseparation would be pointless.\n\nThis patch prevents unnecessary anon pages demotion in not-yet-swapon and\nnon-configured swap system.  Even, in non-configuared swap system\ninactive_anon_is_low can be compiled out.\n\nIt could make side effect that hot anon pages could swap out when admin\ndoes swap on.  But I think sooner or later it would be steady state.  So\nit\u0027s not a big problem.\n\nWe could lose someting but gain more thing(TLB flush and unnecessary\nfunction call to demote anon pages).\n\nSigned-off-by: Ying Han \u003cyinghan@google.com\u003e\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReviewed-by: Rik van Riel \u003criel@redhat.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Johannes Weiner \u003channes@cmpxchg.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "e4455abb50a19562dbfdc51a8424fda9b588bd6d",
      "tree": "add38aec00027e9a115778425a41d3d075a9ced6",
      "parents": [
        "f19e77a3dc884510dba740caa6dee126b7d40156"
      ],
      "author": {
        "name": "Thadeu Lima de Souza Cascardo",
        "email": "cascardo@holoscopio.com",
        "time": "Tue Oct 26 14:21:28 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:05 2010 -0700"
      },
      "message": "mm: only build per-node scan_unevictable functions when NUMA is enabled\n\nNon-NUMA systems do never create these files anyway, since they are only\ncreated by driver subsystem when NUMA is configured.\n\n[akpm@linux-foundation.org: cleanup]\nSigned-off-by: Thadeu Lima de Souza Cascardo \u003ccascardo@holoscopio.com\u003e\nReviewed-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "1b430beee5e388605dfb092b214ef0320f752cf6",
      "tree": "c1b1ece282aab771fd1386a3fe0c6e82cb5c5bfe",
      "parents": [
        "d19d5476f4b9f91d2de92b91588bb118beba6c0d"
      ],
      "author": {
        "name": "Wu Fengguang",
        "email": "fengguang.wu@intel.com",
        "time": "Tue Oct 26 14:21:26 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Oct 26 16:52:05 2010 -0700"
      },
      "message": "writeback: remove nonblocking/encountered_congestion references\n\nThis removes more dead code that was somehow missed by commit 0d99519efef\n(writeback: remove unused nonblocking and congestion checks).  There are\nno behavior change except for the removal of two entries from one of the\next4 tracing interface.\n\nThe nonblocking checks in -\u003ewritepages are no longer used because the\nflusher now prefer to block on get_request_wait() than to skip inodes on\nIO congestion.  The latter will lead to more seeky IO.\n\nThe nonblocking checks in -\u003ewritepage are no longer used because it\u0027s\nredundant with the WB_SYNC_NONE check.\n\nWe no long set -\u003enonblocking in VM page out and page migration, because\na) it\u0027s effectively redundant with WB_SYNC_NONE in current code\nb) it\u0027s old semantic of \"Don\u0027t get stuck on request queues\" is mis-behavior:\n   that would skip some dirty inodes on congestion and page out others, which\n   is unfair in terms of LRU age.\n\nInspired by Christoph Hellwig. Thanks!\n\nSigned-off-by: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Theodore Ts\u0027o \u003ctytso@mit.edu\u003e\nCc: David Howells \u003cdhowells@redhat.com\u003e\nCc: Sage Weil \u003csage@newdream.net\u003e\nCc: Steve French \u003csfrench@samba.org\u003e\nCc: Chris Mason \u003cchris.mason@oracle.com\u003e\nCc: Jens Axboe \u003caxboe@kernel.dk\u003e\nCc: Christoph Hellwig \u003chch@infradead.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "229aebb873e29726b91e076161649cf45154b0bf",
      "tree": "acc02a3702215bce8d914f4c8cc3d7a1382b1c67",
      "parents": [
        "8de547e1824437f3c6af180d3ed2162fa4b3f389",
        "50a23e6eec6f20d55a3a920e47adb455bff6046e"
      ],
      "author": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Oct 24 13:41:39 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Sun Oct 24 13:41:39 2010 -0700"
      },
      "message": "Merge branch \u0027for-next\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial\n\n* \u0027for-next\u0027 of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)\n  Update broken web addresses in arch directory.\n  Update broken web addresses in the kernel.\n  Revert \"drivers/usb: Remove unnecessary return\u0027s from void functions\" for musb gadget\n  Revert \"Fix typo: configuation \u003d\u003e configuration\" partially\n  ida: document IDA_BITMAP_LONGS calculation\n  ext2: fix a typo on comment in ext2/inode.c\n  drivers/scsi: Remove unnecessary casts of private_data\n  drivers/s390: Remove unnecessary casts of private_data\n  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data\n  drivers/infiniband: Remove unnecessary casts of private_data\n  drivers/gpu/drm: Remove unnecessary casts of private_data\n  kernel/pm_qos_params.c: Remove unnecessary casts of private_data\n  fs/ecryptfs: Remove unnecessary casts of private_data\n  fs/seq_file.c: Remove unnecessary casts of private_data\n  arm: uengine.c: remove C99 comments\n  arm: scoop.c: remove C99 comments\n  Fix typo configue \u003d\u003e configure in comments\n  Fix typo: configuation \u003d\u003e configuration\n  Fix typo interrest[ing|ed] \u003d\u003e interest[ing|ed]\n  Fix various typos of valid in comments\n  ...\n\nFix up trivial conflicts in:\n\tdrivers/char/ipmi/ipmi_si_intf.c\n\tdrivers/usb/gadget/rndis.c\n\tnet/irda/irnet/irnet_ppp.c\n"
    },
    {
      "commit": "d1908362ae0b97374eb8328fbb471576332f9fb1",
      "tree": "db8ba2a4de2e9ac61b8e94fc03a76ddbd24d321f",
      "parents": [
        "eba93fcc34d6c4387ce8fbb53bb7b685f91f3343"
      ],
      "author": {
        "name": "Minchan Kim",
        "email": "minchan.kim@gmail.com",
        "time": "Wed Sep 22 13:05:01 2010 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Wed Sep 22 17:22:39 2010 -0700"
      },
      "message": "vmscan: check all_unreclaimable in direct reclaim path\n\nM.  Vefa Bicakci reported 2.6.35 kernel hang up when hibernation on his\n32bit 3GB mem machine.\n(https://bugzilla.kernel.org/show_bug.cgi?id\u003d16771). Also he bisected\nthe regression to\n\n  commit bb21c7ce18eff8e6e7877ca1d06c6db719376e3c\n  Author: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\n  Date:   Fri Jun 4 14:15:05 2010 -0700\n\n     vmscan: fix do_try_to_free_pages() return value when priority\u003d\u003d0 reclaim failure\n\nAt first impression, this seemed very strange because the above commit\nonly chenged function return value and hibernate_preallocate_memory()\nignore return value of shrink_all_memory().  But it\u0027s related.\n\nNow, page allocation from hibernation code may enter infinite loop if the\nsystem has highmem.  The reasons are that vmscan don\u0027t care enough OOM\ncase when oom_killer_disabled.\n\nThe problem sequence is following as.\n\n1. hibernation\n2. oom_disable\n3. alloc_pages\n4. do_try_to_free_pages\n       if (scanning_global_lru(sc) \u0026\u0026 !all_unreclaimable)\n               return 1;\n\nIf kswapd is not freozen, it would set zone-\u003eall_unreclaimable to 1 and\nthen shrink_zones maybe return true(ie, all_unreclaimable is true).  So at\nlast, alloc_pages could go to _nopage_.  If it is, it should have no\nproblem.\n\nThis patch adds all_unreclaimable check to protect in direct reclaim path,\ntoo.  It can care of hibernation OOM case and help bailout\nall_unreclaimable case slightly.\n\nSigned-off-by: KOSAKI Motohiro \u003ckosaki.motohiro@jp.fujitsu.com\u003e\nSigned-off-by: Minchan Kim \u003cminchan.kim@gmail.com\u003e\nReported-by: M. Vefa Bicakci \u003cbicave@superonline.com\u003e\nReported-by: \u003ccaiqian@redhat.com\u003e\nReviewed-by: Johannes Weiner \u003channes@cmpxchg.org\u003e\nTested-by: \u003ccaiqian@redhat.com\u003e\nAcked-by: Rafael J. Wysocki \u003crjw@sisk.pl\u003e\nAcked-by: Rik van Riel \u003criel@redhat.com\u003e\nAcked-by: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: \u003cstable@kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    }
  ],
  "next": "415b54e37a5d0efa7ff5d4d12285b1e82d574c3e"
}
