)]}'
{
  "log": [
    {
      "commit": "828502d30073036a486d96b1fe051e0f08b6df83",
      "tree": "61b728cbeb88c1a2c522307dff6264e8d0b1d8f1",
      "parents": [
        "451ea25da71590361c71bf3044c55b870a887d53"
      ],
      "author": {
        "name": "Izik Eidus",
        "email": "ieidus@redhat.com",
        "time": "Mon Sep 21 17:01:51 2009 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Tue Sep 22 07:17:31 2009 -0700"
      },
      "message": "ksm: add mmu_notifier set_pte_at_notify()\n\nKSM is a linux driver that allows dynamically sharing identical memory pages\nbetween one or more processes.\n\nUnlike traditional page sharing that is made at the allocation of the\nmemory, ksm does it dynamically after the memory was created.  Memory is\nperiodically scanned; identical pages are identified and merged.\n\nThe sharing is made in a transparent way to the processes that use it.\n\nKsm is highly important for hypervisors (kvm), where in production\nenvironments there might be many copies of the same data among the host\nmemory.  This kind of data can be: similar kernels, libraries, cache, and\nso on.\n\nEven though ksm was written for kvm, any userspace application that wants to\nuse it to share its data can try it.\n\nKsm may be useful for any application that might have similar (page\naligned) data structures among the memory; ksm will find this data, merge\nit to one copy, and even if it is changed and therefore copied on\nwrite, ksm will merge it again as soon as it becomes identical again.\n\nAnother reason to consider using ksm is the fact that it might simplify\na lot the userspace code of applications that want to use shared private\ndata: instead of the application managing a shared area, ksm will do\nthis for the application, and even writes to this data will be allowed\nwithout any synchronization acts from the application.\n\nKsm was designed to be a loadable module that doesn\u0027t change the VM code\nof linux.\n\nThis patch:\n\nThe set_pte_at_notify() macro allows setting a pte in the shadow page\ntable directly, instead of flushing the shadow page table entry and then\ngetting vmexit to set it.  It uses a new change_pte() callback to do so.\n\nset_pte_at_notify() is an optimization for kvm, and other users of\nmmu_notifiers, for COW pages.  
It is useful for kvm when ksm is used,\nbecause it allows kvm not to have to receive vmexit and only then map the\nksm page into the shadow page table, but instead map it directly at the\nsame time as Linux maps the page into the host page table.\n\nUsers of mmu_notifiers who don\u0027t implement new mmu_notifier_change_pte()\ncallback will just receive the mmu_notifier_invalidate_page() callback.\n\nSigned-off-by: Izik Eidus \u003cieidus@redhat.com\u003e\nSigned-off-by: Chris Wright \u003cchrisw@redhat.com\u003e\nSigned-off-by: Hugh Dickins \u003chugh.dickins@tiscali.co.uk\u003e\nCc: Andrea Arcangeli \u003caarcange@redhat.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nCc: Wu Fengguang \u003cfengguang.wu@intel.com\u003e\nCc: Balbir Singh \u003cbalbir@in.ibm.com\u003e\nCc: Hugh Dickins \u003chugh.dickins@tiscali.co.uk\u003e\nCc: KAMEZAWA Hiroyuki \u003ckamezawa.hiroyu@jp.fujitsu.com\u003e\nCc: Lee Schermerhorn \u003clee.schermerhorn@hp.com\u003e\nCc: Avi Kivity \u003cavi@redhat.com\u003e\nCc: Nick Piggin \u003cnickpiggin@yahoo.com.au\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "cddb8a5c14aa89810b40495d94d3d2a0faee6619",
      "tree": "d0b47b071f7d2dd1d6f9c36084aa8cfcef90d1da",
      "parents": [
        "7906d00cd1f687268f0a3599442d113767795ae6"
      ],
      "author": {
        "name": "Andrea Arcangeli",
        "email": "andrea@qumranet.com",
        "time": "Mon Jul 28 15:46:29 2008 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@linux-foundation.org",
        "time": "Mon Jul 28 16:30:21 2008 -0700"
      },
      "message": "mmu-notifiers: core\n\nWith KVM/GRU/XPMEM there isn\u0027t just the primary CPU MMU pointing to pages.\nThere are secondary MMUs (with secondary sptes and secondary tlbs) too.\nsptes in the kvm case are shadow pagetables, but when I say spte in\nmmu-notifier context, I mean \"secondary pte\".  In GRU case there\u0027s no\nactual secondary pte and there\u0027s only a secondary tlb because the GRU\nsecondary MMU has no knowledge about sptes and every secondary tlb miss\nevent in the MMU always generates a page fault that has to be resolved by\nthe CPU (this is not the case of KVM where a secondary tlb miss will\nwalk sptes in hardware and it will refill the secondary tlb transparently\nto software if the corresponding spte is present).  The same way\nzap_page_range has to invalidate the pte before freeing the page, the spte\n(and secondary tlb) must also be invalidated before any page is freed and\nreused.\n\nCurrently we take a page_count pin on every page mapped by sptes, but that\nmeans the pages can\u0027t be swapped whenever they\u0027re mapped by any spte\nbecause they\u0027re part of the guest working set.  Furthermore a spte unmap\nevent can immediately lead to a page being freed when the pin is released\n(so requiring the same complex and relatively slow tlb_gather smp safe\nlogic we have in zap_page_range and that can be avoided completely if the\nspte unmap event doesn\u0027t require an unpin of the page previously mapped in\nthe secondary MMU).\n\nThe mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk-\u003emm and know\nwhen the VM is swapping or freeing or doing anything on the primary MMU so\nthat the secondary MMU code can drop sptes before the pages are freed,\navoiding all page pinning and allowing 100% reliable swapping of guest\nphysical address space.  
Furthermore it saves the code that tears down the\nmappings of the secondary MMU from implementing a logic like tlb_gather in\nzap_page_range that would require many IPIs to flush other cpu tlbs, for\neach fixed number of sptes unmapped.\n\nTo make an example: if what happens on the primary MMU is a protection\ndowngrade (from writeable to wrprotect) the secondary MMU mappings will be\ninvalidated, and the next secondary-mmu-page-fault will call\nget_user_pages and trigger a do_wp_page through get_user_pages if it\ncalled get_user_pages with write\u003d1, and it\u0027ll re-establish an updated\nspte or secondary-tlb-mapping on the copied page.  Or it will set up a\nreadonly spte or readonly tlb mapping if it\u0027s a guest-read, if it calls\nget_user_pages with write\u003d0.  This is just an example.\n\nThis allows mapping any page pointed to by any pte (and in turn visible in the\nprimary CPU MMU), into a secondary MMU (be it a pure tlb like GRU, or a\nfull MMU with both sptes and secondary-tlb like the shadow-pagetable layer\nwith kvm), or a remote DMA in software like XPMEM (hence the need to\nschedule in XPMEM code to send the invalidate to the remote node, while no\nneed to schedule in kvm/gru as it\u0027s an immediate event like invalidating\nprimary-mmu pte).\n\nAt least for KVM without this patch it\u0027s impossible to swap guests\nreliably.  And having this feature and removing the page pin allows\nseveral other optimizations that simplify life considerably.\n\nDependencies:\n\n1) mm_take_all_locks() to register the mmu notifier when the whole VM\n   isn\u0027t doing anything with \"mm\".  This allows mmu notifier users to keep\n   track of whether the VM is in the middle of the invalidate_range_begin/end\n   critical section with an atomic counter increased in range_begin and\n   decreased in range_end.  
No secondary MMU page fault is allowed to map\n   any spte or secondary tlb reference, while the VM is in the middle of\n   range_begin/end as any page returned by get_user_pages in that critical\n   section could later immediately be freed without any further\n   -\u003einvalidate_page notification (invalidate_range_begin/end works on\n   ranges and -\u003einvalidate_page isn\u0027t called immediately before freeing\n   the page).  To stop all page freeing and pagetable overwrites the\n   mmap_sem must be taken in write mode and all other anon_vma/i_mmap\n   locks must be taken too.\n\n2) It\u0027d be a waste to add branches in the VM if nobody could possibly\n   run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only be enabled if\n   CONFIG_KVM\u003dm/y.  In the current kernel kvm won\u0027t yet take advantage of\n   mmu notifiers, but this already allows compiling a KVM external module\n   against a kernel with mmu notifiers enabled and from the next pull from\n   kvm.git we\u0027ll start using them.  And GRU/XPMEM will also be able to\n   continue the development by enabling KVM\u003dm in their config, until they\n   submit all GRU/XPMEM GPLv2 code to the mainline kernel.  Then they can\n   also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM\u003dn).\n   This guarantees nobody selects MMU_NOTIFIER\u003dy if KVM and GRU and XPMEM\n   are all \u003dn.\n\nThe mmu_notifier_register call can fail because mm_take_all_locks may be\ninterrupted by a signal and return -EINTR.  Because mmu_notifier_register\nis used when a driver starts up, a failure can be gracefully handled.  
Here\nis an example of the change applied to kvm to register the mmu notifiers.\nUsually when a driver starts up other allocations are required anyway and\n-ENOMEM failure paths exist already.\n\n struct  kvm *kvm_arch_create_vm(void)\n {\n        struct kvm *kvm \u003d kzalloc(sizeof(struct kvm), GFP_KERNEL);\n+       int err;\n\n        if (!kvm)\n                return ERR_PTR(-ENOMEM);\n\n        INIT_LIST_HEAD(\u0026kvm-\u003earch.active_mmu_pages);\n\n+       kvm-\u003earch.mmu_notifier.ops \u003d \u0026kvm_mmu_notifier_ops;\n+       err \u003d mmu_notifier_register(\u0026kvm-\u003earch.mmu_notifier, current-\u003emm);\n+       if (err) {\n+               kfree(kvm);\n+               return ERR_PTR(err);\n+       }\n+\n        return kvm;\n }\n\nmmu_notifier_unregister returns void and it\u0027s reliable.\n\nThe patch also adds a few needed but missing includes, without which the\nkernel would fail to compile after these changes on non-x86 archs (x86 didn\u0027t need\nthem by luck).\n\n[akpm@linux-foundation.org: coding-style fixes]\n[akpm@linux-foundation.org: fix mm/filemap_xip.c build]\n[akpm@linux-foundation.org: fix mm/mmu_notifier.c build]\nSigned-off-by: Andrea Arcangeli \u003candrea@qumranet.com\u003e\nSigned-off-by: Nick Piggin \u003cnpiggin@suse.de\u003e\nSigned-off-by: Christoph Lameter \u003ccl@linux-foundation.org\u003e\nCc: Jack Steiner \u003csteiner@sgi.com\u003e\nCc: Robin Holt \u003cholt@sgi.com\u003e\nCc: Nick Piggin \u003cnpiggin@suse.de\u003e\nCc: Peter Zijlstra \u003ca.p.zijlstra@chello.nl\u003e\nCc: Kanoj Sarcar \u003ckanojsarcar@yahoo.com\u003e\nCc: Roland Dreier \u003crdreier@cisco.com\u003e\nCc: Steve Wise \u003cswise@opengridcomputing.com\u003e\nCc: Avi Kivity \u003cavi@qumranet.com\u003e\nCc: Hugh Dickins \u003chugh@veritas.com\u003e\nCc: Rusty Russell \u003crusty@rustcorp.com.au\u003e\nCc: Anthony Liguori \u003caliguori@us.ibm.com\u003e\nCc: Chris Wright \u003cchrisw@redhat.com\u003e\nCc: Marcelo Tosatti \u003cmarcelo@kvack.org\u003e\nCc: Eric 
Dumazet \u003cdada1@cosmosbay.com\u003e\nCc: \"Paul E. McKenney\" \u003cpaulmck@us.ibm.com\u003e\nCc: Izik Eidus \u003cizike@qumranet.com\u003e\nCc: Anthony Liguori \u003caliguori@us.ibm.com\u003e\nCc: Rik van Riel \u003criel@redhat.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    }
  ]
}
