)]}'
{
  "log": [
    {
      "commit": "068fbad288a2c18b75b0425fb56d241f018a1cb5",
      "tree": "441674bb0658208fbc5f6eab96b8641ab7345caf",
      "parents": [
        "6e16d89bcd668a95eb22add24c02d80890232b66"
      ],
      "author": {
        "name": "Mathieu Desnoyers",
        "email": "mathieu.desnoyers@polymtl.ca",
        "time": "Thu Feb 07 00:16:07 2008 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Thu Feb 07 08:42:30 2008 -0800"
      },
      "message": "Add cmpxchg_local to asm-generic for per cpu atomic operations\n\nEmulates the cmpxchg_local by disabling interrupts around variable modification.\nThis is not reentrant wrt NMIs and MCEs. It is only protected against normal\ninterrupts, but this is enough for architectures without such interrupt sources\nor if used in a context where the data is not shared with such handlers.\n\nIt can be used as a fallback for architectures lacking a real cmpxchg\ninstruction.\n\nFor architectures that have a real cmpxchg but does not have NMIs or MCE,\ntesting which of the generic vs architecture specific cmpxchg is the fastest\nshould be done.\n\nasm-generic/cmpxchg.h defines a cmpxchg that uses cmpxchg_local. It is meant to\nbe used as a cmpxchg fallback for architectures that do not support SMP.\n\n* Patch series comments\n\nUsing cmpxchg_local shows a performance improvements of the fast path goes from\na 66% speedup on a Pentium 4 to a 14% speedup on AMD64.\n\nIn detail:\n\nTested-by: Mathieu Desnoyers \u003cmathieu.desnoyers@polymtl.ca\u003e\nMeasurements on a Pentium4, 3GHz, Hyperthread.\nSLUB Performance testing\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n1. Kmalloc: Repeatedly allocate then free test\n\n* slub HEAD, test 1\nkmalloc(8) \u003d 201 cycles         kfree \u003d 351 cycles\nkmalloc(16) \u003d 198 cycles        kfree \u003d 359 cycles\nkmalloc(32) \u003d 200 cycles        kfree \u003d 381 cycles\nkmalloc(64) \u003d 224 cycles        kfree \u003d 394 cycles\nkmalloc(128) \u003d 285 cycles       kfree \u003d 424 cycles\nkmalloc(256) \u003d 411 cycles       kfree \u003d 546 cycles\nkmalloc(512) \u003d 480 cycles       kfree \u003d 619 cycles\nkmalloc(1024) \u003d 623 cycles      kfree \u003d 750 cycles\nkmalloc(2048) \u003d 686 cycles      kfree \u003d 811 cycles\nkmalloc(4096) \u003d 482 cycles      kfree \u003d 538 cycles\nkmalloc(8192) \u003d 680 cycles      kfree \u003d 734 cycles\nkmalloc(16384) \u003d 713 cycles     kfree \u003d 843 cycles\n\n* Slub HEAD, test 2\nkmalloc(8) \u003d 190 cycles         kfree \u003d 351 cycles\nkmalloc(16) \u003d 195 cycles        kfree \u003d 360 cycles\nkmalloc(32) \u003d 201 cycles        kfree \u003d 370 cycles\nkmalloc(64) \u003d 245 cycles        kfree \u003d 389 cycles\nkmalloc(128) \u003d 283 cycles       kfree \u003d 413 cycles\nkmalloc(256) \u003d 409 cycles       kfree \u003d 547 cycles\nkmalloc(512) \u003d 476 cycles       kfree \u003d 616 cycles\nkmalloc(1024) \u003d 628 cycles      kfree \u003d 753 cycles\nkmalloc(2048) \u003d 684 cycles      kfree \u003d 811 cycles\nkmalloc(4096) \u003d 480 cycles      kfree \u003d 539 cycles\nkmalloc(8192) \u003d 661 cycles      kfree \u003d 746 cycles\nkmalloc(16384) \u003d 741 cycles     kfree \u003d 856 cycles\n\n* cmpxchg_local Slub test\nkmalloc(8) \u003d 83 cycles          kfree \u003d 363 cycles\nkmalloc(16) \u003d 85 cycles         kfree \u003d 372 cycles\nkmalloc(32) \u003d 92 cycles         kfree \u003d 377 cycles\nkmalloc(64) \u003d 115 cycles        kfree \u003d 397 cycles\nkmalloc(128) \u003d 179 cycles       kfree \u003d 438 cycles\nkmalloc(256) \u003d 314 cycles       kfree \u003d 564 cycles\nkmalloc(512) \u003d 398 cycles       kfree \u003d 615 cycles\nkmalloc(1024) \u003d 573 cycles      kfree \u003d 745 cycles\nkmalloc(2048) \u003d 629 cycles      kfree \u003d 816 cycles\nkmalloc(4096) \u003d 473 cycles      kfree \u003d 548 cycles\nkmalloc(8192) \u003d 659 cycles      kfree \u003d 745 cycles\nkmalloc(16384) \u003d 724 cycles     kfree \u003d 843 cycles\n\n2. Kmalloc: alloc/free test\n\n* slub HEAD, test 1\nkmalloc(8)/kfree \u003d 322 cycles\nkmalloc(16)/kfree \u003d 318 cycles\nkmalloc(32)/kfree \u003d 318 cycles\nkmalloc(64)/kfree \u003d 325 cycles\nkmalloc(128)/kfree \u003d 318 cycles\nkmalloc(256)/kfree \u003d 328 cycles\nkmalloc(512)/kfree \u003d 328 cycles\nkmalloc(1024)/kfree \u003d 328 cycles\nkmalloc(2048)/kfree \u003d 328 cycles\nkmalloc(4096)/kfree \u003d 678 cycles\nkmalloc(8192)/kfree \u003d 1013 cycles\nkmalloc(16384)/kfree \u003d 1157 cycles\n\n* Slub HEAD, test 2\nkmalloc(8)/kfree \u003d 323 cycles\nkmalloc(16)/kfree \u003d 318 cycles\nkmalloc(32)/kfree \u003d 318 cycles\nkmalloc(64)/kfree \u003d 318 cycles\nkmalloc(128)/kfree \u003d 318 cycles\nkmalloc(256)/kfree \u003d 328 cycles\nkmalloc(512)/kfree \u003d 328 cycles\nkmalloc(1024)/kfree \u003d 328 cycles\nkmalloc(2048)/kfree \u003d 328 cycles\nkmalloc(4096)/kfree \u003d 648 cycles\nkmalloc(8192)/kfree \u003d 1009 cycles\nkmalloc(16384)/kfree \u003d 1105 cycles\n\n* cmpxchg_local Slub test\nkmalloc(8)/kfree \u003d 112 cycles\nkmalloc(16)/kfree \u003d 103 cycles\nkmalloc(32)/kfree \u003d 103 cycles\nkmalloc(64)/kfree \u003d 103 cycles\nkmalloc(128)/kfree \u003d 112 cycles\nkmalloc(256)/kfree \u003d 111 cycles\nkmalloc(512)/kfree \u003d 111 cycles\nkmalloc(1024)/kfree \u003d 111 cycles\nkmalloc(2048)/kfree \u003d 121 cycles\nkmalloc(4096)/kfree \u003d 650 cycles\nkmalloc(8192)/kfree \u003d 1042 cycles\nkmalloc(16384)/kfree \u003d 1149 cycles\n\nTested-by: Mathieu Desnoyers \u003cmathieu.desnoyers@polymtl.ca\u003e\nMeasurements on a AMD64 2.0 GHz dual-core\n\nIn this test, we seem to remove 10 cycles from the kmalloc fast path.\nOn small allocations, it gives a 14% performance increase. kfree fast\npath also seems to have a 10 cycles improvement.\n\n1. Kmalloc: Repeatedly allocate then free test\n\n* cmpxchg_local slub\nkmalloc(8) \u003d 63 cycles      kfree \u003d 126 cycles\nkmalloc(16) \u003d 66 cycles     kfree \u003d 129 cycles\nkmalloc(32) \u003d 76 cycles     kfree \u003d 138 cycles\nkmalloc(64) \u003d 100 cycles    kfree \u003d 288 cycles\nkmalloc(128) \u003d 128 cycles   kfree \u003d 309 cycles\nkmalloc(256) \u003d 170 cycles   kfree \u003d 315 cycles\nkmalloc(512) \u003d 221 cycles   kfree \u003d 357 cycles\nkmalloc(1024) \u003d 324 cycles  kfree \u003d 393 cycles\nkmalloc(2048) \u003d 354 cycles  kfree \u003d 440 cycles\nkmalloc(4096) \u003d 394 cycles  kfree \u003d 330 cycles\nkmalloc(8192) \u003d 523 cycles  kfree \u003d 481 cycles\nkmalloc(16384) \u003d 643 cycles kfree \u003d 649 cycles\n\n* Base\nkmalloc(8) \u003d 74 cycles      kfree \u003d 113 cycles\nkmalloc(16) \u003d 76 cycles     kfree \u003d 116 cycles\nkmalloc(32) \u003d 85 cycles     kfree \u003d 133 cycles\nkmalloc(64) \u003d 111 cycles    kfree \u003d 279 cycles\nkmalloc(128) \u003d 138 cycles   kfree \u003d 294 cycles\nkmalloc(256) \u003d 181 cycles   kfree \u003d 304 cycles\nkmalloc(512) \u003d 237 cycles   kfree \u003d 327 cycles\nkmalloc(1024) \u003d 340 cycles  kfree \u003d 379 cycles\nkmalloc(2048) \u003d 378 cycles  kfree \u003d 433 cycles\nkmalloc(4096) \u003d 399 cycles  kfree \u003d 329 cycles\nkmalloc(8192) \u003d 528 cycles  kfree \u003d 624 cycles\nkmalloc(16384) \u003d 651 cycles kfree \u003d 737 cycles\n\n2. Kmalloc: alloc/free test\n\n* cmpxchg_local slub\nkmalloc(8)/kfree \u003d 96 cycles\nkmalloc(16)/kfree \u003d 97 cycles\nkmalloc(32)/kfree \u003d 97 cycles\nkmalloc(64)/kfree \u003d 97 cycles\nkmalloc(128)/kfree \u003d 97 cycles\nkmalloc(256)/kfree \u003d 105 cycles\nkmalloc(512)/kfree \u003d 108 cycles\nkmalloc(1024)/kfree \u003d 105 cycles\nkmalloc(2048)/kfree \u003d 107 cycles\nkmalloc(4096)/kfree \u003d 390 cycles\nkmalloc(8192)/kfree \u003d 626 cycles\nkmalloc(16384)/kfree \u003d 662 cycles\n\n* Base\nkmalloc(8)/kfree \u003d 116 cycles\nkmalloc(16)/kfree \u003d 116 cycles\nkmalloc(32)/kfree \u003d 116 cycles\nkmalloc(64)/kfree \u003d 116 cycles\nkmalloc(128)/kfree \u003d 116 cycles\nkmalloc(256)/kfree \u003d 126 cycles\nkmalloc(512)/kfree \u003d 126 cycles\nkmalloc(1024)/kfree \u003d 126 cycles\nkmalloc(2048)/kfree \u003d 126 cycles\nkmalloc(4096)/kfree \u003d 384 cycles\nkmalloc(8192)/kfree \u003d 749 cycles\nkmalloc(16384)/kfree \u003d 786 cycles\n\nTested-by: Christoph Lameter \u003cclameter@sgi.com\u003e\nI can confirm Mathieus\u0027 measurement now:\n\nAthlon64:\n\nregular NUMA/discontig\n\n1. Kmalloc: Repeatedly allocate then free test\n10000 times kmalloc(8) -\u003e 79 cycles kfree -\u003e 92 cycles\n10000 times kmalloc(16) -\u003e 79 cycles kfree -\u003e 93 cycles\n10000 times kmalloc(32) -\u003e 88 cycles kfree -\u003e 95 cycles\n10000 times kmalloc(64) -\u003e 124 cycles kfree -\u003e 132 cycles\n10000 times kmalloc(128) -\u003e 157 cycles kfree -\u003e 247 cycles\n10000 times kmalloc(256) -\u003e 200 cycles kfree -\u003e 257 cycles\n10000 times kmalloc(512) -\u003e 250 cycles kfree -\u003e 277 cycles\n10000 times kmalloc(1024) -\u003e 337 cycles kfree -\u003e 314 cycles\n10000 times kmalloc(2048) -\u003e 365 cycles kfree -\u003e 330 cycles\n10000 times kmalloc(4096) -\u003e 352 cycles kfree -\u003e 240 cycles\n10000 times kmalloc(8192) -\u003e 456 cycles kfree -\u003e 340 cycles\n10000 times kmalloc(16384) -\u003e 646 cycles kfree -\u003e 471 cycles\n2. Kmalloc: alloc/free test\n10000 times kmalloc(8)/kfree -\u003e 124 cycles\n10000 times kmalloc(16)/kfree -\u003e 124 cycles\n10000 times kmalloc(32)/kfree -\u003e 124 cycles\n10000 times kmalloc(64)/kfree -\u003e 124 cycles\n10000 times kmalloc(128)/kfree -\u003e 124 cycles\n10000 times kmalloc(256)/kfree -\u003e 132 cycles\n10000 times kmalloc(512)/kfree -\u003e 132 cycles\n10000 times kmalloc(1024)/kfree -\u003e 132 cycles\n10000 times kmalloc(2048)/kfree -\u003e 132 cycles\n10000 times kmalloc(4096)/kfree -\u003e 319 cycles\n10000 times kmalloc(8192)/kfree -\u003e 486 cycles\n10000 times kmalloc(16384)/kfree -\u003e 539 cycles\n\ncmpxchg_local NUMA/discontig\n\n1. Kmalloc: Repeatedly allocate then free test\n10000 times kmalloc(8) -\u003e 55 cycles kfree -\u003e 90 cycles\n10000 times kmalloc(16) -\u003e 55 cycles kfree -\u003e 92 cycles\n10000 times kmalloc(32) -\u003e 70 cycles kfree -\u003e 91 cycles\n10000 times kmalloc(64) -\u003e 100 cycles kfree -\u003e 141 cycles\n10000 times kmalloc(128) -\u003e 128 cycles kfree -\u003e 233 cycles\n10000 times kmalloc(256) -\u003e 172 cycles kfree -\u003e 251 cycles\n10000 times kmalloc(512) -\u003e 225 cycles kfree -\u003e 275 cycles\n10000 times kmalloc(1024) -\u003e 325 cycles kfree -\u003e 311 cycles\n10000 times kmalloc(2048) -\u003e 346 cycles kfree -\u003e 330 cycles\n10000 times kmalloc(4096) -\u003e 351 cycles kfree -\u003e 238 cycles\n10000 times kmalloc(8192) -\u003e 450 cycles kfree -\u003e 342 cycles\n10000 times kmalloc(16384) -\u003e 630 cycles kfree -\u003e 546 cycles\n2. Kmalloc: alloc/free test\n10000 times kmalloc(8)/kfree -\u003e 81 cycles\n10000 times kmalloc(16)/kfree -\u003e 81 cycles\n10000 times kmalloc(32)/kfree -\u003e 81 cycles\n10000 times kmalloc(64)/kfree -\u003e 81 cycles\n10000 times kmalloc(128)/kfree -\u003e 81 cycles\n10000 times kmalloc(256)/kfree -\u003e 91 cycles\n10000 times kmalloc(512)/kfree -\u003e 90 cycles\n10000 times kmalloc(1024)/kfree -\u003e 91 cycles\n10000 times kmalloc(2048)/kfree -\u003e 90 cycles\n10000 times kmalloc(4096)/kfree -\u003e 318 cycles\n10000 times kmalloc(8192)/kfree -\u003e 483 cycles\n10000 times kmalloc(16384)/kfree -\u003e 536 cycles\n\nChangelog:\n- Ran though checkpatch.\n\nSigned-off-by: Mathieu Desnoyers \u003cmathieu.desnoyers@polymtl.ca\u003e\nCc: \u003clinux-arch@vger.kernel.org\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    }
  ]
}
