)]}'
{
  "log": [
    {
      "commit": "39d997b514e12d5aff0dca206eb8996b3957927e",
      "tree": "d63202847a8a421fbd1fc39e9a2433dbc86ce104",
      "parents": [
        "6b7b284958d47b77d06745b36bc7f36dab769d9b"
      ],
      "author": {
        "name": "Akinobu Mita",
        "email": "akinobu.mita@gmail.com",
        "time": "Mon Dec 21 16:20:16 2009 -0800"
      },
      "committer": {
        "name": "Ingo Molnar",
        "email": "mingo@elte.hu",
        "time": "Mon Dec 28 10:41:39 2009 +0100"
      },
      "message": "x86, core: Optimize hweight32()\n\nOptimize hweight32 by using the same technique in hweight64.\n\nThe proof of this technique can be found in the commit log for\nf9b4192923fa6e38331e88214b1fe5fc21583fcc (\"bitops: hweight()\nspeedup\").\n\nThe userspace benchmark on x86_32 showed 20% speedup with\nbitmap_weight() which uses hweight32 to count bits for each\nunsigned long on 32bit architectures.\n\n int main(void)\n {\n\t#define SZ (1024 * 1024 * 512)\n\n\tstatic DECLARE_BITMAP(bitmap, SZ) \u003d {\n\t        [0 ... 100] \u003d 1,\n\t};\n\n\treturn bitmap_weight(bitmap, SZ);\n }\n\nSigned-off-by: Akinobu Mita \u003cakinobu.mita@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nCc: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\nLKML-Reference: \u003c1258603932-4590-1-git-send-email-akinobu.mita@gmail.com\u003e\n[ only x86 sets ARCH_HAS_FAST_MULTIPLIER so we do this via the x86 tree]\nSigned-off-by: Ingo Molnar \u003cmingo@elte.hu\u003e\n"
    },
    {
      "commit": "1977f032722c27ee3730284582fd3991ad9ac81b",
      "tree": "00ba5692a697a387399131850c56e01345c7cace",
      "parents": [
        "1276b103c20603835d9b903cae099125e8c2c5a3"
      ],
      "author": {
        "name": "Jiri Slaby",
        "email": "jirislaby@gmail.com",
        "time": "Thu Oct 18 23:40:25 2007 -0700"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@woody.linux-foundation.org",
        "time": "Fri Oct 19 11:53:41 2007 -0700"
      },
      "message": "remove asm/bitops.h includes\n\nremove asm/bitops.h includes\n\nincluding asm/bitops directly may cause compile errors. don\u0027t include it\nand include linux/bitops instead. next patch will deny including asm header\ndirectly.\n\nCc: Adrian Bunk \u003cbunk@kernel.org\u003e\nSigned-off-by: Jiri Slaby \u003cjirislaby@gmail.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@linux-foundation.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@linux-foundation.org\u003e\n"
    },
    {
      "commit": "0136611c62e8650e354b95c76dff6d2ce6030eff",
      "tree": "9ba66105bccc4d83b84663b8dda7e51962c22a04",
      "parents": [
        "8380aabb99719af583447133f19a4d8074b5c337"
      ],
      "author": {
        "name": "Andi Kleen",
        "email": "ak@suse.de",
        "time": "Tue Sep 26 10:52:38 2006 +0200"
      },
      "committer": {
        "name": "Andi Kleen",
        "email": "andi@basil.nowhere.org",
        "time": "Tue Sep 26 10:52:38 2006 +0200"
      },
      "message": "[PATCH] optimize hweight64 for x86_64\n\nBased on patch from David Rientjes \u003crientjes@google.com\u003e, but\nchanged by AK.\n\nOptimizes the 64-bit hamming weight for x86_64 processors assuming they\nhave fast multiplication.  Uses five fewer bitops than the generic\nhweight64.  Benchmark on one EMT64 showed ~25% speedup with 2^24\nconsecutive calls.\n\nDefine a new ARCH_HAS_FAST_MULTIPLIER that can be set by other\narchitectures that can also multiply fast.\n\nSigned-off-by: Andi Kleen \u003cak@suse.de\u003e\n"
    },
    {
      "commit": "f9b4192923fa6e38331e88214b1fe5fc21583fcc",
      "tree": "a75d460256e44d79f04cd28895f689f6e7cb7bad",
      "parents": [
        "37d54111c133bea05fbae9dfe6d3d61a1b19c09b"
      ],
      "author": {
        "name": "Akinobu Mita",
        "email": "mita@miraclelinux.com",
        "time": "Sun Mar 26 01:40:00 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sun Mar 26 08:59:30 2006 -0800"
      },
      "message": "[PATCH] bitops: hweight() speedup\n\n\u003clinux@horizon.com\u003e wrote:\n\nThis is an extremely well-known technique.  You can see a similar version that\nuses a multiply for the last few steps at\nhttp://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel whch\nrefers to \"Software Optimization Guide for AMD Athlon 64 and Opteron\nProcessors\"\nhttp://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF\n\nIt\u0027s section 8.6, \"Efficient Implementation of Population-Count Function in\n32-bit Mode\", pages 179-180.\n\nIt uses the name that I am more familiar with, \"popcount\" (population count),\nalthough \"Hamming weight\" also makes sense.\n\nAnyway, the proof of correctness proceeds as follows:\n\n\tb \u003d a - ((a \u003e\u003e 1) \u0026 0x55555555);\n\tc \u003d (b \u0026 0x33333333) + ((b \u003e\u003e 2) \u0026 0x33333333);\n\td \u003d (c + (c \u003e\u003e 4)) \u0026 0x0f0f0f0f;\n#if SLOW_MULTIPLY\n\te \u003d d + (d \u003e\u003e 8)\n\tf \u003d e + (e \u003e\u003e 16);\n\treturn f \u0026 63;\n#else\n\t/* Useful if multiply takes at most 4 cycles */\n\treturn (d * 0x01010101) \u003e\u003e 24;\n#endif\n\nThe input value a can be thought of as 32 1-bit fields each holding their own\nhamming weight.  Now look at it as 16 2-bit fields.  Each 2-bit field a1..a0\nhas the value 2*a1 + a0.  This can be converted into the hamming weight of the\n2-bit field a1+a0 by subtracting a1.\n\nThat\u0027s what the (a \u003e\u003e 1) \u0026 mask subtraction does.  Since there can be no\nborrows, you can just do it all at once.\n\nEnumerating the 4 possible cases:\n\n0b00 \u003d 0  -\u003e  0 - 0 \u003d 0\n0b01 \u003d 1  -\u003e  1 - 0 \u003d 1\n0b10 \u003d 2  -\u003e  2 - 1 \u003d 1\n0b11 \u003d 3  -\u003e  3 - 1 \u003d 2\n\nThe next step consists of breaking up b (made of 16 2-bir fields) into\neven and odd halves and adding them into 4-bit fields.  Since the largest\npossible sum is 2+2 \u003d 4, which will not fit into a 4-bit field, the 2-bit\n                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n                          \"which will not fit into a 2-bit field\"\n\nfields have to be masked before they are added.\n\nAfter this point, the masking can be delayed.  Each 4-bit field holds a\npopulation count from 0..4, taking at most 3 bits.  These numbers can be added\nwithout overflowing a 4-bit field, so we can compute c + (c \u003e\u003e 4), and only\nthen mask off the unwanted bits.\n\nThis produces d, a number of 4 8-bit fields, each in the range 0..8.  From\nthis point, we can shift and add d multiple times without overflowing an 8-bit\nfield, and only do a final mask at the end.\n\nThe number to mask with has to be at least 63 (so that 32 on\u0027t be truncated),\nbut can also be 128 or 255.  The x86 has a special encoding for signed\nimmediate byte values -128..127, so the value of 255 is slower.  On other\nprocessors, a special \"sign extend byte\" instruction might be faster.\n\nOn a processor with fast integer multiplies (Athlon but not P4), you can\nreduce the final few serially dependent instructions to a single integer\nmultiply.  Consider d to be 3 8-bit values d3, d2, d1 and d0, each in the\nrange 0..8.  The multiply forms the partial products:\n\n\t           d3 d2 d1 d0\n\t        d3 d2 d1 d0\n\t     d3 d2 d1 d0\n\t+ d3 d2 d1 d0\n\t----------------------\n\t           e3 e2 e1 e0\n\nWhere e3 \u003d d3 + d2 + d1 + d0.   e2, e1 and e0 obviously cannot generate\nany carries.\n\nSigned-off-by: Akinobu Mita \u003cmita@miraclelinux.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    },
    {
      "commit": "3b9ed1a5d2d121f32d2cb4f2b05f1fc57c99c946",
      "tree": "d8ea52d6b490c0e3b3851e1887c01700cc0115c6",
      "parents": [
        "09020adb61416c4307de35941a9725a5e33d9beb"
      ],
      "author": {
        "name": "Akinobu Mita",
        "email": "mita@miraclelinux.com",
        "time": "Sun Mar 26 01:39:13 2006 -0800"
      },
      "committer": {
        "name": "Linus Torvalds",
        "email": "torvalds@g5.osdl.org",
        "time": "Sun Mar 26 08:57:11 2006 -0800"
      },
      "message": "[PATCH] bitops: generic hweight{64,32,16,8}()\n\nThis patch introduces the C-language equivalents of the functions below:\n\nunsigned int hweight32(unsigned int w);\nunsigned int hweight16(unsigned int w);\nunsigned int hweight8(unsigned int w);\nunsigned long hweight64(__u64 w);\n\nIn include/asm-generic/bitops/hweight.h\n\nThis code largely copied from: include/linux/bitops.h\n\nSigned-off-by: Akinobu Mita \u003cmita@miraclelinux.com\u003e\nSigned-off-by: Andrew Morton \u003cakpm@osdl.org\u003e\nSigned-off-by: Linus Torvalds \u003ctorvalds@osdl.org\u003e\n"
    }
  ]
}
