kmemtrace - Kernel Memory Tracer

by Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro>

I. Introduction
===============

kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much

To do this, we trace every allocation and export information to userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc() and so
on.

The actual analysis is performed by a userspace tool (see section III for
details on where to get it from). It logs the data exported by the kernel,
processes it and (as of this writing) can provide the following information:
- the total amount of memory allocated and fragmentation per call-site
- the amount of memory allocated and fragmentation per allocation
- total memory allocated and fragmentation in the collected dataset
- number of cross-CPU allocations and frees (makes sense in NUMA environments)
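
As a rough illustration of the per-call-site accounting listed above, the
sketch below sums requested and allocated bytes for each call site and reports
the difference as internal fragmentation. It does not read kmemtrace's relay
output: the one-record-per-line text format (call site, bytes requested, bytes
allocated) and the function name frag_per_site are assumptions made purely for
this example.

```shell
# Summarise requested vs. allocated bytes per call site.
# Input (HYPOTHETICAL text format, not the real trace format),
# one allocation per line:
#   <call_site> <bytes_requested> <bytes_allocated>
# Reads the named files, or standard input if none are given.
frag_per_site() {
    awk '
        { req[$1] += $2; alloc[$1] += $3 }
        END {
            for (site in req) {
                printf "%s req=%d alloc=%d frag=%d\n", site, req[site], alloc[site], alloc[site] - req[site]
            }
        }
    ' "$@" | sort
}
```

Feeding it two allocations from one call site and one from another prints one
line per call site, with "frag" being the bytes allocated beyond what was
requested.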

The tool can also potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).

kmemtrace also makes provisions for tracing on one architecture and analysing
the data on another.

II. Design and goals
====================

kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting are done asynchronously, that is, after the
data is collected and stored. By design, it allows one to log and analyse
on different machines and different architectures.

As of this writing, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.

Summary of design goals:
- allow logging and analysis to be done across different machines
- be fast and anticipate usage in high-load environments (*)
- be reasonably extensible
- make it possible for GNU/Linux distributions to have kmemtrace
  included in their repositories

(*) - one of the reasons Pekka Enberg's original userspace data analysis
tool's code was rewritten from Perl to C (although this is more than a
simple conversion)


III. Quick usage guide
======================

1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE).

2) Get the userspace tool and build it:
$ git clone git://repo.or.cz/kmemtrace-user.git		# current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
$ make

3) Boot the kmemtrace-enabled kernel if you haven't already, preferably in the
'single' runlevel (so that relay buffers don't fill up easily), and run
kmemtrace:
# '$' does not mean user, but root here.
$ mount -t debugfs none /sys/kernel/debug
$ mount -t proc none /proc
$ cd path/to/kmemtrace-user/
$ ./kmemtraced
# Wait a bit, then stop it with CTRL+C.
$ cat /sys/kernel/debug/kmemtrace/total_overruns	# Check for overruns;
							# should be zero.
# Optionally, run kmemtrace_check separately on each cpu[0-9]*.out file to
# check its correctness.
$ ./kmemtrace-report

Now you should have a nice and short summary of how the allocator performs.
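
The optional per-file check in step 3 could be scripted along the following
lines. This is only a sketch: it assumes that kmemtrace_check takes the trace
file as its single argument and exits non-zero on error, neither of which is
stated in this document, and check_traces is a name made up for the example.

```shell
# Run a checker on every per-CPU trace file in a directory.
# Usage: check_traces DIR [CHECKER]
# ASSUMPTION: the checker accepts the file name as its only argument and
# signals problems through a non-zero exit status.
check_traces() {
    dir=$1
    checker=${2:-./kmemtrace_check}
    status=0
    for f in "$dir"/cpu[0-9]*.out; do
        [ -e "$f" ] || continue        # the glob matched nothing
        if "$checker" "$f"; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            status=1
        fi
    done
    return "$status"
}
```

Running "check_traces ." from the directory holding the cpu[0-9]*.out files
would then print one OK or FAIL line per file and return non-zero if any file
failed the check.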

IV. FAQ and known issues
========================

Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' shows a non-zero value,
how do I fix this? Should I worry?
A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
large the number is. You can fix it by supplying a higher
'kmemtrace.subbufs=N' kernel parameter.
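
For scripted runs, the overrun check can be wrapped in a small helper such as
the sketch below. It assumes the file contains a single decimal number;
check_overruns is a name chosen for this example and is not part of
kmemtrace.

```shell
# Print the overrun counter and fail if it is non-zero.
# Usage: check_overruns [FILE]
# ASSUMPTION: FILE holds one decimal integer.
check_overruns() {
    file=${1:-/sys/kernel/debug/kmemtrace/total_overruns}
    overruns=$(cat "$file") || return 2    # file missing or unreadable
    if [ "$overruns" -eq 0 ]; then
        echo "no overruns"
    else
        echo "$overruns overruns: trace is incomplete"
        return 1
    fi
}
```

A non-zero return value here is the cue to reboot with a larger
'kmemtrace.subbufs=N' value and collect the trace again.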
---

Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
- possible bugs in relay code
- possible misuse of relay by kmemtrace
- timestamps being collected out of order
Or you may fix it yourself and send us a patch.
---

Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.

It may also be due to lack of tracing in some unusual allocator functions.

We don't want bug reports regarding this issue yet.
---

V. See also
===========

Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace