[Sat Mar  2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]

This is how to track down a bug if you know nothing about kernel hacking.
It's a brute force approach but it works pretty well.

You need:

        . A reproducible bug - it has to happen predictably (sorry)
        . All the kernel tar files from a revision that worked to the
          revision that doesn't

You will then do:

        . Rebuild a revision that you believe works, install, and verify that.
        . Do a binary search over the kernels to figure out which one
          introduced the bug.  E.g., suppose 1.3.28 didn't have the bug, but
          you know that 1.3.69 does.  Pick a kernel in the middle and build
          that, like 1.3.50.  Build & test; if it works, pick the mid point
          between .50 and .69, else the mid point between .28 and .50.
        . You'll narrow it down to the kernel that introduced the bug.  You
          can probably do better than this but it gets tricky.

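The binary search above can be sketched as a small shell loop.  This is
only an illustration, not part of the original procedure: the function
names (midpoint, kernel_is_good, bisect_kernels) are made up here, and
kernel_is_good is a placeholder that simply asks you for the result after
you have built, installed and tested the given 1.3.x kernel by hand.  It
assumes the first revision is known good and the second is known bad.

```shell
#!/bin/sh
# Sketch only: all function names below are hypothetical helpers.

# midpoint LOW HIGH: print the patch level halfway between the two,
# rounded down (integer arithmetic).
midpoint() {
    echo $(( ($1 + $2) / 2 ))
}

# kernel_is_good N: placeholder test hook.  Replace the body with whatever
# tells you that 1.3.N does NOT show the bug (exit status 0 means "works").
kernel_is_good() {
    printf 'Build, install and test 1.3.%s.  Does it work? [y/n] ' "$1" >&2
    read -r answer
    [ "$answer" = "y" ]
}

# bisect_kernels GOOD BAD: GOOD is a patch level known to work, BAD one
# known to fail.  Prints the first patch level that shows the bug.
bisect_kernels() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(midpoint "$good" "$bad")
        if kernel_is_good "$mid"; then
            good=$mid       # bug was introduced after $mid
        else
            bad=$mid        # bug was introduced at or before $mid
        fi
    done
    echo "$bad"
}
```

Calling "bisect_kernels 28 69" would walk you through roughly six builds
instead of forty.  Note that the arithmetic mid point of 28 and 69 is 48;
picking 1.3.50 as in the text works just as well.
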
        . Narrow it down to a subdirectory

          - Copy the kernel that works into "test".  Let's say that 1.3.62
            works, but 1.3.63 doesn't.  So you diff -r those two kernels and
            come up with a list of directories that changed.  For each of
            those directories:

                Copy the non-working directory next to the working directory
                as "dir.63".
                One directory at a time, try moving the working directory to
                "dir.62" and dir.63 into its place, then remove any stale
                object files:

                        mv dir dir.62
                        mv dir.63 dir
                        find dir -name '*.[oa]' -print | xargs rm -f

                And then rebuild and retest.  Assuming that all related
                changes were contained in the sub directory, this should
                isolate the change to a directory.

                Problems: changes in header files may have occurred; I've
                found in my case that they were self explanatory - you may
                or may not want to give up when that happens.

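The directory step above can be wrapped in a few shell functions so that
each try is one command and is easy to undo.  This is a rough sketch under
some assumptions: the function names are invented for illustration, the
.62/.63 suffixes follow the example in the text, and changed_dirs only
compares the top-level subdirectories of the two trees (it relies on
diff -r returning a non-zero status when two directories differ).

```shell
#!/bin/sh
# Sketch only: all function names below are hypothetical helpers.

# changed_dirs GOOD_TREE BAD_TREE: print every top-level subdirectory
# whose contents differ between the trees or that exists in only one.
changed_dirs() {
    good_tree=$1
    bad_tree=$2
    for path in "$good_tree"/*/ "$bad_tree"/*/; do
        [ -d "$path" ] || continue
        basename "$path"
    done | sort -u | while read -r dir; do
        if ! diff -r "$good_tree/$dir" "$bad_tree/$dir" >/dev/null 2>&1; then
            echo "$dir"
        fi
    done
}

# swap_in_dir TEST_TREE BAD_TREE DIR: put the non-working DIR in place of
# the working one inside TEST_TREE and remove stale objects and archives.
swap_in_dir() {
    test_tree=$1
    bad_tree=$2
    dir=$3
    cp -R "$bad_tree/$dir" "$test_tree/$dir.63" &&
    mv "$test_tree/$dir" "$test_tree/$dir.62" &&
    mv "$test_tree/$dir.63" "$test_tree/$dir" &&
    find "$test_tree/$dir" -name '*.[oa]' -print | xargs rm -f
}

# swap_out_dir TEST_TREE DIR: undo swap_in_dir before trying the next one.
swap_out_dir() {
    test_tree=$1
    dir=$2
    mv "$test_tree/$dir" "$test_tree/$dir.63" &&
    mv "$test_tree/$dir.62" "$test_tree/$dir"
}
```

A typical round would be "swap_in_dir test linux-1.3.63 fs", rebuild and
retest, then "swap_out_dir test fs" if the bug did not appear.  The tree
names here are placeholders for wherever you unpacked the tar files.
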
        . Narrow it down to a file

          - You can apply the same technique to each file in the directory,
            hoping that the changes in that file are self contained.

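The same idea at file level might look like the sketch below.  Again the
function names are invented for illustration, and it assumes that the file
exists in both directories and that a C file foo.c builds into foo.o in
the same directory.  If the file you swap is a header, everything that
includes it needs rebuilding too, which this sketch does not attempt.

```shell
#!/bin/sh
# Sketch only: all function names below are hypothetical helpers.

# changed_files GOOD_DIR BAD_DIR: print the regular files (not recursing
# into subdirectories) that differ or exist in only one of the two.
changed_files() {
    good_dir=$1
    bad_dir=$2
    for path in "$good_dir"/* "$bad_dir"/*; do
        [ -f "$path" ] || continue
        basename "$path"
    done | sort -u | while read -r file; do
        cmp -s "$good_dir/$file" "$bad_dir/$file" || echo "$file"
    done
}

# swap_in_file TEST_DIR BAD_DIR FILE: keep the working FILE as FILE.62,
# copy in the non-working version and drop the matching object file.
swap_in_file() {
    test_dir=$1
    bad_dir=$2
    file=$3
    mv "$test_dir/$file" "$test_dir/$file.62" &&
    cp "$bad_dir/$file" "$test_dir/$file" &&
    rm -f "$test_dir/${file%.c}.o"
}
```

To undo a swap, move FILE.62 back over FILE and remove the object file
again before the next rebuild.
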
        . Narrow it down to a routine

          - You can take the old file and the new file and manually create
            a merged file that has

                #ifdef VER62
                routine()       /* version from 1.3.62 */
                {
                        ...
                }
                #else
                routine()       /* version from 1.3.63 */
                {
                        ...
                }
                #endif

            And then walk through that file, one routine at a time, and
            wrap that routine's #ifdef pair with

                #define VER62
                /* both routines here */
                #undef VER62

            Then recompile, retest, and move the #define/#undef pair to the
            next routine until you find the one that makes the difference.

Finally, you take all the info that you have, kernel revisions, bug
description, the extent to which you have narrowed it down, and pass
that off to whomever you believe is the maintainer of that section.
A post to linux.dev.kernel isn't such a bad idea if you've done some
work to narrow it down.

If you get it down to a routine, you'll probably get a fix in 24 hours.

My apologies to Linus and the other kernel hackers for describing this
brute force approach; it's hardly what a kernel hacker would do.  However,
it does work and it lets non-hackers help fix bugs.  And it is cool
because Linux snapshots will let you do this - something that you can't
do with vendor supplied releases.