Debugging on Linux for s/390 & z/Architecture
by
Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com)
Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation
Best viewed with fixed width fonts

Overview of Document:
=====================
This document is intended to give a good overview of how to debug
Linux for s/390 & z/Architecture. It isn't intended as a complete reference
or as a tutorial on the fundamentals of C & assembly. It doesn't go into
390 IO in any detail. It is intended to complement the documents in the
reference section below & any other worthwhile references you get.

Like the Enterprise Systems Architecture/390 Reference Summary, it is
intended to be printed out & used as a quick cheat-sheet, self-help style
reference when problems occur.

Contents
========
Register Set
Address Spaces on Intel Linux
Address Spaces on Linux for s/390 & z/Architecture
The Linux for s/390 & z/Architecture Kernel Task Structure
Register Usage & Stackframes on Linux for s/390 & z/Architecture
A sample program with comments
Compiling programs for debugging on Linux for s/390 & z/Architecture
Figuring out gcc compile errors
Debugging Tools
objdump
strace
Performance Debugging
Debugging under VM
s/390 & z/Architecture IO Overview
Debugging IO on s/390 & z/Architecture under VM
GDB on s/390 & z/Architecture
Stack chaining in gdb by hand
Examining core dumps
ldd
Debugging modules
The proc file system
Starting points for debugging scripting languages etc.
Dumptool & Lcrash
SysRq
References
Special Thanks

Register Set
============
The current architectures have the following registers.

16 General purpose registers, 32 bit on s/390 & 64 bit on z/Architecture,
r0-r15 ( or gpr0-gpr15 ), used for arithmetic & addressing.

16 Control registers, 32 bit on s/390 & 64 bit on z/Architecture,
cr0-cr15 ( kernel usage only ), used for memory management,
interrupt control, debugging control etc.

16 Access registers ( ar0-ar15 ), 32 bit on both s/390 & z/Architecture,
not used by normal programs but potentially could
be used as temporary storage. Each access register is associated 1 to 1
with a general purpose register & they are used in
the kernel for copying data between kernel & user address spaces.
Access register 0 ( & access register 1 on z/Architecture, which needs a
64 bit pointer ) is currently used by the pthread library as a pointer to
the currently running thread's private area.

16 64 bit floating point registers ( fp0-fp15 ), IEEE & HFP floating
point format compliant, on G5 upwards, plus a floating point control
register ( FPC ).
4 64 bit registers ( fp0, fp2, fp4 & fp6 ), HFP only, on older machines.
Note:
Linux (currently) always uses IEEE & emulates G5 IEEE format on older machines
( provided the kernel is configured for this ).


The PSW is the most important register on the machine. It
is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of
a program counter (pc), condition code register & memory space designator.
In IBM standard notation I am counting bit 0 as the MSB.
It has several advantages over a normal program counter
in that you can change address translation & program counter
in a single instruction. Changing address translation,
e.g. switching address translation off, requires that you
have a logical=physical mapping for the address you are
currently running at.

      Bit           Value
s/390 z/Architecture
0       0     Reserved ( must be 0 ), otherwise a specification exception occurs.

1       1     Program Event Recording, 1=PER enabled.
              PER is used to facilitate debugging e.g. single stepping.

2-4    2-4    Reserved ( must be 0 ).

5       5     Dynamic address translation, 1=DAT on.

6       6     Input/Output interrupt mask.

7       7     External interrupt mask, used primarily for interprocessor signalling &
              clock interrupts.

8-11  8-11    PSW key, used for a complex memory protection mechanism ( not used under Linux ).

12      12    1 on s/390, 0 on z/Architecture.

13      13    Machine check mask, 1=enable machine check interrupts.

14      14    Wait state. Set this to 1 to stop the processor except for interrupts & give
              time to other LPARs. Used in CPU idle in the kernel to increase overall
              usage of processor resources.

15      15    Problem state ( if set to 1 certain instructions are disabled ).
              All Linux user programs run with this bit set to 1
              ( useful info for debugging under VM ).

16-17 16-17   Address Space Control

              00 Primary space mode when DAT is on.
                 The Linux kernel currently runs in this mode. CR1 is affiliated with
                 this mode & points to the primary segment table origin etc.

              01 Access register mode. This mode is used in functions to
                 copy data between kernel & user space.

              10 Secondary space mode. Not used in Linux, however CR7, the
                 register affiliated with this mode, is, & normally
                 CR13=CR7 to allow us to copy data between kernel & user space.
                 We do this as follows:
                 We set ar2 to 0 to designate its
                 affiliated gpr ( gpr2 ) to point to primary=kernel space.
                 We set ar4 to 1 to designate its
                 affiliated gpr ( gpr4 ) to point to secondary=home=user space
                 & then essentially do a memcopy(gpr2,gpr4,size) to
                 copy data between the address spaces. The reason we use home space for the
                 kernel & don't keep secondary space free is that code will not run in
                 secondary space.

              11 Home space mode. All user programs run in this mode.
                 It is affiliated with CR13.

18-19 18-19   Condition codes (CC)

20    20      Fixed point overflow mask, if 1 FPU exceptions for this event
              occur ( normally 0 ).

21    21      Decimal overflow mask, if 1 FPU exceptions for this event occur
              ( normally 0 ).

22    22      Exponent underflow mask, if 1 FPU exceptions for this event occur
              ( normally 0 ).

23    23      Significance mask, if 1 FPU exceptions for this event occur
              ( normally 0 ).

24-31 24-30   Reserved, must be 0.

      31      Extended Addressing Mode
      32      Basic Addressing Mode
              Used to set addressing mode
              PSW 31   PSW 32
                0         0        24 bit
                0         1        31 bit
                1         1        64 bit

32            1=31 bit addressing mode, 0=24 bit addressing mode ( for backward
              compatibility ). Linux always runs with this bit set to 1.

33-63         Instruction address.
      33-63   Reserved, must be 0.
      64-127  Address
              In 24 bit mode bits 64-103=0, bits 104-127 Address
              In 31 bit mode bits 64-96=0, bits 97-127 Address
              Note: unlike 31 bit mode on s/390, bit 96 must be zero
              when loading the address with LPSWE otherwise a
              specification exception occurs. LPSW is fully backward
              compatible.
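
As a rough illustration of the table above, the s/390 ( 64 bit ) PSW can be
picked apart in C as below. This is only a sketch: psw_bits,
struct s390_psw_fields & decode_s390_psw are names made up for this example,
they are not kernel interfaces, & the z/Architecture 128 bit PSW is not handled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Extract 'len' bits starting at IBM bit 'first' ( bit 0 = MSB ) of a 64 bit PSW. */
static uint64_t psw_bits(uint64_t psw, unsigned first, unsigned len)
{
	return (psw >> (64 - first - len)) & ((1ull << len) - 1);
}

/* Hypothetical container for the s/390 fields listed in the table. */
struct s390_psw_fields {
	bool per, dat, io_mask, ext_mask, mcheck_mask, wait, problem_state;
	unsigned key;       /* bits 8-11 */
	unsigned asc;       /* bits 16-17, 0=primary 1=access register 2=secondary 3=home */
	unsigned cc;        /* bits 18-19 */
	bool amode31;       /* bit 32 */
	uint32_t address;   /* bits 33-63 */
};

static struct s390_psw_fields decode_s390_psw(uint64_t psw)
{
	struct s390_psw_fields f;

	f.per           = psw_bits(psw, 1, 1);
	f.dat           = psw_bits(psw, 5, 1);
	f.io_mask       = psw_bits(psw, 6, 1);
	f.ext_mask      = psw_bits(psw, 7, 1);
	f.key           = (unsigned)psw_bits(psw, 8, 4);
	f.mcheck_mask   = psw_bits(psw, 13, 1);
	f.wait          = psw_bits(psw, 14, 1);
	f.problem_state = psw_bits(psw, 15, 1);
	f.asc           = (unsigned)psw_bits(psw, 16, 2);
	f.cc            = (unsigned)psw_bits(psw, 18, 2);
	f.amode31       = psw_bits(psw, 32, 1);
	f.address       = (uint32_t)psw_bits(psw, 33, 31);
	return f;
}
```

Feeding it a value such as 0x070dc00080400382 gives DAT on, problem state set,
home space mode & an instruction address of 0x400382, which is the pattern the
table leads you to expect for a user program.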


Prefix Page(s)
--------------
This per cpu memory area is too intimately tied to the processor not to mention.
It exists between the real addresses 0-4095 on s/390 & 0-8191 on z/Architecture & is
exchanged with 1 page on s/390 or 2 pages on z/Architecture in absolute storage by the
set prefix instruction in Linux's startup code.
This page is mapped to a different prefix for each processor in an SMP configuration
( assuming the os designer is sane of course :-) ).
Bytes 0-511 ( 200 hex ) on s/390 & 0-511, 4096-4544, 4604-5119 currently on z/Architecture
are used by the processor itself for holding such information as exception indications &
entry points for exceptions.
Bytes after 0xc00 hex are used by Linux for per processor globals on s/390 & z/Architecture
( there is currently a gap on z/Architecture too between 0xc00 & 0x1000 which Linux uses ).
The closest thing to this on traditional architectures is the interrupt
vector table. This is a good thing & does simplify some of the kernel coding,
however it means that we now cannot catch stray NULL pointers in the
kernel without hard coded checks.
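
The exchange done by prefixing can be sketched in C as below, for the s/390
case of a single 4K page. The function name real_to_absolute is made up for
this example & 'prefix' is assumed to be the page aligned value loaded by the
set prefix instruction.

```c
#include <stdint.h>

#define PREFIX_AREA_SIZE 4096u   /* s/390: one page, z/Architecture uses two */

/* Illustrative only: translate a real address to an absolute address. */
static uint32_t real_to_absolute(uint32_t real, uint32_t prefix)
{
	if (real < PREFIX_AREA_SIZE)
		return prefix + real;      /* low page -> this cpu's prefix page */
	if (real >= prefix && real - prefix < PREFIX_AREA_SIZE)
		return real - prefix;      /* prefix page -> absolute page 0 */
	return real;                       /* everything else is unchanged */
}
```

With a prefix of 0x5000, real address 0x10 lands at absolute 0x5010 & real
address 0x5010 lands at absolute 0x10, i.e. the two pages swap places.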



Address Spaces on Intel Linux
=============================

The traditional Intel Linux is approximately mapped as follows, forgive
the ascii art.
0xFFFFFFFF 4GB Himem                        *****************
                                            *               *
                                            * Kernel Space  *
                                            *               *
                                            *****************          ****************
User Space Himem (typically 0xC0000000 3GB )*  User Stack   *          *              *
                                            *****************          *              *
                                            *  Shared Libs  *          * Next Process *
                                            *****************          *     to       *
                                            *               *    <==   *     Run      *  <==
                                            *  User Program *          *              *
                                            *   Data BSS    *          *              *
                                            *    Text       *          *              *
                                            *   Sections    *          *              *
0x00000000                                  *****************          ****************

Now it is easy to see that on Intel it is quite easy to recognise a kernel address
as being one at or above user space himem ( in this case 0xC0000000 ),
& addresses of less than this are the ones in the current running program on this
processor ( if an smp box ).
If using the virtual machine ( VM ) as a debugger it is quite difficult to
know which user process is running as the address space you are looking at
could be from any process in the run queue.

The limitation of Intel's addressing technique is that the Linux
kernel uses a very simple real address to virtual addressing technique
of Real Address=Virtual Address-User Space Himem.
This means that on Intel the Linux kernel can typically only address
Himem=0xFFFFFFFF-0xC0000000=1GB & this is all the RAM these machines
can typically use.
They can lower User Himem to 2GB or lower & thus be
able to use 2GB of RAM, however this shrinks the maximum size
of User Space from 3GB to 2GB. They have a no win limit of 4GB unless
they go to 64 Bit.
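
The Intel scheme described above boils down to one compare & one subtract.
A minimal sketch, assuming the typical 3GB split ( the names
USER_SPACE_HIMEM, is_kernel_address & kernel_virt_to_real are made up for
this example ):

```c
#include <stdint.h>
#include <stdbool.h>

#define USER_SPACE_HIMEM 0xC0000000u   /* typical 3GB split, an assumption */

/* Kernel addresses sit at or above user space himem. */
static bool is_kernel_address(uint32_t vaddr)
{
	return vaddr >= USER_SPACE_HIMEM;
}

/* Real Address = Virtual Address - User Space Himem ( kernel addresses only ). */
static uint32_t kernel_virt_to_real(uint32_t vaddr)
{
	return vaddr - USER_SPACE_HIMEM;
}
```

No such shortcut exists on s/390 & z/Architecture, where the same address
value can be valid in both the user & the kernel address space.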


On 390 our limitations & strengths make us slightly different.
For backward compatibility we are only allowed to use 31 bits (2GB)
of our 32 bit addresses, however, we use entirely separate address
spaces for the user & kernel.

This means we can support 2GB of non Extended RAM on s/390, & more
with the Extended memory management swap device, &
currently 4TB of physical memory on z/Architecture.


Address Spaces on Linux for s/390 & z/Architecture
==================================================

Our addressing scheme is as follows


Himem 0x7fffffff 2GB on s/390    *****************          ****************
currently 0x3ffffffffff (2^42)-1 *  User Stack   *          *              *
on z/Architecture.               *****************          *              *
                                 *  Shared Libs  *          *              *
                                 *****************          *              *
                                 *               *          *    Kernel    *
                                 *  User Program *          *              *
                                 *   Data BSS    *          *              *
                                 *    Text       *          *              *
                                 *   Sections    *          *              *
0x00000000                       *****************          ****************

This also means that we need to look at the PSW problem state bit
or the addressing mode to decide whether we are looking at
user or kernel space.

Virtual Addresses on s/390 & z/Architecture
===========================================

A virtual address on s/390 is made up of 3 parts:
The SX ( segment index, roughly corresponding to the PGD & PMD in Linux terminology )
being bits 1-11.
The PX ( page index, corresponding to the page table entry (pte) in Linux terminology )
being bits 12-19.
The remaining bits, the BX ( byte index ), are the offset in the page
i.e. bits 20 to 31.

On z/Architecture in Linux we currently make up an address from 4 parts:
The region index (RX) being bits 0-32, of which we currently use bits 22-32
The segment index (SX) being bits 33-43
The page index (PX) being bits 44-51
The byte index (BX) being bits 52-63
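
The bit ranges above translate into shifts & masks as sketched below ( bit 0
being the MSB, so bits 20-31 of a 32 bit address are its low 12 bits ). The
struct & function names are made up for this example; the real definitions
live in pgtable.h.

```c
#include <stdint.h>

/* s/390: 32 bit address, bit 0 is the MSB ( not part of a 31 bit address ). */
struct s390_vaddr  { uint32_t sx, px, bx; };
/* z/Architecture: rx holds only bits 22-32, the part Linux currently uses. */
struct s390x_vaddr { uint64_t rx, sx, px, bx; };

static struct s390_vaddr split_s390(uint32_t addr)
{
	struct s390_vaddr v;

	v.sx = (addr >> 20) & 0x7ff;   /* bits 1-11,  11 bits */
	v.px = (addr >> 12) & 0xff;    /* bits 12-19,  8 bits */
	v.bx = addr & 0xfff;           /* bits 20-31, 12 bits */
	return v;
}

static struct s390x_vaddr split_s390x(uint64_t addr)
{
	struct s390x_vaddr v;

	v.rx = (addr >> 31) & 0x7ff;   /* bits 22-32, 11 bits */
	v.sx = (addr >> 20) & 0x7ff;   /* bits 33-43, 11 bits */
	v.px = (addr >> 12) & 0xff;    /* bits 44-51,  8 bits */
	v.bx = addr & 0xfff;           /* bits 52-63, 12 bits */
	return v;
}
```

The four z/Architecture fields add up to 11+11+8+12 = 42 bits, which matches
the (2^42)-1 himem shown in the address space diagram.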

Notes:
1) s/390 has no PMD so the PMD is really the PGD also.
A lot of this stuff is defined in pgtable.h.

2) Also seeing as s/390's page tables are only 1k in size
( 256 entries from bits 12-19 x 4 bytes per pte ) we use 1 page ( 4k )
to make the best use of memory by updating 4 segment index
entries each time we mess with a PMD & use offsets
0,1024,2048 & 3072 in this page for our segment indexes.
On z/Architecture our page tables are now 2k in size
( 256 entries from bits 44-51 x 8 bytes per pte ) so we do a similar trick
but only mess with 2 segment indices each time we mess with
a PMD.

3) z/Architecture supports up to a massive 5-level page table lookup, but we
can only use 3 currently on Linux ( as this is all the generic kernel
currently supports ), however this may change in future.
This allows us to access ( according to my sums )
4TB of virtual storage per process i.e.
4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes,
enough for another 2 or 3 years I think :-).
To do this we use a region-third-table designation type in
our address space control registers.
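
The sum in note 3 is easy to check mechanically. The sketch below just
multiplies the figures quoted there; the macro & function names are made up
for this example.

```c
#include <stdint.h>

/* Figures taken from note 3 above. */
#define PAGE_SIZE_BYTES 4096ull
#define PTES            512ull
#define PMDS            1024ull
#define PGDS            2048ull

/* 2^12 * 2^9 * 2^10 * 2^11 = 2^42 bytes, i.e. 4TB. */
static uint64_t virtual_storage_per_process(void)
{
	return PAGE_SIZE_BYTES * PTES * PMDS * PGDS;
}
```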


The Linux for s/390 & z/Architecture Kernel Task Structure
==========================================================
Each process/thread under Linux for S390 has its own kernel task_struct
defined in linux/include/linux/sched.h
On initialisation & on resuming a process on a cpu, the S390 code sets
the __LC_KERNEL_STACK variable in the spare prefix area for this cpu
(which we use for per-processor globals).

The kernel stack pointer is intimately tied with the task structure for
each processor as follows.

                      s/390
            ************************
            *  1 page kernel stack *
            *        ( 4K )        *
            ************************
            *   1 page task_struct *
            *        ( 4K )        *
8K aligned  ************************

                 z/Architecture
            ************************
            *  2 page kernel stack *
            *        ( 8K )        *
            ************************
            *  2 page task_struct  *
            *        ( 8K )        *
16K aligned ************************

What this means is that we don't need to dedicate any register or global variable
to point to the current running process & can retrieve it with the following
very simple construct for s/390 & one very similar for z/Architecture.

static inline struct task_struct * get_current(void)
{
        struct task_struct *current;
        __asm__("lhi   %0,-8192\n\t"
                "nr    %0,15"
                : "=r" (current) );
        return current;
}

i.e. just anding the current kernel stack pointer with the mask -8192.
Thankfully because Linux doesn't have support for nested IO interrupts
& our devices have large buffers & can survive interrupts being shut for
short amounts of time, we don't need a separate stack for interrupts.
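
The same masking can be tried out away from a 390 with plain C. This is only
a sketch of the arithmetic for the s/390 ( 8K ) layout, the names THREAD_SIZE
& task_base_from_sp are assumptions made for this example & the real code is
the inline assembly shown above.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* s/390: 4K task_struct + 4K kernel stack */

/*
 * Illustrative only: round a kernel stack pointer down to the start of
 * its 8K aligned block, which is where the task_struct lives.
 * Anding with -8192 is the same as anding with ~(8192 - 1).
 */
static uintptr_t task_base_from_sp(uintptr_t sp)
{
	return sp & ~((uintptr_t)THREAD_SIZE - 1);
}
```

When looking at a kernel stack pointer under VM or in a dump you can do the
same by hand, just clear the low 13 bits to find the task_struct.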


Register Usage & Stackframes on Linux for s/390 & z/Architecture
================================================================
Overview:
---------
This is the code that gcc produces at the top & the bottom of
each function. It usually is fairly consistent & similar from
function to function & if you know its layout you can probably
make some headway in finding the ultimate cause of a problem
after a crash without a source level debugger.

Note: To follow stackframes requires a knowledge of C or Pascal &
limited knowledge of one assembly language.

It should be noted that there are some differences between the
s/390 & z/Architecture stack layouts as the z/Architecture stack layout didn't have
to maintain compatibility with older linkage formats.

Glossary:
---------
alloca:
This is a built in compiler function for runtime allocation
of extra space on the caller's stack which is obviously freed
up on function exit ( e.g. the caller may choose to allocate nothing
or a buffer of 4k if required for temporary purposes ). It generates
very efficient code ( a few cycles ) when compared to alternatives
like malloc.

automatics:
These are local variables on the stack,
i.e. they aren't in registers & they aren't static.

back-chain:
This is a pointer to the stack pointer before entering a
framed function's ( see frameless-function ) prologue, got by
dereferencing the address of the current stack pointer,
i.e. got by accessing the 32 bit value at the stack pointer's
current location.

base-pointer:
This is a pointer to the back of the literal pool which
is an area just behind each procedure used to store constants
in each function.

call-clobbered:
The caller probably needs to save these registers if there
is something of value in them, on the stack or elsewhere, before making a
call to another procedure so that it can restore them later.

epilogue:
The code generated by the compiler to return to the caller.

frameless-function:
A frameless function in Linux for s/390 & z/Architecture is one which doesn't
need more than the register save area ( 96 bytes on s/390, 160 on z/Architecture )
given to it by the caller.
A frameless function never:
1) Sets up a back chain.
2) Calls alloca.
3) Calls other normal functions.
4) Has automatics.

GOT-pointer:
This is a pointer to the global-offset-table in ELF
( Executable and Linkable Format, Linux's most common executable format ).
All globals & shared library objects are found using this pointer.

lazy-binding:
ELF shared libraries are typically only loaded when routines in the shared
library are actually first called at runtime. This is lazy binding.

procedure-linkage-table:
This is a table found from the GOT which contains pointers to routines
in other shared libraries which can't be called by easier means.

prologue:
The code generated by the compiler to set up the stack frame.

outgoing-args:
This is an extra area allocated on the stack of the calling function if the
parameters for the callee cannot all be put in registers. The same
area can be reused by each function the caller calls.

routine-descriptor:
A COFF executable format based concept of a procedure reference
actually being 8 bytes or more as opposed to a simple pointer to the routine.
This is typically defined as follows:
Routine Descriptor offset 0=Pointer to Function
Routine Descriptor offset 4=Pointer to Table of Contents
The table of contents/TOC is roughly equivalent to a GOT pointer
& it means that shared libraries etc. can be shared between several
environments each with their own TOC.


static-chain:
This is used in nested functions, a concept adopted from Pascal
by gcc & not used in ANSI C or C++ ( although quite useful ). Basically it
is a pointer used to reference local variables of enclosing functions.
You might come across this stuff once or twice in your lifetime.

e.g.
The function below should return 11 though gcc may get upset & toss warnings
about unused variables.
int FunctionA(int a)
{
        int b;
        void FunctionC(int c)
        {
                b=c+1;
        }
        FunctionC(10);
        return(b);
}


s/390 & z/Architecture Register usage
=====================================
r0       used by syscalls/assembly                  call-clobbered
r1       used by syscalls/assembly                  call-clobbered
r2       argument 0 / return value 0                call-clobbered
r3       argument 1 / return value 1 (if long long) call-clobbered
r4       argument 2                                 call-clobbered
r5       argument 3                                 call-clobbered
r6       argument 4                                 saved
r7       pointer-to arguments 5 to ...              saved
r8       this & that                                saved
r9       this & that                                saved
r10      static-chain ( if nested function )        saved
r11      frame-pointer ( if function used alloca )  saved
r12      got-pointer                                saved
r13      base-pointer                               saved
r14      return-address                             saved
r15      stack-pointer                              saved

f0       argument 0 / return value ( float/double ) call-clobbered
f2       argument 1                                 call-clobbered
f4       z/Architecture argument 2                  saved
f6       z/Architecture argument 3                  saved
The remaining floating point registers
f1,f3,f5 & f7-f15 are call-clobbered.

Notes:
------
1) The only requirement is that registers which are used
by the callee are saved, e.g. the compiler is perfectly
capable of using r11 for purposes other than a
frame pointer if a frame pointer is not needed.
2) In functions with variable arguments e.g. printf the calling procedure
is identical to one without variable arguments & the same number of
parameters. However, the prologue of this function is somewhat more
hairy owing to it having to move these parameters to the stack to
get va_start, va_arg & va_end to work.
3) Access registers are currently unused by gcc but are used in
the kernel. Possibilities exist to use them at the moment for
temporary storage but it isn't recommended.
4) Only 4 of the floating point registers are used for
parameter passing as older machines such as G3 only have 4
& it keeps the stack frame compatible with other compilers.
However with IEEE floating point emulation under Linux on the
older machines you are free to use the other 12.
5) A long long or double parameter cannot have the
first 4 bytes in a register & the second four bytes in the
outgoing args area. It must be purely in the outgoing args
area if crossing this boundary.
6) Floating point parameters are mixed with outgoing args
on the outgoing args area in the order they are passed in as parameters.
7) Floating point arguments 2 & 3 are saved in the outgoing args area for
z/Architecture.

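When reading a register dump it helps to know at a glance where an integer
argument should be. The sketch below encodes only the plain integer case
from the register usage table ( arguments 0-4 in r2-r6, the rest in the
outgoing args area ) & ignores long long pairs & floating point arguments.
The function name int_arg_register is made up for this example.

```c
/* r2-r6 carry integer arguments 0-4, later ones go to the outgoing args area. */
#define FIRST_ARG_GPR 2
#define LAST_ARG_GPR  6

/*
 * Illustrative only: returns the gpr number holding integer argument
 * 'index' ( counting from 0 ), or -1 if it is passed on the stack.
 */
static int int_arg_register(int index)
{
	int gpr = FIRST_ARG_GPR + index;

	if (index < 0 || gpr > LAST_ARG_GPR)
		return -1;
	return gpr;
}
```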

Stack Frame Layout
------------------
s/390     z/Architecture
0         0             back chain ( a 0 here signifies end of back chain )
4         8             eos ( end of stack, not used on Linux for S390, used in other linkage formats )
8         16            glue used in other s/390 linkage formats for saved routine descriptors etc.
12        24            glue used in other s/390 linkage formats for saved routine descriptors etc.
16        32            scratch area
20        40            scratch area
24        48            saved r6 of caller function
28        56            saved r7 of caller function
32        64            saved r8 of caller function
36        72            saved r9 of caller function
40        80            saved r10 of caller function
44        88            saved r11 of caller function
48        96            saved r12 of caller function
52        104           saved r13 of caller function
56        112           saved r14 of caller function
60        120           saved r15 of caller function
64        128           saved f4 of caller function
72        136           saved f6 of caller function
80        144           undefined
96        160           outgoing args passed from caller to callee
96+x      160+x         possible stack alignment ( 8 bytes desirable )
96+x+y    160+x+y       alloca space of caller ( if used )
96+x+y+z  160+x+y+z     automatics of caller ( if used )
0                       back-chain

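Chaining back through frames with this layout can be sketched as below for
s/390. It works on a byte array standing in for a stack image rather than on
real memory, reads values in host byte order ( a real s/390 dump is big
endian ) & all the names in it are made up for this example. Since a callee
saves r14 into the frame its caller gave it, the first return address is
found in the frame reached by following the back chain once.

```c
#include <stdint.h>
#include <string.h>

#define S390_BACK_CHAIN_OFF 0    /* back chain slot in a frame          */
#define S390_SAVED_R14_OFF  56   /* saved r14 ( return address ) slot   */
#define S390_FRAME_MIN      96   /* size of the register save area      */

/* Read a 32 bit value from the simulated stack image at offset 'addr'. */
static uint32_t peek32(const uint8_t *mem, uint32_t addr)
{
	uint32_t v;

	memcpy(&v, mem + addr, sizeof(v));
	return v;
}

/*
 * Illustrative only: follow the back chain starting at stack pointer 'sp'
 * & store the saved r14 of each frame reached in 'ret'. Stops at a 0 back
 * chain, after 'max' entries or when a frame would fall outside the image.
 * Returns the number of return addresses found.
 */
static int walk_back_chain(const uint8_t *mem, uint32_t size, uint32_t sp,
			   uint32_t *ret, int max)
{
	int n = 0;

	if (size < S390_FRAME_MIN || sp > size - S390_FRAME_MIN)
		return 0;
	sp = peek32(mem, sp + S390_BACK_CHAIN_OFF);   /* step to the caller's frame */
	while (sp != 0 && sp <= size - S390_FRAME_MIN && n < max) {
		ret[n++] = peek32(mem, sp + S390_SAVED_R14_OFF);
		sp = peek32(mem, sp + S390_BACK_CHAIN_OFF);
	}
	return n;
}
```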
|  | 558 | A sample program with comments. | 
|  | 559 | =============================== | 
|  | 560 |  | 
|  | 561 | Comments on the function test | 
|  | 562 | ----------------------------- | 
|  | 563 | 1) It didn't need to set up a pointer to the constant pool gpr13 as it isn't used | 
|  | 564 | ( :-( ). | 
|  | 565 | 2) This is a frameless function & no stack is bought. | 
|  | 566 | 3) The compiler was clever enough to recognise that it could return the | 
|  | 567 | value in r2 as well as use it for the passed in parameter ( :-) ). | 
|  | 568 | 4) The basr ( branch & save register ) trick works as follows: the instruction | 
|  | 569 | has a special case when its second operand is r0 ( r0 in some instruction operands is understood as | 
|  | 570 | the literal value 0, some risc architectures also do this ) & no branch is taken. So now | 
|  | 571 | we simply continue at the next address & the address of that next instruction is | 
|  | 572 | in r13, so now we subtract the size of the function prologue we have executed | 
|  | 573 | + the size of the literal pool to get to the top of the literal pool. | 
|  | 574 | 0040037c int test(int b) | 
|  | 575 | {                                                          # Function prologue below | 
|  | 576 | 40037c:	90 de f0 34 	stm	%r13,%r14,52(%r15) # Save registers r13 & r14 | 
|  | 577 | 400380:	0d d0       	basr	%r13,%r0           # Set up pointer to constant pool using | 
|  | 578 | 400382:	a7 da ff fa 	ahi	%r13,-6            # basr trick | 
|  | 579 | return(5+b); | 
|  | 580 | # Huge main program | 
|  | 581 | 400386:	a7 2a 00 05 	ahi	%r2,5              # add 5 to r2 | 
|  | 582 |  | 
|  | 583 | # Function epilogue below | 
|  | 584 | 40038a:	98 de f0 34 	lm	%r13,%r14,52(%r15) # restore registers r13 & 14 | 
|  | 585 | 40038e:	07 fe       	br	%r14               # return | 
|  | 586 | } | 
|  | 587 |  | 
|  | 588 | Comments on the function main | 
|  | 589 | ----------------------------- | 
|  | 590 | 1) The compiler did this function optimally ( 8-) ) | 
|  | 591 |  | 
|  | 592 | Literal pool for main. | 
|  | 593 | 400390:	ff ff ff ec 	.long 0xffffffec | 
|  | 594 | main(int argc,char *argv[]) | 
|  | 595 | {                                                          # Function prologue below | 
|  | 596 | 400394:	90 bf f0 2c 	stm	%r11,%r15,44(%r15) # Save necessary registers | 
|  | 597 | 400398:	18 0f       	lr	%r0,%r15           # copy stack pointer to r0 | 
|  | 598 | 40039a:	a7 fa ff a0 	ahi	%r15,-96           # Make area for callee saving | 
|  | 599 | 40039e:	0d d0       	basr	%r13,%r0           # Set up r13 to point to | 
|  | 600 | 4003a0:	a7 da ff f0 	ahi	%r13,-16           # literal pool | 
|  | 601 | 4003a4:	50 00 f0 00 	st	%r0,0(%r15)        # Save backchain | 
|  | 602 |  | 
|  | 603 | return(test(5));                                   # Main Program Below | 
|  | 604 | 4003a8:	58 e0 d0 00 	l	%r14,0(%r13)       # load relative address of test from | 
|  | 605 | # literal pool | 
|  | 606 | 4003ac:	a7 28 00 05 	lhi	%r2,5              # Set first parameter to 5 | 
|  | 607 | 4003b0:	4d ee d0 00 	bas	%r14,0(%r14,%r13)  # jump to test setting r14 as return | 
|  | 608 | # address using branch & save instruction. | 
|  | 609 |  | 
|  | 610 | # Function Epilogue below | 
|  | 611 | 4003b4:	98 bf f0 8c 	lm	%r11,%r15,140(%r15)# Restore necessary registers. | 
|  | 612 | 4003b8:	07 fe       	br	%r14               # return to do program exit | 
|  | 613 | } | 
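The address arithmetic behind the basr trick & the literal pool in the two listings above can be checked by hand or with a few lines of script. The sketch below only redoes the sums; every address & displacement is copied from the listings, the function names are made up.

```python
# Sketch only: re-does the address arithmetic from the listings of test & main.
# All addresses & displacements are copied from those listings.

MASK32 = 0xFFFFFFFF


def basr_then_ahi(basr_addr, adjust):
    """basr %r13,%r0 is 2 bytes long & leaves the address of the next
    instruction in r13; the following ahi then adds <adjust>."""
    return (basr_addr + 2 + adjust) & MASK32


def to_signed32(value):
    """Interpret a 32 bit literal pool entry as a signed number."""
    return value - (1 << 32) if value & 0x80000000 else value


# test: basr at 0x400380 followed by ahi %r13,-6
pool_test = basr_then_ahi(0x400380, -6)

# main: basr at 0x40039e followed by ahi %r13,-16
pool_main = basr_then_ahi(0x40039E, -16)

# main: l %r14,0(%r13) loads 0xffffffec, bas %r14,0(%r14,%r13) adds it to r13
target = (pool_main + to_signed32(0xFFFFFFEC)) & MASK32

print(hex(pool_test), hex(pool_main), hex(target))
```

In main r13 ends up at 0x400390, the literal pool, & the pool entry 0xffffffec ( -20 ) added to it gives 0x40037c, the address of test.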
|  | 614 |  | 
|  | 615 |  | 
|  | 616 | Compiler updates | 
|  | 617 | ---------------- | 
|  | 618 |  | 
|  | 619 | main(int argc,char *argv[]) | 
|  | 620 | { | 
|  | 621 | 4004fc:	90 7f f0 1c       	stm	%r7,%r15,28(%r15) | 
|  | 622 | 400500:	a7 d5 00 04       	bras	%r13,400508 <main+0xc> | 
|  | 623 | 400504:	00 40 04 f4       	.long	0x004004f4 | 
|  | 624 | # compiler now puts the constant pool in the code so it saves an instruction | 
|  | 625 | 400508:	18 0f             	lr	%r0,%r15 | 
|  | 626 | 40050a:	a7 fa ff a0       	ahi	%r15,-96 | 
|  | 627 | 40050e:	50 00 f0 00       	st	%r0,0(%r15) | 
|  | 628 | return(test(5)); | 
|  | 629 | 400512:	58 10 d0 00       	l	%r1,0(%r13) | 
|  | 630 | 400516:	a7 28 00 05       	lhi	%r2,5 | 
|  | 631 | 40051a:	0d e1             	basr	%r14,%r1 | 
|  | 632 | # compiler adds 1 extra instruction to the epilogue, this is done to | 
|  | 633 | # avoid processor pipeline stalls owing to data dependencies on g5 & | 
|  | 634 | # above, as register 14 in the old code was needed directly after being loaded | 
|  | 635 | # by the lm %r11,%r15,140(%r15) for the br %r14. | 
|  | 636 | 40051c:	58 40 f0 98       	l	%r4,152(%r15) | 
|  | 637 | 400520:	98 7f f0 7c       	lm	%r7,%r15,124(%r15) | 
|  | 638 | 400524:	07 f4             	br	%r4 | 
|  | 639 | } | 
|  | 640 |  | 
|  | 641 |  | 
|  | 642 | Hartmut ( our compiler developer ) has also been threatening to take out the | 
|  | 643 | stack backchain in optimised code as this also causes pipeline stalls; you | 
|  | 644 | have been warned. | 
|  | 645 |  | 
|  | 646 | 64 bit z/Architecture code disassembly | 
|  | 647 | -------------------------------------- | 
|  | 648 |  | 
|  | 649 | If you understand the stuff above you'll understand the stuff | 
|  | 650 | below too, so I'll avoid repeating myself & just say that | 
|  | 651 | some of the instructions have g's on the end of them to indicate | 
|  | 652 | they are 64 bit & the stack offsets are bigger. | 
|  | 653 | The only other difference you'll find between 32 & 64 bit is that | 
|  | 654 | we now use f4 & f6 for floating point arguments on 64 bit. | 
|  | 655 | 00000000800005b0 <test>: | 
|  | 656 | int test(int b) | 
|  | 657 | { | 
|  | 658 | return(5+b); | 
|  | 659 | 800005b0:	a7 2a 00 05       	ahi	%r2,5 | 
|  | 660 | 800005b4:	b9 14 00 22       	lgfr	%r2,%r2 # sign extend the 32 bit int result to 64 bits | 
|  | 661 | 800005b8:	07 fe             	br	%r14 | 
|  | 662 | 800005ba:	07 07             	bcr	0,%r7 | 
|  | 663 |  | 
|  | 664 |  | 
|  | 665 | } | 
|  | 666 |  | 
|  | 667 | 00000000800005bc <main>: | 
|  | 668 | main(int argc,char *argv[]) | 
|  | 669 | { | 
|  | 670 | 800005bc:	eb bf f0 58 00 24 	stmg	%r11,%r15,88(%r15) | 
|  | 671 | 800005c2:	b9 04 00 1f       	lgr	%r1,%r15 | 
|  | 672 | 800005c6:	a7 fb ff 60       	aghi	%r15,-160 | 
|  | 673 | 800005ca:	e3 10 f0 00 00 24 	stg	%r1,0(%r15) | 
|  | 674 | return(test(5)); | 
|  | 675 | 800005d0:	a7 29 00 05       	lghi	%r2,5 | 
|  | 676 | # brasl allows jumps > 64k & is overkill here, bras would do fine | 
|  | 677 | 800005d4:	c0 e5 ff ff ff ee 	brasl	%r14,800005b0 <test> | 
|  | 678 | 800005da:	e3 40 f1 10 00 04 	lg	%r4,272(%r15) | 
|  | 679 | 800005e0:	eb bf f0 f8 00 04 	lmg	%r11,%r15,248(%r15) | 
|  | 680 | 800005e6:	07 f4             	br	%r4 | 
|  | 681 | } | 
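The targets of the relative branches in the listings above ( bras & brasl ) can be worked out from the instruction bytes alone: the offset is a signed number of halfwords counted from the address of the branch instruction itself. A minimal sketch, with a made up function name:

```python
# Sketch only: decodes the relative branch targets seen in the listings above.
# bras has a signed 16 bit offset & brasl a signed 32 bit offset, both counted
# in halfwords ( 2 byte units ) from the address of the branch itself.

def relative_target(insn_addr, insn_bytes):
    """Return the branch target of a bras (4 byte) or brasl (6 byte) instruction."""
    raw = bytes.fromhex(insn_bytes.replace(" ", ""))
    if len(raw) not in (4, 6):
        raise ValueError("expected a 4 byte bras or a 6 byte brasl")
    # The offset field follows the first 2 bytes ( opcode & register ).
    halfwords = int.from_bytes(raw[2:], byteorder="big", signed=True)
    return insn_addr + 2 * halfwords


print(hex(relative_target(0x400500, "a7 d5 00 04")))          # bras  in main
print(hex(relative_target(0x800005D4, "c0 e5 ff ff ff ee")))  # brasl in main
```

The bras lands on 0x400508, just past the constant pool word, & the brasl on 0x800005b0, the address of test. The 16 bit halfword offset is also why bras is limited to 64k in either direction.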
|  | 682 |  | 
|  | 683 |  | 
|  | 684 |  | 
|  | 685 | Compiling programs for debugging on Linux for s/390 & z/Architecture | 
|  | 686 | ==================================================================== | 
|  | 687 | -gdwarf-2 now works; it should be considered the default debugging | 
|  | 688 | format for s/390 & z/Architecture as it is more reliable for debugging | 
|  | 689 | shared libraries. Normal -g debugging works much better now, | 
|  | 690 | thanks to the IBM java compiler developers' bug reports. | 
|  | 691 |  | 
|  | 692 | This is typically done by adding/appending the flags -g or -gdwarf-2 to the | 
|  | 693 | CFLAGS & LDFLAGS variables in the Makefile of the program concerned. | 
|  | 694 |  | 
|  | 695 | If using gdb & you would like accurate displays of registers & | 
|  | 696 | stack traces, compile without optimisation, i.e. make sure | 
|  | 697 | that there is no -O2 or similar on the CFLAGS line of the Makefile & | 
|  | 698 | the emitted gcc commands. Obviously this will produce worse code | 
|  | 699 | ( not advisable for shipment ) but it is an aid to the debugging process. | 
|  | 700 |  | 
|  | 701 | This aids debugging because the compiler will copy parameters passed in | 
|  | 702 | in registers onto the stack so backtracing & looking at passed in | 
|  | 703 | parameters will work, however some larger programs which use inline functions | 
|  | 704 | will not compile without optimisation. | 
|  | 705 |  | 
|  | 706 | Debugging with optimisation has since improved a lot after some bugs | 
|  | 707 | were fixed; please make sure you are using gdb-5.0 or later, developed | 
|  | 708 | after Nov 2000. | 
|  | 709 |  | 
|  | 710 | Figuring out gcc compile errors | 
|  | 711 | =============================== | 
|  | 712 | If you are getting a lot of syntax errors compiling a program & the problem | 
|  | 713 | isn't blatantly obvious from the source, | 
|  | 714 | it often helps to just preprocess the file; this is done with the -E | 
|  | 715 | option in gcc. | 
|  | 716 | This runs only the very first phase of compilation | 
|  | 717 | ( compilation in gcc is done in several stages & gcc calls many programs to | 
|  | 718 | achieve its end result ); with the -E option gcc just calls the C preprocessor ( cpp ). | 
|  | 719 | The C preprocessor does the following: it joins all the files #included together | 
|  | 720 | recursively ( #include files can #include other files ) with the C file you wish to compile, | 
|  | 721 | it puts a fully qualified path of the #included files in a comment & it | 
|  | 722 | does macro expansion. | 
|  | 723 | This is useful for debugging because | 
|  | 724 | 1) You can double check whether the files you expect to be included are the ones | 
|  | 725 | that are being included ( e.g. double check that you aren't going to the i386 asm directory ). | 
|  | 726 | 2) Check that macro definitions aren't clashing with typedefs. | 
| Matt LaPlante | fff9289 | 2006-10-03 22:47:42 +0200 | [diff] [blame] | 727 | 3) Check that definitions aren't being used before they are being included. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 728 | 4) Helps put the line emitting the error under the microscope if it contains macros. | 
|  | 729 |  | 
|  | 730 | For convenience the Linux kernel's makefile will do preprocessing automatically for you | 
|  | 731 | by suffixing the file you want built with .i ( instead of .o ) | 
|  | 732 |  | 
|  | 733 | e.g. | 
|  | 734 | from the linux directory type | 
|  | 735 | make arch/s390/kernel/signal.i | 
|  | 736 | this will build | 
|  | 737 |  | 
|  | 738 | s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer | 
|  | 739 | -fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce   -E arch/s390/kernel/signal.c | 
|  | 740 | > arch/s390/kernel/signal.i | 
|  | 741 |  | 
|  | 742 | Now look at signal.i; you should see something like: | 
|  | 743 |  | 
|  | 744 |  | 
|  | 745 | # 1 "/home1/barrow/linux/include/asm/types.h" 1 | 
|  | 746 | typedef unsigned short umode_t; | 
|  | 747 | typedef __signed__ char __s8; | 
|  | 748 | typedef unsigned char __u8; | 
|  | 749 | typedef __signed__ short __s16; | 
|  | 750 | typedef unsigned short __u16; | 
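To double check which files really got included, the marker lines of the form # <line> "<file>" <flags> ( like the types.h line above ) can be pulled out of the .i file. The following is a rough sketch, assuming the marker layout gcc emits, where a flag of 1 marks the start of a new file. The function name is made up & the first & last marker lines of the sample are invented for illustration.

```python
# Sketch only: lists the files that were pulled into a preprocessed .i file.
# It relies on the '# <line> "<file>" <flags>' marker lines that gcc -E emits.
# The function name is made up; only the types.h marker in the sample comes
# from this document, the other two marker lines are invented.
import re

MARKER = re.compile(r'^#\s+(\d+)\s+"([^"]+)"((?:\s+\d+)*)\s*$')


def included_files(preprocessed_text):
    """Return the files entered via #include, in order of first appearance."""
    seen = []
    for line in preprocessed_text.splitlines():
        match = MARKER.match(line)
        if not match:
            continue
        path, flags = match.group(2), match.group(3).split()
        # Flag 1 marks the start of a new file.
        if "1" in flags and path not in seen:
            seen.append(path)
    return seen


sample = '''# 1 "arch/s390/kernel/signal.c"
# 1 "/home1/barrow/linux/include/asm/types.h" 1
typedef unsigned short umode_t;
# 2 "arch/s390/kernel/signal.c" 2
'''
print(included_files(sample))
```

If a path from another architecture's asm directory shows up in the result, the include path is the first thing to look at.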
|  | 751 |  | 
|  | 752 | If instead you are getting errors further down, e.g. | 
|  | 753 | unknown instruction:2515 "move.l" or better still unknown instruction:2515 | 
|  | 754 | "Fixme not implemented yet, call Martin", you are probably attempting to compile some code | 
|  | 755 | meant for another architecture or code that is simply not implemented, with a fixme statement | 
|  | 756 | stuck into the inline assembly code so that the author of the file now knows he has work to do. | 
|  | 757 | To look at the assembly emitted by gcc just before it is about to call gas ( the gnu assembler ) | 
|  | 758 | use the -S option. | 
|  | 759 | Again for your convenience the Linux kernel's Makefile will hold your hand & | 
|  | 760 | do all this donkey work for you also by building the file with the .s suffix. | 
|  | 761 | e.g. | 
|  | 762 | from the Linux directory type | 
|  | 763 | make arch/s390/kernel/signal.s | 
|  | 764 |  | 
|  | 765 | s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer | 
|  | 766 | -fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce  -S arch/s390/kernel/signal.c | 
|  | 767 | -o arch/s390/kernel/signal.s | 
|  | 768 |  | 
|  | 769 |  | 
|  | 770 | This will output something like, ( please note the constant pool & the useful comments | 
|  | 771 | in the prologue to give you a hand at interpreting it ). | 
|  | 772 |  | 
|  | 773 | .LC54: | 
|  | 774 | .string	"misaligned (__u16 *) in __xchg\n" | 
|  | 775 | .LC57: | 
|  | 776 | .string	"misaligned (__u32 *) in __xchg\n" | 
|  | 777 | .L$PG1: # Pool sys_sigsuspend | 
|  | 778 | .LC192: | 
|  | 779 | .long	-262401 | 
|  | 780 | .LC193: | 
|  | 781 | .long	-1 | 
|  | 782 | .LC194: | 
|  | 783 | .long	schedule-.L$PG1 | 
|  | 784 | .LC195: | 
|  | 785 | .long	do_signal-.L$PG1 | 
|  | 786 | .align 4 | 
|  | 787 | .globl sys_sigsuspend | 
|  | 788 | .type	 sys_sigsuspend,@function | 
|  | 789 | sys_sigsuspend: | 
|  | 790 | #	leaf function           0 | 
|  | 791 | #	automatics              16 | 
|  | 792 | #	outgoing args           0 | 
|  | 793 | #	need frame pointer      0 | 
|  | 794 | #	call alloca             0 | 
|  | 795 | #	has varargs             0 | 
|  | 796 | #	incoming args (stack)   0 | 
|  | 797 | #	function length         168 | 
|  | 798 | STM	8,15,32(15) | 
|  | 799 | LR	0,15 | 
|  | 800 | AHI	15,-112 | 
|  | 801 | BASR	13,0 | 
|  | 802 | .L$CO1:	AHI	13,.L$PG1-.L$CO1 | 
|  | 803 | ST	0,0(15) | 
|  | 804 | LR    8,2 | 
|  | 805 | N     5,.LC192-.L$PG1(13) | 
|  | 806 |  | 
|  | 807 | Adding -g to the above output makes the output even more useful | 
|  | 808 | e.g. typing | 
|  | 809 | make CC:="s390-gcc -g" kernel/sched.s | 
|  | 810 |  | 
|  | 811 | which compiles. | 
|  | 812 | s390-gcc -g -D__KERNEL__ -I/home/barrow/linux-2.3/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -fno-strength-reduce   -S kernel/sched.c -o kernel/sched.s | 
|  | 813 |  | 
|  | 814 | also outputs stabs ( debugger ) info, from this info you can find out the | 
|  | 815 | offsets & sizes of various elements in structures. | 
|  | 816 | e.g. the stab for the structure | 
|  | 817 | struct rlimit { | 
|  | 818 | unsigned long	rlim_cur; | 
|  | 819 | unsigned long	rlim_max; | 
|  | 820 | }; | 
|  | 821 | is | 
|  | 822 | .stabs "rlimit:T(151,2)=s8rlim_cur:(0,5),0,32;rlim_max:(0,5),32,32;;",128,0,0,0 | 
|  | 823 | from this stab you can see that | 
|  | 824 | rlim_cur starts at bit offset 0 & is 32 bits in size | 
|  | 825 | rlim_max starts at bit offset 32 & is 32 bits in size. | 
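Reading offsets out of a stab by eye gets tedious for bigger structures, so here is a minimal sketch that does it for simple structure stabs like the rlimit one above. It assumes flat fields of the form name:(type),bitoffset,bitsize; & reads the number after the s as the structure size in bytes. The names are made up & nested type definitions are not handled.

```python
# Sketch only: pulls field offsets & sizes out of a simple structure stab
# like the rlimit one above. Real stabs can nest type definitions inside
# fields, which this regular expression does not attempt to handle.
import re

FIELD = re.compile(r'(\w+):\((\d+),(\d+)\),(\d+),(\d+);')


def struct_fields(stab):
    """Return (struct size in bytes, [(field, bit offset, bit size), ...])."""
    header = re.match(r'(\w+):T\(\d+,\d+\)=s(\d+)', stab)
    if header is None:
        raise ValueError("not a simple structure stab")
    size = int(header.group(2))
    body = stab[header.end():]
    fields = [(m.group(1), int(m.group(4)), int(m.group(5)))
              for m in FIELD.finditer(body)]
    return size, fields


stab = "rlimit:T(151,2)=s8rlim_cur:(0,5),0,32;rlim_max:(0,5),32,32;;"
size, fields = struct_fields(stab)
for name, bit_offset, bit_size in fields:
    print("%s: byte offset %d, %d bytes" % (name, bit_offset // 8, bit_size // 8))
```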
|  | 826 |  | 
|  | 827 |  | 
|  | 828 | Debugging Tools: | 
|  | 829 | ================ | 
|  | 830 |  | 
|  | 831 | objdump | 
|  | 832 | ======= | 
|  | 833 | This is a tool with many options, the most useful ( if compiled with -g ) being: | 
|  | 834 | objdump --source <victim program or object file> > <victims debug listing > | 
|  | 835 |  | 
|  | 836 |  | 
|  | 837 | The whole kernel can be compiled like this ( doing this will make a 17MB kernel | 
|  | 838 | & a 200MB listing ), however you have to strip it before building the image | 
|  | 839 | using the strip command to make it a more reasonable size to boot it. | 
|  | 840 |  | 
|  | 841 | A source/assembly mixed dump of the kernel can be done with the line | 
|  | 842 | objdump --source vmlinux > vmlinux.lst | 
| Matt LaPlante | fff9289 | 2006-10-03 22:47:42 +0200 | [diff] [blame] | 843 | Also, if the file isn't compiled -g, this will output as much debugging information | 
|  | 844 | as it can (e.g. function names). This is very slow as it spends lots | 
|  | 845 | of time searching for debugging info. The following self explanatory line should be used | 
|  | 846 | instead if the code isn't compiled -g, as it is much faster: | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 847 | objdump --disassemble-all --syms vmlinux > vmlinux.lst | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 848 |  | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 849 | As hard drive space is valuable most of us use the following approach. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 850 | 1) Look at the emitted psw on the console to find the crash address in the kernel. | 
|  | 851 | 2) Look at the file System.map ( in the linux directory ) produced when building | 
|  | 852 | the kernel to find the closest address less than the current PSW to find the | 
|  | 853 | offending function. | 
|  | 854 | 3) use grep or similar to search the source tree looking for the source file | 
|  | 855 | with this function if you don't know where it is. | 
|  | 856 | 4) rebuild this object file with -g on, as an example suppose the file was | 
|  | 857 | ( arch/s390/kernel/signal.o ) | 
|  | 858 | 5) Assuming the file with the erroneous function is signal.c, move to the base of the | 
|  | 859 | Linux source tree. | 
|  | 860 | 6) rm arch/s390/kernel/signal.o | 
|  | 861 | 7) make arch/s390/kernel/signal.o | 
|  | 862 | 8) watch the gcc command line emitted | 
| Matt LaPlante | 3f6dee9 | 2006-10-03 22:45:33 +0200 | [diff] [blame] | 863 | 9) type it in again or alternatively cut & paste it on the console adding the -g option. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 864 | 10) objdump --source arch/s390/kernel/signal.o > signal.lst | 
|  | 865 | This will output the source & the assembly intermixed, as the snippet below shows. | 
|  | 866 | This will unfortunately output addresses which aren't the same | 
|  | 867 | as the kernel ones; you should be able to get around the mental arithmetic | 
|  | 868 | by playing with the --adjust-vma parameter to objdump. | 
|  | 869 |  | 
|  | 870 |  | 
|  | 871 |  | 
|  | 872 |  | 
| Adrian Bunk | 4448aaf | 2005-11-08 21:34:42 -0800 | [diff] [blame] | 873 | static inline void spin_lock(spinlock_t *lp) | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 874 | { | 
|  | 875 | a0:       18 34           lr      %r3,%r4 | 
|  | 876 | a2:       a7 3a 03 bc     ahi     %r3,956 | 
|  | 877 | __asm__ __volatile("    lhi   1,-1\n" | 
|  | 878 | a6:       a7 18 ff ff     lhi     %r1,-1 | 
|  | 879 | aa:       1f 00           slr     %r0,%r0 | 
|  | 880 | ac:       ba 01 30 00     cs      %r0,%r1,0(%r3) | 
|  | 881 | b0:       a7 44 ff fd     jm      aa <sys_sigsuspend+0x2e> | 
|  | 882 | saveset = current->blocked; | 
|  | 883 | b4:       d2 07 f0 68     mvc     104(8,%r15),972(%r4) | 
|  | 884 | b8:       43 cc | 
|  | 885 | return (set->sig[0] & mask) != 0; | 
|  | 886 | } | 
|  | 887 |  | 
|  | 888 | 11) If debugging under VM go down to that section in the document for more info. | 
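Step 2 of the approach above ( finding the closest System.map address below the PSW address ) is easy to get wrong by eye in a long map. A minimal sketch of the lookup follows, assuming the usual "<hex address> <type> <name>" layout of System.map lines. The function names, the sample map entries & the PSW value are all made up for illustration.

```python
# Sketch only: step 2 of the approach above, i.e. find the System.map entry
# with the closest address that is not greater than the PSW address.
# The sample map entries & the PSW value below are made up for illustration.
import bisect


def parse_system_map(text):
    """Parse "<hex address> <type> <name>" lines into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, address):
    """Return (name, offset into function) for <address>, or None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return name, address - start


sample_map = """00010000 T _stext
00012300 T sys_sigsuspend
000123a8 T do_signal
"""
symbols = parse_system_map(sample_map)
name, offset = lookup(symbols, 0x12334)
print("%s+0x%x" % (name, offset))
```

The offset into the function is the distance to look for from the start of the same function in the signal.lst listing.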
|  | 889 |  | 
|  | 890 |  | 
|  | 891 | I now have a tool which takes the pain out of --adjust-vma | 
|  | 892 | & you are able to do something like | 
|  | 893 | make arch/s390/kernel/traps.lst | 
|  | 894 | & it automatically generates the correctly relocated entries for | 
|  | 895 | the text segment in traps.lst. | 
|  | 896 | This tool is now standard in Linux distros in scripts/makelst | 
|  | 897 |  | 
|  | 898 | strace | 
|  | 899 | ====== | 
|  | 900 | Q. What is it ? | 
|  | 901 | A. It is a tool for intercepting calls to the kernel & logging them | 
|  | 902 | to a file & on the screen. | 
|  | 903 |  | 
|  | 904 | Q. What use is it ? | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 905 | A. You can use it to find out what files a particular program opens. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 906 |  | 
|  | 907 |  | 
|  | 908 |  | 
|  | 909 | Example 1 | 
|  | 910 | --------- | 
|  | 911 | If you wanted to know how ping works but didn't have the source, run | 
|  | 912 | strace ping -c 1 127.0.0.1 | 
|  | 913 | & then look at the man pages for each of the syscalls below, | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 914 | ( In fact this is sometimes easier than looking at some spaghetti | 
| Matt LaPlante | 2fe0ae7 | 2006-10-03 22:50:39 +0200 | [diff] [blame] | 915 | source which conditionally compiles for several architectures ). | 
|  | 916 | Not everything that it throws out needs to make sense immediately. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 917 |  | 
|  | 918 | Just looking quickly you can see that it is making up a RAW socket | 
|  | 919 | for the ICMP protocol, | 
|  | 920 | doing an alarm(10) for a 10 second timeout | 
|  | 921 | & doing a gettimeofday call before & after each read to see | 
|  | 922 | how long the replies took, & writing some text to stdout so the user | 
|  | 923 | has an idea what is going on. | 
|  | 924 |  | 
|  | 925 | socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3 | 
|  | 926 | getuid()                                = 0 | 
|  | 927 | setuid(0)                               = 0 | 
|  | 928 | stat("/usr/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory) | 
|  | 929 | stat("/usr/share/locale/libc/C", 0xbffff134) = -1 ENOENT (No such file or directory) | 
|  | 930 | stat("/usr/local/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory) | 
|  | 931 | getpid()                                = 353 | 
|  | 932 | setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 | 
|  | 933 | setsockopt(3, SOL_SOCKET, SO_RCVBUF, [49152], 4) = 0 | 
|  | 934 | fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(3, 1), ...}) = 0 | 
|  | 935 | mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40008000 | 
|  | 936 | ioctl(1, TCGETS, {B9600 opost isig icanon echo ...}) = 0 | 
|  | 937 | write(1, "PING 127.0.0.1 (127.0.0.1): 56 d"..., 42PING 127.0.0.1 (127.0.0.1): 56 data bytes | 
|  | 938 | ) = 42 | 
|  | 939 | sigaction(SIGINT, {0x8049ba0, [], SA_RESTART}, {SIG_DFL}) = 0 | 
|  | 940 | sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {SIG_DFL}) = 0 | 
|  | 941 | gettimeofday({948904719, 138951}, NULL) = 0 | 
|  | 942 | sendto(3, "\10\0D\201a\1\0\0\17#\2178\307\36"..., 64, 0, {sin_family=AF_INET, | 
|  | 943 | sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64 | 
|  | 944 | sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0 | 
|  | 945 | sigaction(SIGALRM, {0x8049ba0, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0 | 
|  | 946 | alarm(10)                               = 0 | 
|  | 947 | recvfrom(3, "E\0\0T\0005\0\0@\1|r\177\0\0\1\177"..., 192, 0, | 
|  | 948 | {sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84 | 
|  | 949 | gettimeofday({948904719, 160224}, NULL) = 0 | 
|  | 950 | recvfrom(3, "E\0\0T\0006\0\0\377\1\275p\177\0"..., 192, 0, | 
|  | 951 | {sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84 | 
|  | 952 | gettimeofday({948904719, 166952}, NULL) = 0 | 
|  | 953 | write(1, "64 bytes from 127.0.0.1: icmp_se"..., | 
|  | 954 | 5764 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=28.0 ms | 
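The gettimeofday results in the trace above can be turned back into the round trip time that ping reports. A minimal sketch, using the first & last gettimeofday lines of the trace; the function name is made up.

```python
# Sketch only: turns gettimeofday results as printed by strace into an
# elapsed time. The function name is made up; the two sample lines are
# the first & last gettimeofday calls from the trace above.
import re

TIMEVAL = re.compile(r'gettimeofday\(\{(\d+), (\d+)\}')


def timeval_us(strace_line):
    """Return the {seconds, microseconds} pair of a gettimeofday line in microseconds."""
    match = TIMEVAL.search(strace_line)
    if match is None:
        raise ValueError("not a gettimeofday line")
    return int(match.group(1)) * 1000000 + int(match.group(2))


before = "gettimeofday({948904719, 138951}, NULL) = 0"
after = "gettimeofday({948904719, 166952}, NULL) = 0"
elapsed_us = timeval_us(after) - timeval_us(before)
print("elapsed: %.1f ms" % (elapsed_us / 1000.0))
```

For this trace the difference is 28001 microseconds, which agrees with the time=28.0 ms that ping itself printed.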
|  | 955 |  | 
|  | 956 | Example 2 | 
|  | 957 | --------- | 
|  | 958 | strace passwd 2>&1 | grep open | 
|  | 959 | produces the following output | 
|  | 960 | open("/etc/ld.so.cache", O_RDONLY)      = 3 | 
|  | 961 | open("/opt/kde/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No such file or directory) | 
|  | 962 | open("/lib/libc.so.5", O_RDONLY)        = 3 | 
|  | 963 | open("/dev", O_RDONLY)                  = 3 | 
|  | 964 | open("/var/run/utmp", O_RDONLY)         = 3 | 
|  | 965 | open("/etc/passwd", O_RDONLY)           = 3 | 
|  | 966 | open("/etc/shadow", O_RDONLY)           = 3 | 
|  | 967 | open("/etc/login.defs", O_RDONLY)       = 4 | 
|  | 968 | open("/dev/tty", O_RDONLY)              = 4 | 
|  | 969 |  | 
|  | 970 | The 2>&1 is done to redirect stderr to stdout & grep is then filtering this input | 
|  | 971 | through the pipe for each line containing the string open. | 
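When the list of opens gets long, it is usually the failed ones that are interesting. The sketch below does the same filtering as the grep above but keeps only the open calls that returned -1; the function name is made up & the sample lines are taken from the output above.

```python
# Sketch only: the same filtering as the grep above, but keeping just the
# open calls that failed. The function name is made up.
import re

OPEN_CALL = re.compile(r'^open\("([^"]*)",\s*([^)]*)\)\s*=\s*(-?\d+)\s*(.*)$')


def failed_opens(trace_lines):
    """Return (path, error text) for every open() that returned a negative value."""
    failures = []
    for line in trace_lines:
        match = OPEN_CALL.match(line.strip())
        if match and int(match.group(3)) < 0:
            failures.append((match.group(1), match.group(4)))
    return failures


trace = [
    'open("/etc/ld.so.cache", O_RDONLY)      = 3',
    'open("/opt/kde/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'open("/lib/libc.so.5", O_RDONLY)        = 3',
]
for path, error in failed_opens(trace):
    print(path, error)
```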
|  | 972 |  | 
|  | 973 |  | 
|  | 974 | Example 3 | 
|  | 975 | --------- | 
| Matt LaPlante | 53cb472 | 2006-10-03 22:55:17 +0200 | [diff] [blame] | 976 | Getting sophisticated | 
|  | 977 | telnetd crashes & I don't know why | 
|  | 978 |  | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 979 | Steps | 
|  | 980 | ----- | 
|  | 981 | 1) Replace the following line in /etc/inetd.conf | 
|  | 982 | telnet  stream  tcp     nowait  root    /usr/sbin/in.telnetd -h | 
|  | 983 | with | 
|  | 984 | telnet  stream  tcp     nowait  root    /blah | 
|  | 985 |  | 
|  | 986 | 2) Create the file /blah with the following contents to start tracing telnetd | 
|  | 987 | #!/bin/bash | 
|  | 988 | /usr/bin/strace -o/t1 -f /usr/sbin/in.telnetd -h | 
|  | 989 | 3) chmod 700 /blah to make it executable only to root | 
|  | 990 | 4) | 
|  | 991 | killall -HUP inetd | 
|  | 992 | or ps aux | grep inetd | 
|  | 993 | to get inetd's process id | 
|  | 994 | & kill -HUP <inetd's pid> to restart it. | 
|  | 995 |  | 
|  | 996 | Important options | 
|  | 997 | ----------------- | 
|  | 998 | -o is used to tell strace to output to a file, in our case t1 in the root directory. | 
|  | 999 | -f is to follow children, | 
|  | 1000 | e.g. in our case above telnetd will start the login process & subsequently a shell like bash. | 
|  | 1001 | You will be able to tell which is which from the process ID's listed on the left hand side | 
|  | 1002 | of the strace output. | 
|  | 1003 | -p<pid> will tell strace to attach to a running process, yup this can be done provided | 
|  | 1004 | it isn't being traced or debugged already & you have enough privileges. | 
|  | 1005 | The reason 2 processes cannot trace or debug the same program is that strace | 
|  | 1006 | becomes the parent process of the one being debugged & processes ( unlike people ) | 
|  | 1007 | can have only one parent. | 
|  | 1008 |  | 
|  | 1009 |  | 
|  | 1010 | However the file /t1 will get big quite quickly. | 
|  | 1011 | To test it, telnet 127.0.0.1 | 
|  | 1012 |  | 
|  | 1013 | Now look at what files in.telnetd execve'd: | 
|  | 1014 | 413   execve("/usr/sbin/in.telnetd", ["/usr/sbin/in.telnetd", "-h"], [/* 17 vars */]) = 0 | 
|  | 1015 | 414   execve("/bin/login", ["/bin/login", "-h", "localhost", "-p"], [/* 2 vars */]) = 0 | 
|  | 1016 |  | 
|  | 1017 | Whey, it worked! | 
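With -f the trace file mixes the calls of all the processes, so it helps to split it up by the process ID in the first column. A minimal sketch, with a made up function name & the two execve lines above as sample input:

```python
# Sketch only: splits the output of strace -f -o <file> into one list of
# calls per process, using the process ID in the first column.
# The function name is made up; the sample lines are the two execve lines above.
from collections import OrderedDict


def calls_by_pid(trace_lines):
    """Return an ordered mapping of pid -> list of traced lines for that pid."""
    per_pid = OrderedDict()
    for line in trace_lines:
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit():
            per_pid.setdefault(int(fields[0]), []).append(fields[1])
    return per_pid


trace = [
    '413   execve("/usr/sbin/in.telnetd", ["/usr/sbin/in.telnetd", "-h"], [/* 17 vars */]) = 0',
    '414   execve("/bin/login", ["/bin/login", "-h", "localhost", "-p"], [/* 2 vars */]) = 0',
]
for pid, calls in calls_by_pid(trace).items():
    print(pid, len(calls), calls[0].split("(")[0])
```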
|  | 1018 |  | 
|  | 1019 |  | 
|  | 1020 | Other hints: | 
|  | 1021 | ------------ | 
|  | 1022 | If the program is not very interactive ( i.e. not much keyboard input ) | 
|  | 1023 | & is crashing in one architecture but not in another you can do | 
|  | 1024 | an strace of both programs under as identical a scenario as you can | 
|  | 1025 | on both architectures, outputting to a file, then | 
|  | 1026 | do a diff of the two traces using the diff program | 
|  | 1027 | i.e. | 
|  | 1028 | diff output1 output2 | 
|  | 1029 | & maybe you'll be able to see where the call paths differed, this | 
|  | 1030 | is possibly near the cause of the crash. | 
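A plain diff of two traces tends to drown in differences that don't matter, such as pointer values. The sketch below blanks out hex values before comparing & reports the first call where the two traces part company. The function names & the two tiny sample traces are made up for illustration; real traces would be read from the two output files.

```python
# Sketch only: the diff idea above, with hex addresses blanked out first,
# since pointers will rarely match between two machines.
# The function names & the two tiny sample traces are made up for illustration.
import difflib
import re

HEX_VALUE = re.compile(r'0x[0-9a-fA-F]+')


def normalise(trace_lines):
    """Replace every hex value by a placeholder so only the call pattern is compared."""
    return [HEX_VALUE.sub("0xADDR", line.rstrip("\n")) for line in trace_lines]


def first_divergence(trace_a, trace_b):
    """Return the index of the first call that differs, or None if none does."""
    a, b = normalise(trace_a), normalise(trace_b)
    for index, (line_a, line_b) in enumerate(zip(a, b)):
        if line_a != line_b:
            return index
    return None if len(a) == len(b) else min(len(a), len(b))


output1 = ['stat("/tmp/x", 0xbffff134) = 0', 'open("/tmp/x", O_RDONLY) = 3']
output2 = ['stat("/tmp/x", 0x7ffff210) = 0', 'open("/tmp/x", O_RDONLY) = -1 ENOENT']

print(first_divergence(output1, output2))
for line in difflib.unified_diff(normalise(output1), normalise(output2),
                                 "output1", "output2", lineterm=""):
    print(line)
```

Process IDs & timestamps may need the same treatment, depending on the strace options used.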
|  | 1031 |  | 
|  | 1032 | More info | 
|  | 1033 | --------- | 
|  | 1034 | Look at man pages for strace & the various syscalls | 
|  | 1035 | e.g. man strace, man alarm, man socket. | 
|  | 1036 |  | 
|  | 1037 |  | 
|  | 1038 | Performance Debugging | 
|  | 1039 | ===================== | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1040 | gcc is capable of compiling in profiling code, just add the -p option ( -pg for gprof ) | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1041 | to the CFLAGS, this obviously affects program size & performance. | 
|  | 1042 | This can be used by gprof, the gnu profiling tool, or | 
|  | 1043 | gcov, the gnu code coverage tool ( code coverage is a means of testing | 
|  | 1044 | code quality by checking if all the code in an executable is exercised by | 
|  | 1045 | a tester ). | 
|  | 1046 |  | 
|  | 1047 |  | 
|  | 1048 | Using top to find out where processes are sleeping in the kernel | 
|  | 1049 | ---------------------------------------------------------------- | 
|  | 1050 | To do this copy the System.map from the root directory where | 
|  | 1051 | the linux kernel was built to the /boot directory on your | 
|  | 1052 | linux machine. | 
|  | 1053 | Start top | 
|  | 1054 | Now type fU<return> | 
|  | 1055 | You should see a new field called WCHAN which | 
|  | 1056 | tells you where each process is sleeping; here is a typical output. | 
|  | 1057 |  | 
|  | 1058 | 6:59pm  up 41 min,  1 user,  load average: 0.00, 0.00, 0.00 | 
|  | 1059 | 28 processes: 27 sleeping, 1 running, 0 zombie, 0 stopped | 
|  | 1060 | CPU states:  0.0% user,  0.1% system,  0.0% nice, 99.8% idle | 
|  | 1061 | Mem:   254900K av,   45976K used,  208924K free,       0K shrd,   28636K buff | 
|  | 1062 | Swap:       0K av,       0K used,       0K free                    8620K cached | 
|  | 1063 |  | 
|  | 1064 |   PID USER     PRI  NI  SIZE  RSS SHARE WCHAN     STAT  LIB %CPU %MEM   TIME COMMAND | 
|  | 1065 |   750 root      12   0   848  848   700 do_select S       0  0.1  0.3   0:00 in.telnetd | 
|  | 1066 |   767 root      16   0  1140 1140   964           R       0  0.1  0.4   0:00 top | 
|  | 1067 |     1 root       8   0   212  212   180 do_select S       0  0.0  0.0   0:00 init | 
|  | 1068 |     2 root       9   0     0    0     0 down_inte SW      0  0.0  0.0   0:00 kmcheck | 
|  | 1069 |  | 
|  | 1070 | The time command | 
|  | 1071 | ---------------- | 
|  | 1072 | Another related command is the time command which gives you an indication | 
|  | 1073 | of where a process is spending the majority of its time. | 
|  | 1074 | e.g. | 
|  | 1075 | time ping -c 5 nc | 
|  | 1076 | outputs | 
|  | 1077 | real	0m4.054s | 
|  | 1078 | user	0m0.010s | 
|  | 1079 | sys	0m0.010s | 
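The three figures say more when put in relation to each other: whatever part of the real time is not covered by user & sys was spent waiting ( sleeping, IO, the network ). A minimal sketch of that sum, using the figures above; the function names are made up.

```python
# Sketch only: works out how much of the elapsed time was spent neither in
# user code nor in the kernel ( i.e. waiting ), from the output of time.
# The function names are made up; the figures are the ones shown above.
import re

DURATION = re.compile(r'^(\d+)m([\d.]+)s$')


def seconds(text):
    """Convert a duration such as 0m4.054s to seconds."""
    match = DURATION.match(text)
    if match is None:
        raise ValueError("unexpected duration format: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


def waiting_share(real, user, system):
    """Return the fraction of real time not accounted for by user or sys."""
    elapsed = seconds(real)
    return (elapsed - seconds(user) - seconds(system)) / elapsed


share = waiting_share("0m4.054s", "0m0.010s", "0m0.010s")
print("%.1f%% of the run was spent waiting" % (share * 100))
```

For the ping run above nearly all of the 4 seconds is waiting for replies, so there is nothing to be gained by optimising the code itself.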
|  | 1080 |  | 
|  | 1081 | Debugging under VM | 
|  | 1082 | ================== | 
|  | 1083 |  | 
|  | 1084 | Notes | 
|  | 1085 | ----- | 
|  | 1086 | Addresses & values in the VM debugger are always hex, never decimal. | 
|  | 1087 | Address ranges are of the format <HexStart>-<HexEnd> or <HexStart>.<HexLength> | 
| Paolo Ornati | 670e9f3 | 2006-10-03 22:57:56 +0200 | [diff] [blame] | 1088 | e.g. The address range  0x2000 to 0x3000 can be described as 2000-3000 or 2000.1000 | 
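The two notations are easy to mix up, so here is a minimal sketch that converts both into a start & end address. It reads the dotted form as start.length, which is what the 2000.1000 example implies; the function name is made up.

```python
# Sketch only: parses the two address range notations described above.
# The function name is made up; the second form is read as start.length,
# which is what the 2000.1000 example implies.

def parse_range(text):
    """Return (start, end) for "start-end" or "start.length", all in hex."""
    if "-" in text:
        start, end = (int(part, 16) for part in text.split("-", 1))
    elif "." in text:
        start, length = (int(part, 16) for part in text.split(".", 1))
        end = start + length
    else:
        raise ValueError("expected <hex>-<hex> or <hex>.<hex>")
    return start, end


print([hex(value) for value in parse_range("2000-3000")])
print([hex(value) for value in parse_range("2000.1000")])
```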
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1089 |  | 
|  | 1090 | The VM Debugger is case insensitive. | 
|  | 1091 |  | 
|  | 1092 | VM's strengths are usually other debuggers' weaknesses: you can get at any resource | 
|  | 1093 | no matter how sensitive, e.g. memory management resources, or change address translation | 
|  | 1094 | in the PSW. For kernel hacking you will reap dividends if you get good at it. | 
|  | 1095 |  | 
|  | 1096 | The VM Debugger displays operators but not operands, probably because some | 
|  | 1097 | of it was written when memory was expensive & the programmer was probably proud that | 
|  | 1098 | it fitted into 2k of memory & didn't want to shock hardcore VM'ers by | 
|  | 1099 | changing the interface :-). Also the debugger displays useful information on the same line & | 
|  | 1100 | the author of the code probably felt that it was a good idea not to go over | 
|  | 1101 | the 80 columns on the screen. | 
|  | 1102 |  | 
|  | 1103 | As some of you are probably in a panic now: this isn't as unintuitive as it may seem, | 
|  | 1104 | as the 390 instructions are easy to decode mentally & you can make a good guess at a lot | 
|  | 1105 | of them as all the operands are nibble ( half byte ) aligned, & if you have an objdump listing | 
|  | 1106 | also it is quite easy to follow. If you don't have an objdump listing keep a copy of | 
|  | 1107 | the s/390 Reference Summary & look between pages 2 & 7 or alternatively at the | 
|  | 1108 | s/390 principles of operation. | 
|  | 1109 | e.g. even I can guess that | 
|  | 1110 | 0001AFF8' LR    180F        CC 0 | 
|  | 1111 | is a ( load register ) lr r0,r15 | 
|  | 1112 |  | 
|  | 1113 | Also it is very easy to tell the length of a 390 instruction from the 2 most significant | 
|  | 1114 | bits in the instruction ( not that this info is really useful except if you are trying to | 
|  | 1115 | make sense of a hexdump of code ). | 
|  | 1116 | Here is a table | 
|  | 1117 | Bits                    Instruction Length | 
|  | 1118 | ------------------------------------------ | 
|  | 1119 | 00                          2 Bytes | 
|  | 1120 | 01                          4 Bytes | 
|  | 1121 | 10                          4 Bytes | 
|  | 1122 | 11                          6 Bytes | 
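The table can be written as a few lines of C. This is only a sketch;
insn_length is a name made up for illustration.

```c
/* Length in bytes of a 390 instruction, decided by the 2 most
 * significant bits of its first byte ( see the table above ). */
int insn_length(unsigned char first_byte)
{
	switch (first_byte >> 6) {
	case 0:
		return 2;	/* 00 */
	case 1:			/* 01 */
	case 2:
		return 4;	/* 10 */
	default:
		return 6;	/* 11 */
	}
}
```

e.g. the LR above ( first byte 18 ) is 2 bytes long, while AHI ( A7 )
& STM ( 90 ) are 4 bytes long.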
|  | 1123 |  | 
|  | 1124 |  | 
|  | 1125 |  | 
|  | 1126 |  | 
|  | 1127 | The debugger also displays other useful info on the same line such as the | 
|  | 1128 | addresses being operated on, destination addresses of branches & condition codes. | 
|  | 1129 | e.g. | 
|  | 1130 | 00019736' AHI   A7DAFF0E    CC 1 | 
|  | 1131 | 000198BA' BRC   A7840004 -> 000198C2'   CC 0 | 
|  | 1132 | 000198CE' STM   900EF068 >> 0FA95E78    CC 2 | 
|  | 1133 |  | 
|  | 1134 |  | 
|  | 1135 |  | 
|  | 1136 | Useful VM debugger commands | 
|  | 1137 | --------------------------- | 
|  | 1138 |  | 
|  | 1139 | I suppose I'd better mention this before I start: | 
|  | 1140 | to list the currently active traces do | 
|  | 1141 | Q TR | 
|  | 1142 | there can be a maximum of 255 of these per set | 
|  | 1143 | ( more about trace sets later ). | 
|  | 1144 | To stop traces issue a | 
|  | 1145 | TR END | 
|  | 1146 | To delete a particular breakpoint issue | 
|  | 1147 | TR DEL <breakpoint number> | 
|  | 1148 |  | 
|  | 1149 | The PA1 key drops to CP mode so you can issue debugger commands. | 
|  | 1150 | Doing alt c ( on my 3270 console at least ) clears the screen. | 
|  | 1151 | Hitting b <enter> comes back to the running operating system | 
|  | 1152 | from cp mode ( in our case linux ). | 
|  | 1153 | It is typically useful to add shortcuts to your profile.exec file | 
|  | 1154 | if you have one ( this is roughly equivalent to autoexec.bat in DOS ). | 
|  | 1155 | Here are a few from mine. | 
|  | 1156 | /* this gives me command history on issuing f12 */ | 
|  | 1157 | set pf12 retrieve | 
|  | 1158 | /* this continues */ | 
|  | 1159 | set pf8 imm b | 
|  | 1160 | /* goes to trace set a */ | 
|  | 1161 | set pf1 imm tr goto a | 
|  | 1162 | /* goes to trace set b */ | 
|  | 1163 | set pf2 imm tr goto b | 
|  | 1164 | /* goes to trace set c */ | 
|  | 1165 | set pf3 imm tr goto c | 
|  | 1166 |  | 
|  | 1167 |  | 
|  | 1168 |  | 
|  | 1169 | Instruction Tracing | 
|  | 1170 | ------------------- | 
|  | 1171 | Setting a simple breakpoint | 
|  | 1172 | TR I PSWA <address> | 
|  | 1173 | To debug a particular function try | 
|  | 1174 | TR I R <function address range> | 
|  | 1175 | TR I on its own will single step. | 
|  | 1176 | TR I DATA <MNEMONIC> <OPTIONAL RANGE> will trace for particular mnemonics | 
|  | 1177 | e.g. | 
|  | 1178 | TR I DATA 4D R 0197BC.4000 | 
|  | 1179 | will trace for BAS'es ( opcode 4D ) in the range 0197BC.4000 | 
|  | 1180 | if you were inclined you could add traces for all branch instructions & | 
|  | 1181 | suffix them with the run prefix so you would have a backtrace on screen | 
|  | 1182 | when a program crashes. | 
|  | 1183 | TR BR <INTO OR FROM> will trace branches into or out of an address. | 
|  | 1184 | e.g. | 
|  | 1185 | TR BR INTO 0 is often quite useful if a program is getting awkward & deciding | 
|  | 1186 | to branch to 0 & crashing, as this will stop at the address before it jumps to 0. | 
|  | 1187 | TR I R <address range> RUN cmd d g | 
|  | 1188 | single steps a range of addresses but stays running & | 
|  | 1189 | displays the gprs on each step. | 
|  | 1190 |  | 
|  | 1191 |  | 
|  | 1192 |  | 
|  | 1193 | Displaying & modifying Registers | 
|  | 1194 | -------------------------------- | 
|  | 1195 | D G will display all the gprs | 
|  | 1196 | Adding an extra G to all the commands is necessary to access the full 64 bit | 
|  | 1197 | content in VM on z/Architecture; obviously this isn't required for access registers | 
|  | 1198 | as these are still 32 bit. | 
|  | 1199 | e.g. DGG instead of DG | 
|  | 1200 | D X will display all the control registers | 
|  | 1201 | D AR will display all the access registers | 
|  | 1202 | D AR4-7 will display access registers 4 to 7 | 
|  | 1203 | CPU ALL D G will display the GPRs of all CPUs in the configuration | 
|  | 1204 | D PSW will display the current PSW | 
|  | 1205 | st PSW 2000 will put the value 2000 into the PSW & | 
|  | 1206 | crash your machine. | 
|  | 1207 | D PREFIX displays the prefix offset | 
|  | 1208 |  | 
|  | 1209 |  | 
|  | 1210 | Displaying Memory | 
|  | 1211 | ----------------- | 
|  | 1212 | To display memory mapped using the current PSW's mapping try | 
|  | 1213 | D <range> | 
|  | 1214 | To make VM display a message each time it hits a particular address & continue see the cmd msg trick under Hints below. | 
|  | 1215 | D I<range> will disassemble/display a range of instructions. | 
|  | 1216 | ST <addr> <32 bit word> will store a 32 bit word at a 32 bit aligned address | 
|  | 1217 | D T<range> will display the EBCDIC at an address ( if you are that way inclined ) | 
|  | 1218 | D R<range> will display real addresses ( without DAT ) but with prefixing. | 
|  | 1219 | There are other complex options to display; if you need to get at, say, home space | 
|  | 1220 | but are in primary space, the easiest thing to do is to temporarily | 
|  | 1221 | modify the PSW to the other addressing mode, display the stuff & then | 
|  | 1222 | restore it. | 
|  | 1223 |  | 
|  | 1224 |  | 
|  | 1225 |  | 
|  | 1226 | Hints | 
|  | 1227 | ----- | 
|  | 1228 | If you want to issue a debugger command without halting your virtual machine with the | 
|  | 1229 | PA1 key try prefixing the command with #CP e.g. | 
|  | 1230 | #cp tr i pswa 2000 | 
|  | 1231 | Also suffixing most debugger commands with RUN will cause them not | 
|  | 1232 | to stop, just display the mnemonic at the current instruction on the console. | 
|  | 1233 | If you have several breakpoints you want to put into your program & | 
|  | 1234 | you get fed up of cross referencing with System.map | 
|  | 1235 | you can do the following trick for several symbols. | 
|  | 1236 | grep do_signal System.map | 
|  | 1237 | which emits the following among other things | 
|  | 1238 | 0001f4e0 T do_signal | 
|  | 1239 | now you can do | 
|  | 1240 |  | 
|  | 1241 | TR I PSWA 0001f4e0 cmd msg * do_signal | 
|  | 1242 | This sends a message to your own console each time do_signal is entered. | 
|  | 1243 | ( As an aside I wrote a perl script once which automatically generated a REXX | 
|  | 1244 | script with breakpoints on every kernel procedure. This isn't a good idea | 
|  | 1245 | because there are thousands of these routines & VM can only set 255 breakpoints | 
|  | 1246 | at a time, so you nearly had to spend as long pruning the file down as you would | 
|  | 1247 | entering the msg's by hand. However, the trick might be useful for a single object file. ) | 
|  | 1248 | On linux'es 3270 emulator x3270 there is a very useful option under the File menu, | 
|  | 1249 | Save Screens In File; this is very good for keeping a copy of traces. | 
|  | 1250 |  | 
|  | 1251 | From CMS help <command name> will give you online help on a particular command. | 
|  | 1252 | e.g. | 
|  | 1253 | HELP DISPLAY | 
|  | 1254 |  | 
|  | 1255 | Also CP has a file called profile.exec which automatically gets called | 
|  | 1256 | on startup of CMS ( like autoexec.bat ). Keeping with the DOS analogy, | 
|  | 1257 | CP has a feature similar to doskey; it may be useful for you to | 
|  | 1258 | use profile.exec to define some keystrokes. | 
|  | 1259 | e.g. | 
|  | 1260 | SET PF9 IMM B | 
|  | 1261 | This does a single step in VM on pressing F9. | 
|  | 1262 | SET PF10  ^ | 
|  | 1263 | This sets up the ^ key, | 
|  | 1264 | which can be used for ^c (ctrl-c),^z (ctrl-z) which can't be typed directly into some 3270 consoles. | 
|  | 1265 | SET PF11 ^- | 
|  | 1266 | This types the starting keystrokes for a sysrq, see SysRq below. | 
|  | 1267 | SET PF12 RETRIEVE | 
|  | 1268 | This retrieves command history on pressing F12. | 
|  | 1269 |  | 
|  | 1270 |  | 
|  | 1271 | Sometimes in VM the display is set up to scroll automatically; this | 
|  | 1272 | can be very annoying if there are messages you wish to look at. | 
|  | 1273 | To stop this do | 
|  | 1274 | TERM MORE 255 255 | 
|  | 1275 | This will nearly stop automatic screen updates, however it will | 
|  | 1276 | cause a denial of service if lots of messages go to the 3270 console, | 
|  | 1277 | so it would be foolish to use this as the default on a production machine. | 
|  | 1278 |  | 
|  | 1279 |  | 
|  | 1280 | Tracing particular processes | 
|  | 1281 | ---------------------------- | 
|  | 1282 | The kernel's text segment is intentionally at an address in memory where it will | 
|  | 1283 | very seldom collide with text segments of user programs ( thanks Martin ), | 
|  | 1284 | this simplifies debugging the kernel. | 
|  | 1285 | However it is quite common for user processes to have addresses which collide; | 
|  | 1286 | this can make debugging a particular process under VM painful under normal | 
|  | 1287 | circumstances as the process may change when doing a | 
|  | 1288 | TR I R <address range>. | 
|  | 1289 | Thankfully after reading VM's online help I figured out how to debug | 
|  | 1290 | a particular process. | 
|  | 1291 |  | 
|  | 1292 | Your first problem is to find the STD ( segment table designation ) | 
|  | 1293 | of the program you wish to debug. | 
|  | 1294 | There are several ways you can do this; here are a few | 
|  | 1295 | 1) objdump --syms <program to be debugged> | grep main | 
|  | 1296 | To get the address of main in the program. | 
|  | 1297 | tr i pswa <address of main> | 
|  | 1298 | Start the program, if VM drops to CP on what looks like the entry | 
|  | 1299 | point of the main function this is most likely the process you wish to debug. | 
|  | 1300 | Now do a D X13 or D XG13 on z/Architecture. | 
|  | 1301 | On 31 bit the STD is bits 1-19 ( the STO segment table origin ) | 
|  | 1302 | & 25-31 ( the STL segment table length ) of CR13. | 
|  | 1303 | now type | 
|  | 1304 | TR I R STD <CR13's value> 0.7fffffff | 
|  | 1305 | e.g. | 
|  | 1306 | TR I R STD 8F32E1FF 0.7fffffff | 
|  | 1307 | Another very useful variation is | 
|  | 1308 | TR STORE INTO STD <CR13's value> <address range> | 
|  | 1309 | for finding out when a particular variable changes. | 
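The bit ranges quoted above for 31 bit can be written as masks. A minimal
sketch, assuming the usual 390 bit numbering where bit 0 is the most
significant bit of the 32 bit register; the macro & function names are made
up for illustration.

```c
#include <stdint.h>

#define STD_STO_MASK 0x7FFFF000u	/* bits 1-19, segment table origin */
#define STD_STL_MASK 0x0000007Fu	/* bits 25-31, segment table length */

/* Pick the STO out of a 31 bit CR13 value */
uint32_t std_sto(uint32_t cr13)
{
	return cr13 & STD_STO_MASK;
}

/* Pick the STL out of a 31 bit CR13 value */
uint32_t std_stl(uint32_t cr13)
{
	return cr13 & STD_STL_MASK;
}
```

For the example value 8F32E1FF this gives an STO of 0F32E000 & an STL of 7F.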
|  | 1310 |  | 
|  | 1311 | An alternative way of finding the STD of a currently running process | 
|  | 1312 | is to do the following, ( this method is more complex but | 
| Matt LaPlante | 6c28f2c | 2006-10-03 22:46:31 +0200 | [diff] [blame] | 1313 | could be quite convenient if you aren't updating the kernel much & | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1314 | so your kernel structures will stay constant for a reasonable period of | 
|  | 1315 | time ). | 
|  | 1316 |  | 
|  | 1317 | grep task /proc/<pid>/status | 
|  | 1318 | from this you should see something like | 
|  | 1319 | task: 0f160000 ksp: 0f161de8 pt_regs: 0f161f68 | 
|  | 1320 | This now gives you a pointer to the task structure. | 
|  | 1321 | Now make CC:="s390-gcc -g" kernel/sched.s | 
|  | 1322 | To get the task_struct stabinfo. | 
|  | 1323 | ( task_struct is defined in include/linux/sched.h ). | 
|  | 1324 | Now we want to look at | 
|  | 1325 | task->active_mm->pgd | 
|  | 1326 | on my machine the active_mm in the task structure stab is | 
|  | 1327 | active_mm:(4,12),672,32 | 
|  | 1328 | its offset is 672/8=84=0x54 | 
|  | 1329 | the pgd member in the mm_struct stab is | 
|  | 1330 | pgd:(4,6)=*(29,5),96,32 | 
|  | 1331 | so its offset is 96/8=12=0xc | 
|  | 1332 |  | 
|  | 1333 | so we'll | 
|  | 1334 | hexdump -s 0xf160054 /dev/mem | more | 
|  | 1335 | i.e. task_struct+active_mm offset | 
|  | 1336 | to look at the active_mm member | 
|  | 1337 | f160054 0fee cc60 0019 e334 0000 0000 0000 0011 | 
|  | 1338 | hexdump -s 0x0feecc6c /dev/mem | more | 
|  | 1339 | i.e. active_mm+pgd offset | 
|  | 1340 | we get something like | 
|  | 1341 | feecc6c 0f2c 0000 0000 0001 0000 0001 0000 0010 | 
|  | 1342 | now do | 
|  | 1343 | TR I R STD <pgd|0x7f> 0.7fffffff | 
|  | 1344 | i.e. the 0x7f is added because the pgd only | 
|  | 1345 | gives the page table origin & we need to set the low bits | 
|  | 1346 | to the maximum possible segment table length. | 
|  | 1347 | TR I R STD 0f2c007f 0.7fffffff | 
|  | 1348 | on z/Architecture you'll probably need to do | 
|  | 1349 | TR I R STD <pgd|0x7> 0.ffffffffffffffff | 
|  | 1350 | to set the TableType to 0x1 & the Table length to 3. | 
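The arithmetic in this walk through is easy to get wrong by hand, so here is
a small sketch of it. The function names are made up for illustration; the
numbers in the note below are the ones from the example above & will differ
on your kernel.

```c
#include <stdint.h>

/* stab entries give a member's offset in bits,
 * e.g. active_mm:(4,12),672,32 means bit offset 672 */
uint32_t stab_byte_offset(uint32_t bit_offset)
{
	return bit_offset / 8;
}

/* 31 bit: OR in 0x7f to select the maximum segment table length */
uint32_t std_from_pgd_31(uint32_t pgd)
{
	return pgd | 0x7f;
}

/* z/Architecture: OR in 0x7 as described above */
uint64_t std_from_pgd_64(uint64_t pgd)
{
	return pgd | 0x7;
}
```

With the task structure at 0f160000, 0f160000+stab_byte_offset(672) is
0f160054, 0feecc60+stab_byte_offset(96) is 0feecc6c &
std_from_pgd_31(0x0f2c0000) is 0f2c007f.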
|  | 1351 |  | 
|  | 1352 |  | 
|  | 1353 |  | 
|  | 1354 | Tracing Program Exceptions | 
|  | 1355 | -------------------------- | 
|  | 1356 | If you get a crash which says something like | 
|  | 1357 | illegal operation or specification exception followed by a register dump | 
|  | 1358 | you can restart linux & trace these using the tr prog <range or value> trace option. | 
|  | 1359 |  | 
|  | 1360 |  | 
|  | 1361 |  | 
|  | 1362 | The most common ones you will normally be tracing for are ( codes in hex ) | 
|  | 1363 | 1=operation exception | 
|  | 1364 | 2=privileged operation exception | 
|  | 1365 | 4=protection exception | 
|  | 1366 | 5=addressing exception | 
|  | 1367 | 6=specification exception | 
|  | 1368 | 10=segment translation exception | 
|  | 1369 | 11=page translation exception | 
|  | 1370 |  | 
|  | 1371 | The full list of these is on page 22 of the current s/390 Reference Summary. | 
|  | 1372 | e.g. | 
|  | 1373 | tr prog 10 will trace segment translation exceptions. | 
|  | 1374 | tr prog on its own will trace all program interruption codes. | 
|  | 1375 |  | 
|  | 1376 | Trace Sets | 
|  | 1377 | ---------- | 
|  | 1378 | On starting VM you are initially in the INITIAL trace set. | 
|  | 1379 | You can do a Q TR to verify this. | 
|  | 1380 | If you have a complex tracing situation where you wish to wait, for instance, | 
|  | 1381 | till a driver is open before you start tracing IO, but know in your | 
|  | 1382 | heart that you are going to have to make several runs through the code till you | 
|  | 1383 | have a clue what's going on, | 
|  | 1384 |  | 
|  | 1385 | what you can do is | 
|  | 1386 | TR I PSWA <Driver open address> | 
|  | 1387 | hit b to continue till breakpoint | 
|  | 1388 | reach the breakpoint | 
|  | 1389 | now do your | 
|  | 1390 | TR GOTO B | 
|  | 1391 | TR IO 7c08-7c09 inst int run | 
|  | 1392 | or whatever the IO channels you wish to trace are & hit b | 
|  | 1393 |  | 
|  | 1394 | To go back to the initial trace set do | 
|  | 1395 | TR GOTO INITIAL | 
|  | 1396 | & the TR I PSWA <Driver open address> will be the only active breakpoint again. | 
|  | 1397 |  | 
|  | 1398 |  | 
|  | 1399 | Tracing linux syscalls under VM | 
|  | 1400 | ------------------------------- | 
|  | 1401 | Syscalls are implemented on Linux for S390 by the Supervisor call instruction (SVC). There are 256 | 
|  | 1402 | possibilities of these as the instruction is made up of a 0x0A opcode & a second byte holding | 
|  | 1403 | the syscall number. They are traced using the simple command | 
|  | 1404 | TR SVC  <Optional value or range> | 
| Randy Dunlap | 58cc855 | 2009-01-06 14:42:42 -0800 | [diff] [blame] | 1405 | the syscalls are defined in linux/arch/s390/include/asm/unistd.h | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1406 | e.g. to trace all file opens just do | 
|  | 1407 | TR SVC 5 ( as this is the syscall number of open ) | 
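As the SVC instruction is just the opcode byte followed by the syscall
number, picking the number out of a traced instruction is simple. A minimal
sketch; svc_number is a name made up for illustration.

```c
/* Returns the syscall number if the 2 byte instruction is an SVC
 * ( first byte 0x0A ), otherwise -1. */
int svc_number(unsigned short insn)
{
	if ((insn >> 8) != 0x0A)
		return -1;
	return insn & 0xFF;
}
```

e.g. an instruction of 0A05 gives 5, the open syscall.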
|  | 1408 |  | 
|  | 1409 |  | 
|  | 1410 | SMP Specific commands | 
|  | 1411 | --------------------- | 
|  | 1412 | To find out how many cpus you have | 
|  | 1413 | Q CPUS displays all the CPU's available to your virtual machine | 
|  | 1414 | To find the cpu that VM debugger commands are currently being directed at do | 
| Paolo Ornati | 670e9f3 | 2006-10-03 22:57:56 +0200 | [diff] [blame] | 1415 | Q CPU, to change the cpu that VM debugger commands are being directed at do | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1416 | CPU <desired cpu no> | 
|  | 1417 |  | 
|  | 1418 | On an SMP guest, to issue a command to all CPUs try prefixing the command with cpu all. | 
|  | 1419 | To issue a command to a particular cpu try cpu <cpu number> e.g. | 
|  | 1420 | CPU 01 TR I R 2000.3000 | 
|  | 1421 | If you are running on a guest with several cpus & you have an IO related problem | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1422 | & cannot follow the flow of code but you know it isn't smp related, | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1423 | from the bash prompt issue | 
|  | 1424 | shutdown -h now or halt. | 
|  | 1425 | do a Q CPUS to find out how many cpus you have | 
|  | 1426 | detach each one of them from cp except cpu 0 | 
|  | 1427 | by issuing a | 
|  | 1428 | DETACH CPU 01-(number of cpus in configuration) | 
|  | 1429 | & boot linux again. | 
|  | 1430 | TR SIGP will trace inter processor signal processor instructions. | 
|  | 1431 | DEFINE CPU 01-(number in configuration) | 
|  | 1432 | will get your guests cpus back. | 
|  | 1433 |  | 
|  | 1434 |  | 
|  | 1435 | Help for displaying ascii textstrings | 
|  | 1436 | ------------------------------------- | 
|  | 1437 | On the very latest VM Nucleus'es VM can now display ascii | 
|  | 1438 | ( thanks Neale for the hint ) by doing | 
|  | 1439 | D TX<lowaddr>.<len> | 
|  | 1440 | e.g. | 
|  | 1441 | D TX0.100 | 
|  | 1442 |  | 
|  | 1443 | Alternatively | 
|  | 1444 | ============= | 
|  | 1445 | Under older VM debuggers ( I love EBCDIC too ) you can use this little program I wrote which | 
|  | 1446 | will convert a command line of hex digits to ascii text. It can be compiled under linux & | 
|  | 1447 | you can copy the hex digits from your x3270 terminal to your xterm if you are debugging | 
|  | 1448 | from a linuxbox. | 
|  | 1449 |  | 
|  | 1450 | This is quite useful when looking at a parameter passed in as a text string | 
|  | 1451 | under VM ( unless you are good at decoding ASCII in your head ). | 
|  | 1452 |  | 
|  | 1453 | e.g. consider tracing an open syscall | 
|  | 1454 | TR SVC 5 | 
|  | 1455 | We have stopped at a breakpoint | 
|  | 1456 | 000151B0' SVC   0A05     -> 0001909A'   CC 0 | 
|  | 1457 |  | 
|  | 1458 | D 20.8 to check the SVC old psw in the prefix area & see whether it was from userspace | 
|  | 1459 | ( for the layout of the prefix area consult P18 of the s/390 Reference Summary | 
|  | 1460 | if you have it available ). | 
|  | 1461 | V00000020  070C2000 800151B2 | 
|  | 1462 | The problem state bit wasn't set & it's also too early in the boot sequence | 
|  | 1463 | for it to be a userspace SVC. If it was, we would have to temporarily switch the | 
|  | 1464 | psw to user space addressing so we could get at the first parameter of the open in | 
|  | 1465 | gpr2. | 
|  | 1466 | Next do a | 
|  | 1467 | D G2 | 
|  | 1468 | GPR  2 =  00014CB4 | 
|  | 1469 | Now display what gpr2 is pointing to | 
|  | 1470 | D 00014CB4.20 | 
|  | 1471 | V00014CB4  2F646576 2F636F6E 736F6C65 00001BF5 | 
|  | 1472 | V00014CC4  FC00014C B4001001 E0001000 B8070707 | 
|  | 1473 | Now copy the text till the first 00 hex ( which is the end of the string ) | 
|  | 1474 | to an xterm & do hex2ascii on it. | 
|  | 1475 | hex2ascii 2F646576 2F636F6E 736F6C65 00 | 
|  | 1476 | outputs | 
|  | 1477 | Decoded Hex:=/ d e v / c o n s o l e 0x00 | 
|  | 1478 | We were opening the console device. | 
|  | 1479 |  | 
|  | 1480 | You can compile the code below yourself for practice :-), | 
|  | 1481 | /* | 
|  | 1482 | *    hex2ascii.c | 
|  | 1483 | *    a useful little tool for converting a hexadecimal command line to ascii | 
|  | 1484 | * | 
|  | 1485 | *    Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) | 
|  | 1486 | *    (C) 2000 IBM Deutschland Entwicklung GmbH, IBM Corporation. | 
|  | 1487 | */ | 
|  | 1488 | #include <stdio.h> | 
|  | 1489 | #include <string.h> | 
|  | 1490 |  | 
|  | 1491 | int main(int argc,char *argv[]) | 
|  | 1492 | { | 
|  | 1493 |     int cnt1,cnt2,len,toggle=0,startcnt=1; | 
|  | 1494 |     unsigned char c,hex=0; | 
|  | 1495 |  | 
|  | 1496 |     if(argc>1&&(strcmp(argv[1],"-a")==0)) /* -a: compact output, '.' for non printables */ | 
|  | 1497 |         startcnt=2; | 
|  | 1498 |     printf("Decoded Hex:="); | 
|  | 1499 |     for(cnt1=startcnt;cnt1<argc;cnt1++) | 
|  | 1500 |     { | 
|  | 1501 |         len=(int)strlen(argv[cnt1]); | 
|  | 1502 |         for(cnt2=0;cnt2<len;cnt2++) | 
|  | 1503 |         { | 
|  | 1504 |             c=argv[cnt1][cnt2]; | 
|  | 1505 |             if(c>='0'&&c<='9') | 
|  | 1506 |                 c=c-'0'; | 
|  | 1507 |             if(c>='A'&&c<='F') | 
|  | 1508 |                 c=c-'A'+10; | 
|  | 1509 |             if(c>='a'&&c<='f') | 
|  | 1510 |                 c=c-'a'+10; | 
|  | 1511 |             switch(toggle) | 
|  | 1512 |             { | 
|  | 1513 |             case 0: | 
|  | 1514 |                 hex=c<<4; /* high nibble */ | 
|  | 1515 |                 toggle=1; | 
|  | 1516 |                 break; | 
|  | 1517 |             case 1: | 
|  | 1518 |                 hex+=c; /* low nibble completes the byte */ | 
|  | 1519 |                 if(hex<32||hex>127) | 
|  | 1520 |                 { | 
|  | 1521 |                     if(startcnt==1) | 
|  | 1522 |                         printf("0x%02X ",(int)hex); | 
|  | 1523 |                     else | 
|  | 1524 |                         printf("."); | 
|  | 1525 |                 } | 
|  | 1526 |                 else | 
|  | 1527 |                 { | 
|  | 1528 |                     printf("%c",hex); | 
|  | 1529 |                     if(startcnt==1) | 
|  | 1530 |                         printf(" "); | 
|  | 1531 |                 } | 
|  | 1532 |                 toggle=0; | 
|  | 1533 |                 break; | 
|  | 1534 |             } | 
|  | 1535 |         } | 
|  | 1536 |     } | 
|  | 1537 |     printf("\n"); | 
|  | 1538 |     return 0; | 
|  | 1539 | } | 
|  | 1540 |  | 
|  | 1541 |  | 
|  | 1542 |  | 
|  | 1543 | Stack tracing under VM | 
|  | 1544 | ---------------------- | 
|  | 1545 | A basic backtrace | 
|  | 1546 | ----------------- | 
|  | 1547 |  | 
|  | 1548 | Here are the tricks I use; 9 out of 10 times they work pretty well. | 
|  | 1549 |  | 
|  | 1550 | When your backchain reaches a dead end | 
|  | 1551 | -------------------------------------- | 
|  | 1552 | This can happen when an exception happens in the kernel & the kernel is entered twice. | 
|  | 1553 | If you reach the NULL pointer at the end of the back chain you should be | 
|  | 1554 | able to sniff further back if you follow the following tricks. | 
|  | 1555 | 1) A kernel address should be easy to recognise since it is in | 
|  | 1556 | primary space & the problem state bit isn't set & also | 
|  | 1557 | the high bit of the address is set. | 
|  | 1558 | 2) Another backchain should also be easy to recognise since it is an | 
|  | 1559 | address pointing to another address approximately 100 bytes or 0x70 hex | 
|  | 1560 | behind the current stackpointer. | 
|  | 1561 |  | 
|  | 1562 |  | 
|  | 1563 | Here is some practice. | 
|  | 1564 | boot the kernel & hit PA1 at some random time | 
|  | 1565 | d g to display the gprs, this should display something like | 
|  | 1566 | GPR  0 =  00000001  00156018  0014359C  00000000 | 
|  | 1567 | GPR  4 =  00000001  001B8888  000003E0  00000000 | 
|  | 1568 | GPR  8 =  00100080  00100084  00000000  000FE000 | 
|  | 1569 | GPR 12 =  00010400  8001B2DC  8001B36A  000FFED8 | 
|  | 1570 | Note that GPR14 is a return address but as we are real men we are going to | 
|  | 1571 | trace the stack. | 
|  | 1572 | display 0x40 bytes after the stack pointer. | 
|  | 1573 |  | 
|  | 1574 | V000FFED8  000FFF38 8001B838 80014C8E 000FFF38 | 
|  | 1575 | V000FFEE8  00000000 00000000 000003E0 00000000 | 
|  | 1576 | V000FFEF8  00100080 00100084 00000000 000FE000 | 
|  | 1577 | V000FFF08  00010400 8001B2DC 8001B36A 000FFED8 | 
|  | 1578 |  | 
|  | 1579 |  | 
|  | 1580 | Ah now look at what's in sp+56 (sp+0x38): this is 8001B36A, our saved r14 if | 
|  | 1581 | you look above at our stackframe, & it also agrees with GPR14. | 
|  | 1582 |  | 
|  | 1583 | now backchain | 
|  | 1584 | d 000FFF38.40 | 
|  | 1585 | we now are taking the contents of SP to get our first backchain. | 
|  | 1586 |  | 
|  | 1587 | V000FFF38  000FFFA0 00000000 00014995 00147094 | 
|  | 1588 | V000FFF48  00147090 001470A0 000003E0 00000000 | 
|  | 1589 | V000FFF58  00100080 00100084 00000000 001BF1D0 | 
|  | 1590 | V000FFF68  00010400 800149BA 80014CA6 000FFF38 | 
|  | 1591 |  | 
|  | 1592 | This displays a 2nd return address of 80014CA6 | 
|  | 1593 |  | 
|  | 1594 | now do d 000FFFA0.40 for our 3rd backchain | 
|  | 1595 |  | 
|  | 1596 | V000FFFA0  04B52002 0001107F 00000000 00000000 | 
|  | 1597 | V000FFFB0  00000000 00000000 FF000000 0001107F | 
|  | 1598 | V000FFFC0  00000000 00000000 00000000 00000000 | 
|  | 1599 | V000FFFD0  00010400 80010802 8001085A 000FFFA0 | 
|  | 1600 |  | 
|  | 1601 |  | 
|  | 1602 | our 3rd return address is 8001085A | 
|  | 1603 |  | 
|  | 1604 | as the 04B52002 looks suspiciously like rubbish it is fair to assume that the kernel entry routines | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1605 | for the sake of optimisation don't set up a backchain. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1606 |  | 
|  | 1607 | now look at System.map to see if the addresses make any sense. | 
|  | 1608 |  | 
|  | 1609 | grep -i 0001b3 System.map | 
|  | 1610 | outputs among other things | 
|  | 1611 | 0001b304 T cpu_idle | 
|  | 1612 | so 8001B36A | 
|  | 1613 | is cpu_idle+0x66 ( quiet the cpu is asleep, don't wake it ) | 
|  | 1614 |  | 
|  | 1615 |  | 
|  | 1616 | grep -i 00014 System.map | 
|  | 1617 | produces among other things | 
|  | 1618 | 00014a78 T start_kernel | 
|  | 1619 | so 80014CA6 is start_kernel+0x22e ( a hex number I can't add in my head ). | 
|  | 1620 |  | 
|  | 1621 | grep -i 00108 System.map | 
|  | 1622 | this produces | 
|  | 1623 | 00010800 T _stext | 
|  | 1624 | so   8001085A is _stext+0x5a | 
|  | 1625 |  | 
|  | 1626 | Congrats you've done your first backchain. | 
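The walk you just did by hand can be sketched in C. This is only an
illustration: it runs over the words copied from the three dumps above
rather than over live memory, & the names ( dump, peek, walk_backchain )
are made up. It assumes, as in the example, that the backchain is at sp+0
& the saved r14 at sp+0x38.

```c
#include <stdint.h>
#include <stddef.h>

#define R14_SLOT 0x38	/* saved r14 sits at sp+56 in the frames above */

struct mem_word { uint32_t addr, value; };

/* Words copied from the three dumps above; only the ones the walk needs */
static const struct mem_word dump[] = {
	{ 0x000FFED8, 0x000FFF38 }, { 0x000FFF10, 0x8001B36A },
	{ 0x000FFF38, 0x000FFFA0 }, { 0x000FFF70, 0x80014CA6 },
	{ 0x000FFFA0, 0x04B52002 }, { 0x000FFFD8, 0x8001085A },
};

/* Look a word up in the dump; returns 0 if the address isn't there */
static int peek(uint32_t addr, uint32_t *value)
{
	size_t i;

	for (i = 0; i < sizeof(dump) / sizeof(dump[0]); i++)
		if (dump[i].addr == addr) {
			*value = dump[i].value;
			return 1;
		}
	return 0;
}

/* Follow the backchain from sp, storing each saved r14 with the high
 * bit stripped so it can be compared with System.map.
 * Returns the number of frames walked. */
int walk_backchain(uint32_t sp, uint32_t *ret, int max)
{
	int n = 0;
	uint32_t r14, next;

	while (n < max && peek(sp + R14_SLOT, &r14)) {
		ret[n++] = r14 & 0x7FFFFFFF;
		if (!peek(sp, &next))
			break;
		sp = next;	/* rubbish like 04B52002 ends the walk */
	}
	return n;
}
```

Starting from 000FFED8 this finds 3 return addresses, 0001B36A, 00014CA6 &
0001085A; subtracting the System.map addresses gives cpu_idle+0x66,
start_kernel+0x22e & _stext+0x5a.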
|  | 1627 |  | 
|  | 1628 |  | 
|  | 1629 |  | 
|  | 1630 | s/390 & z/Architecture IO Overview | 
|  | 1631 | ================================== | 
|  | 1632 |  | 
|  | 1633 | I am not going to give a course in 390 IO architecture as this would take me quite a | 
|  | 1634 | while & I'm no expert. Instead I'll give a 390 IO architecture summary for Dummies; if you have | 
|  | 1635 | the s/390 principles of operation available read that instead. If nothing else you may find a few | 
|  | 1636 | useful keywords in here & be able to use them on a web search engine like altavista to find | 
|  | 1637 | more useful information. | 
|  | 1638 |  | 
|  | 1639 | Unlike other bus architectures modern 390 systems do their IO using mostly | 
|  | 1640 | fibre optics & devices such as tapes & disks can be shared between several mainframes. | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1641 | Also S390 can support up to 65536 devices while a high end PC based system might be choking | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1642 | with around 64. Here is some of the common IO terminology: | 
|  | 1643 |  | 
|  | 1644 | Subchannel: | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1645 | This is the logical number most IO commands use to talk to an IO device. There can be up to | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1646 | 0x10000 (65536) of these in a configuration; typically there are a few hundred. Under VM | 
|  | 1647 | for simplicity they are allocated contiguously, however on the native hardware they are not; | 
|  | 1648 | they typically stay consistent between boots provided no new hardware is inserted or removed. | 
|  | 1649 | Under Linux for 390 we use these as IRQ's & also when issuing an IO command (CLEAR SUBCHANNEL, | 
|  | 1650 | HALT SUBCHANNEL,MODIFY SUBCHANNEL,RESUME SUBCHANNEL,START SUBCHANNEL,STORE SUBCHANNEL & | 
|  | 1651 | TEST SUBCHANNEL ) we use this as the ID of the device we wish to talk to. The most | 
|  | 1652 | important of these instructions are START SUBCHANNEL ( to start IO ), TEST SUBCHANNEL ( to check | 
|  | 1653 | whether the IO completed successfully ) & HALT SUBCHANNEL ( to kill IO ). A subchannel | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1654 | can have up to 8 channel paths to a device; this offers redundancy if one is not available. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1655 |  | 
|  | 1656 |  | 
|  | 1657 | Device Number: | 
|  | 1658 | This number remains static & is closely tied to the hardware. There are 65536 of these; | 
|  | 1659 | they are made up of a CHPID ( Channel Path ID, the most significant 8 bits ) | 
|  | 1660 | & another 8 least significant bits. These remain static even if more devices are inserted or removed | 
|  | 1661 | from the hardware. There is a 1 to 1 mapping between Subchannels & Device Numbers provided | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1662 | devices aren't inserted or removed. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1663 |  | 
|  | 1664 | Channel Control Words: | 
|  | 1665 | CCWS are linked lists of instructions initially pointed to by an operation request block (ORB), | 
|  | 1666 | which is initially given to the Start Subchannel (SSCH) command along with the subchannel number | 
|  | 1667 | for the IO subsystem to process while the CPU continues executing normal code. | 
|  | 1668 | These come in two flavours, Format 0 ( 24 bit, for backward | 
|  | 1669 | compatibility ) & Format 1 ( 31 bit ). These are typically used to issue read & write | 
|  | 1670 | ( & many other instructions ); they consist of a length field & an absolute address field. | 
|  | 1671 | For each IO you typically get 1 or 2 interrupts, one for channel end ( primary status ) when the | 
|  | 1672 | channel is idle & the second for device end ( secondary status ); sometimes you get both | 
|  | 1673 | concurrently. You check how the IO went by issuing a TEST SUBCHANNEL at each interrupt, | 
|  | 1674 | from which you receive an Interruption response block (IRB). If you get channel & device end | 
|  | 1675 | status in the IRB without channel checks etc. your IO probably went okay. If you didn't you | 
| Matt LaPlante | fff9289 | 2006-10-03 22:47:42 +0200 | [diff] [blame] | 1676 | probably need a doctor to examine the IRB & extended status word etc. | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1677 | If an error occurs, more sophisticated control units have a facility known as | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1678 | concurrent sense; this means that if an error occurs Extended sense information will | 
|  | 1679 | be presented in the Extended status word in the IRB. If not you have to issue a | 
|  | 1680 | subsequent SENSE CCW command after the test subchannel. | 
|  | 1681 |  | 
|  | 1682 |  | 
|  | 1683 | TPI( Test pending interrupt) can also be used for polled IO but in multitasking multiprocessor | 
|  | 1684 | systems it isn't recommended except for checking special cases ( i.e. non looping checks for | 
|  | 1685 | pending IO etc. ). | 
|  | 1686 |  | 
|  | 1687 | Store Subchannel & Modify Subchannel can be used to examine & modify operating characteristics | 
|  | 1688 | of a subchannel ( e.g. channel paths ). | 
|  | 1689 |  | 
|  | 1690 | Other IO related Terms: | 
|  | 1691 | Sysplex: S390's Clustering Technology | 
|  | 1692 | QDIO: S390's new high speed IO architecture to support devices such as gigabit ethernet, | 
|  | 1693 | this architecture is also designed to be forward compatible with up & coming 64 bit machines. | 
|  | 1694 |  | 
|  | 1695 |  | 
|  | 1696 | General Concepts | 
|  | 1697 |  | 
|  | 1698 | Input Output Processors (IOP's) are responsible for communicating between | 
|  | 1699 | the mainframe CPU's & the channel & relieve the mainframe CPU's from the | 
|  | 1700 | burden of communicating with IO devices directly, this allows the CPU's to | 
|  | 1701 | concentrate on data processing. | 
|  | 1702 |  | 
|  | 1703 | An IOP can use one or more links ( known as channel paths ) to talk to each | 
|  | 1704 | IO device. It first checks for path availability & chooses an available one, | 
|  | 1705 | then starts ( & sometimes terminates ) the IO. | 
| Matt LaPlante | 992caac | 2006-10-03 22:52:05 +0200 | [diff] [blame] | 1706 | There are two types of channel path: ESCON & the Parallel IO interface. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1707 |  | 
|  | 1708 | IO devices are attached to control units. Control units provide the | 
|  | 1709 | logic to interface the channel paths & channel path IO protocols to | 
|  | 1710 | the IO devices. They can be integrated with the devices or housed separately | 
|  | 1711 | & often talk to several similar devices ( typical examples would be RAID | 
|  | 1712 | controllers or a control unit which connects to 1000 3270 terminals ). | 
|  | 1713 |  | 
|  | 1714 |  | 
|  | 1715 |    +---------------------------------------------------------------+ | 
|  | 1716 |    | +-----+ +-----+ +-----+ +-----+  +----------+  +----------+   | | 
|  | 1717 |    | | CPU | | CPU | | CPU | | CPU |  |  Main    |  | Expanded |   | | 
|  | 1718 |    | |     | |     | |     | |     |  |  Memory  |  |  Storage |   | | 
|  | 1719 |    | +-----+ +-----+ +-----+ +-----+  +----------+  +----------+   | | 
|  | 1720 |    |---------------------------------------------------------------+ | 
|  | 1721 |    |   IOP        |      IOP      |       IOP                      | | 
|  | 1722 |    |--------------------------------------------------------------- | 
|  | 1723 |    | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | | 
|  | 1724 |    ---------------------------------------------------------------- | 
|  | 1725 |         ||                                              || | 
|  | 1726 |         ||  Bus & Tag Channel Path                      || ESCON | 
|  | 1727 |         ||  ======================                      || Channel | 
|  | 1728 |         ||  ||                  ||                      || Path | 
|  | 1729 |    +----------+               +----------+         +----------+ | 
|  | 1730 |    |          |               |          |         |          | | 
|  | 1731 |    |    CU    |               |    CU    |         |    CU    | | 
|  | 1732 |    |          |               |          |         |          | | 
|  | 1733 |    +----------+               +----------+         +----------+ | 
|  | 1734 |       |      |                     |                |       | | 
|  | 1735 | +----------+ +----------+      +----------+   +----------+ +----------+ | 
|  | 1736 | |I/O Device| |I/O Device|      |I/O Device|   |I/O Device| |I/O Device| | 
|  | 1737 | +----------+ +----------+      +----------+   +----------+ +----------+ | 
|  | 1738 | CPU = Central Processing Unit | 
|  | 1739 | C = Channel | 
|  | 1740 | IOP = Input Output Processor | 
|  | 1741 | CU = Control Unit | 
|  | 1742 |  | 
|  | 1743 | The 390 IO systems come in 2 flavours; the current 390 machines support both. | 
|  | 1744 |  | 
| Matt LaPlante | 992caac | 2006-10-03 22:52:05 +0200 | [diff] [blame] | 1745 | The older 360 & 370 interface, sometimes called the Parallel I/O interface, | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1746 | sometimes called Bus-and-Tag & sometimes Original Equipment Manufacturers | 
|  | 1747 | Interface (OEMI). | 
|  | 1748 |  | 
| Matt LaPlante | 992caac | 2006-10-03 22:52:05 +0200 | [diff] [blame] | 1749 | This byte wide Parallel channel path/bus has parity & data on the "Bus" cable | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1750 | & control lines on the "Tag" cable. These can operate in byte multiplex mode for | 
|  | 1751 | sharing between several slow devices, or in burst mode & monopolize the channel for the | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1752 | whole burst. Up to 256 devices can be addressed on one of these cables. These cables are | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1753 | about one inch in diameter. The maximum unextended length supported by these cables is | 
|  | 1754 | 125 meters, but this can be extended up to 2km with a fibre optic channel extender | 
|  | 1755 | such as a 3044. The maximum burst speed supported is 4.5 megabytes per second, however | 
|  | 1756 | some really old processors support only transfer rates of 3.0, 2.0 & 1.0 MB/sec. | 
|  | 1757 | One of these paths can be daisy chained to up to 8 control units. | 
|  | 1758 |  | 
|  | 1759 |  | 
|  | 1760 | ESCON ( fibre optic; its faster Fibre Channel based successor is called FICON ) | 
|  | 1761 | Was introduced by IBM in 1990. Has 2 fibre optic cables & uses either LEDs or lasers | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1762 | for communication at a signaling rate of up to 200 megabits/sec. As 10 bits are transferred | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1763 | for every 8 bits of info this drops to 160 megabits/sec & to 18.6 Megabytes/sec once | 
|  | 1764 | control info & CRC are added. ESCON only operates in burst mode. | 
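
The ESCON arithmetic above can be checked with a few lines of Python. This is only a sketch: the 200 megabits/sec signaling rate, the 10-for-8 coding & the 18.6 Megabytes/sec figure come from the text, while the function name is made up for illustration & a megabyte is assumed to be 10**6 bytes.

```python
def escon_rates(signal_mbit=200.0, data_bits=8, coded_bits=10):
    """Return (payload Mbit/s, payload MB/s) for a link that sends
    coded_bits on the wire for every data_bits of information."""
    payload_mbit = signal_mbit * data_bits / coded_bits
    payload_mbyte = payload_mbit / 8  # 8 bits per byte, 1 MB assumed to be 10**6 bytes
    return payload_mbit, payload_mbyte


payload_mbit, payload_mbyte = escon_rates()
quoted_mbyte = 18.6  # effective rate quoted in the text
overhead = 1 - quoted_mbyte / payload_mbyte  # share lost to control info & CRC

print(f"payload after 10-for-8 coding: {payload_mbit:.0f} Mbit/s = {payload_mbyte:.1f} MB/s")
print(f"implied control info & CRC overhead: {overhead:.1%}")
```

The overhead percentage is only what the two quoted figures imply; the text does not state it directly.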
|  | 1765 |  | 
|  | 1766 | ESCON's typical max cable length is 3km for the LED version & 20km for the laser version | 
|  | 1767 | known as XDF ( extended distance facility ). This can be further extended by using an | 
|  | 1768 | ESCON director which triples the above mentioned ranges. Unlike Bus & Tag, as ESCON is | 
|  | 1769 | serial it uses a packet switching architecture; the standard Bus & Tag control protocol | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 1770 | is however present within the packets. Up to 256 devices can be attached to each control | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1771 | unit that uses one of these interfaces. | 
|  | 1772 |  | 
|  | 1773 | Common 390 Devices include: | 
|  | 1774 | Network adapters, typically OSA2, 3172's, 2116's & OSA-E gigabit ethernet adapters. | 
|  | 1775 | Consoles, 3270 & 3215 ( a teletype emulated under linux for a line mode console ). | 
|  | 1776 | DASD's, direct access storage devices ( otherwise known as hard disks ). | 
|  | 1777 | Tape Drives. | 
|  | 1778 | CTC ( Channel to Channel Adapters ), | 
| Matt LaPlante | 992caac | 2006-10-03 22:52:05 +0200 | [diff] [blame] | 1779 | ESCON or Parallel Cables used as a very high speed serial link | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1780 | between 2 machines. We use 2 cables under linux to do a bi-directional serial link. | 
|  | 1781 |  | 
|  | 1782 |  | 
|  | 1783 | Debugging IO on s/390 & z/Architecture under VM | 
|  | 1784 | =============================================== | 
|  | 1785 |  | 
|  | 1786 | Now we are ready to go on with IO tracing commands under VM | 
|  | 1787 |  | 
|  | 1788 | A few self explanatory queries: | 
|  | 1789 | Q OSA | 
|  | 1790 | Q CTC | 
|  | 1791 | Q DISK ( This command is CMS specific ) | 
|  | 1792 | Q DASD | 
|  | 1793 |  | 
|  | 1794 |  | 
|  | 1795 |  | 
|  | 1796 |  | 
|  | 1797 |  | 
|  | 1798 |  | 
|  | 1799 | Q OSA on my machine returns | 
|  | 1800 | OSA  7C08 ON OSA   7C08 SUBCHANNEL = 0000 | 
|  | 1801 | OSA  7C09 ON OSA   7C09 SUBCHANNEL = 0001 | 
|  | 1802 | OSA  7C14 ON OSA   7C14 SUBCHANNEL = 0002 | 
|  | 1803 | OSA  7C15 ON OSA   7C15 SUBCHANNEL = 0003 | 
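
If you capture this output in a file, the device numbers & subchannels can be pulled out with a short script. The following is a minimal Python sketch which assumes the response always has the shape shown above; the function & group names are hypothetical & it does not say which of the two device numbers on a line is the virtual one.

```python
import re

SAMPLE = """\
OSA  7C08 ON OSA   7C08 SUBCHANNEL = 0000
OSA  7C09 ON OSA   7C09 SUBCHANNEL = 0001
OSA  7C14 ON OSA   7C14 SUBCHANNEL = 0002
OSA  7C15 ON OSA   7C15 SUBCHANNEL = 0003
"""

# One response line: type, device number, "ON", type, device number, subchannel.
LINE_RE = re.compile(
    r"^(?P<type>\w+)\s+(?P<devno>[0-9A-F]{4})\s+ON\s+\w+\s+"
    r"(?P<backing>[0-9A-F]{4})\s+SUBCHANNEL\s*=\s*(?P<sch>[0-9A-F]{4})$"
)


def parse_query(text):
    """Return a list of (device number, subchannel) pairs as integers."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            devices.append((int(m["devno"], 16), int(m["sch"], 16)))
    return devices


devices = parse_query(SAMPLE)
first, second = devices[0][0], devices[1][0]
trace_command = f"TR SSCH {first:04X}-{second:04X}"
print(trace_command)
```

The last line only builds the text of a trace command for the first two devices; it still has to be typed at the VM console by hand.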
|  | 1804 |  | 
| Matt LaPlante | 992caac | 2006-10-03 22:52:05 +0200 | [diff] [blame] | 1805 | If you have a guest with certain privileges you may be able to see devices | 
|  | 1806 | which don't belong to you. To avoid this, add the option V. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1807 | e.g. | 
|  | 1808 | Q V OSA | 
|  | 1809 |  | 
|  | 1810 | Now, using the device numbers returned by this command, we will | 
|  | 1811 | trace the IO starting up on the first two devices, 7C08 & 7C09. | 
|  | 1812 | In our simplest case we can trace the | 
|  | 1813 | start subchannels with | 
|  | 1814 | TR SSCH 7C08-7C09 | 
|  | 1815 | or the halt subchannels with | 
|  | 1816 | TR HSCH 7C08-7C09 | 
|  | 1817 | For MSCH's, STSCH's etc. I think you can guess the rest. | 
|  | 1818 |  | 
|  | 1819 | Ingo's favourite trick is tracing all the IO's & CCW's & spooling them into the reader of another | 
|  | 1820 | VM guest so he can ftp the logfile back to his own machine. I'll do a small bit of this & give you | 
|  | 1821 | a look at the output. | 
|  | 1822 |  | 
|  | 1823 | 1) Spool stdout to VM reader | 
|  | 1824 | SP PRT TO (another vm guest ) or * for the local vm guest | 
|  | 1825 | 2) Fill the reader with the trace | 
|  | 1826 | TR IO 7c08-7c09 INST INT CCW PRT RUN | 
|  | 1827 | 3) Start up linux | 
|  | 1828 | i 00c | 
|  | 1829 | 4) Finish the trace | 
|  | 1830 | TR END | 
|  | 1831 | 5) close the reader | 
|  | 1832 | C PRT | 
|  | 1833 | 6) list reader contents | 
|  | 1834 | RDRLIST | 
|  | 1835 | 7) copy it to linux4's minidisk | 
|  | 1836 | RECEIVE / LOG TXT A1 ( replace | 
|  | 1837 | 8) look at it | 
|  | 1838 | filel & press F11 | 
| Matt LaPlante | 53cb472 | 2006-10-03 22:55:17 +0200 | [diff] [blame] | 1839 | You should see something like: | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1840 |  | 
|  | 1841 | 00020942' SSCH  B2334000    0048813C    CC 0    SCH 0000    DEV 7C08 | 
|  | 1842 | CPA 000FFDF0   PARM 00E2C9C4    KEY 0  FPI C0  LPM 80 | 
|  | 1843 | CCW    000FFDF0  E4200100 00487FE8   0000  E4240100 ........ | 
|  | 1844 | IDAL                                      43D8AFE8 | 
|  | 1845 | IDAL                                      0FB76000 | 
|  | 1846 | 00020B0A'   I/O DEV 7C08 -> 000197BC'   SCH 0000   PARM 00E2C9C4 | 
|  | 1847 | 00021628' TSCH  B2354000 >> 00488164    CC 0    SCH 0000    DEV 7C08 | 
|  | 1848 | CCWA 000FFDF8   DEV STS 0C  SCH STS 00  CNT 00EC | 
|  | 1849 | KEY 0   FPI C0  CC 0   CTLS 4007 | 
|  | 1850 | 00022238' STSCH B2344000 >> 00488108    CC 0    SCH 0000    DEV 7C08 | 
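
The DEV STS & SCH STS bytes in the TSCH part of the trace are the device & subchannel status discussed earlier. The Python sketch below decodes them. The bit names are my reading of the SCSW layout in the Principles of Operation, so treat them as an assumption & check the manual; the function names are hypothetical.

```python
import re

# Device status byte, most significant bit first.
DEVICE_STATUS = [
    (0x80, "attention"),
    (0x40, "status modifier"),
    (0x20, "control unit end"),
    (0x10, "busy"),
    (0x08, "channel end"),
    (0x04, "device end"),
    (0x02, "unit check"),
    (0x01, "unit exception"),
]

# Subchannel status byte, most significant bit first.
SUBCHANNEL_STATUS = [
    (0x80, "program controlled interruption"),
    (0x40, "incorrect length"),
    (0x20, "program check"),
    (0x10, "protection check"),
    (0x08, "channel data check"),
    (0x04, "channel control check"),
    (0x02, "interface control check"),
    (0x01, "chaining check"),
]


def decode(value, table):
    """Return the names of all bits set in value."""
    return [name for bit, name in table if value & bit]


def parse_status(trace_line):
    """Pull the two status bytes out of a 'DEV STS xx  SCH STS xx' trace line."""
    m = re.search(r"DEV STS ([0-9A-F]{2})\s+SCH STS ([0-9A-F]{2})", trace_line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def io_probably_ok(dev_sts, sch_sts):
    # Rule of thumb from the text: channel end & device end, nothing else flagged.
    return dev_sts == 0x0C and sch_sts == 0x00


line = "CCWA 000FFDF8   DEV STS 0C  SCH STS 00  CNT 00EC"
dev_sts, sch_sts = parse_status(line)
print(decode(dev_sts, DEVICE_STATUS), decode(sch_sts, SUBCHANNEL_STATUS))
print("IO probably went okay" if io_probably_ok(dev_sts, sch_sts) else "examine the IRB")
```

For the trace above the device status 0C decodes to channel end plus device end with no subchannel checks, which is the "IO probably went okay" case.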
|  | 1851 |  | 
|  | 1852 | If you don't like messing up your reader ( because you possibly booted from it ) | 
|  | 1853 | you can alternatively spool it to another guest's reader. | 
|  | 1854 |  | 
|  | 1855 |  | 
|  | 1856 | Other common VM device related commands | 
|  | 1857 | --------------------------------------- | 
|  | 1858 | These commands are listed only because they have | 
|  | 1859 | been of use to me in the past & may be of use to | 
|  | 1860 | you too. For more complete info on each of the commands | 
|  | 1861 | type HELP <command> from CMS. | 
|  | 1862 | detaching devices | 
|  | 1863 | DET <devno range> | 
|  | 1864 | attaching devices ( use * as the guest for your own guest ) | 
|  | 1865 | ATT <devno range> <guest> | 
|  | 1866 | READY <devno> causes VM to issue a fake interrupt. | 
|  | 1867 |  | 
|  | 1868 | The VARY command is normally only available to VM administrators. | 
|  | 1869 | VARY ON PATH <path> TO <devno range> | 
|  | 1870 | VARY OFF PATH <PATH> FROM <devno range> | 
|  | 1871 | This is used to switch on or off channel paths to devices. | 
|  | 1872 |  | 
|  | 1873 | Q CHPID <channel path ID> | 
|  | 1874 | This displays state of devices using this channel path | 
|  | 1875 | D SCHIB <subchannel> | 
|  | 1876 | This displays the subchannel information block ( SCHIB ) for the device. | 
|  | 1877 | This, I believe, is also only available to administrators. | 
|  | 1878 | DEFINE CTC <devno> | 
|  | 1879 | defines a virtual CTC ( channel to channel ) connection; | 
|  | 1880 | 2 need to be defined on each guest for the CTC driver to use. | 
|  | 1881 | COUPLE <devno> <userid> <remote devno> | 
|  | 1882 | Joins a local virtual device to a remote virtual device | 
|  | 1883 | ( commonly used for the CTC driver ). | 
|  | 1884 |  | 
|  | 1885 | Building a VM ramdisk under CMS which linux can use | 
|  | 1886 | def vfb-<blocksize> <subchannel> <number blocks> | 
|  | 1887 | blocksize is commonly 4096 for linux. | 
|  | 1888 | Formatting it | 
|  | 1889 | format <subchannel> <drive letter e.g. x> (blksize <blocksize> | 
|  | 1890 |  | 
|  | 1891 | Sharing a disk between multiple guests | 
|  | 1892 | LINK userid devno1 devno2 mode password | 
|  | 1893 |  | 
|  | 1894 |  | 
|  | 1895 |  | 
|  | 1896 | GDB on S390 | 
|  | 1897 | =========== | 
|  | 1898 | N.B. if compiling for debugging gdb works better without optimisation | 
|  | 1899 | ( see Compiling programs for debugging ) | 
|  | 1900 |  | 
|  | 1901 | invocation | 
|  | 1902 | ---------- | 
|  | 1903 | gdb <victim program> <optional corefile> | 
|  | 1904 |  | 
|  | 1905 | Online help | 
|  | 1906 | ----------- | 
|  | 1907 | help: gives help on commands | 
|  | 1908 | e.g. | 
|  | 1909 | help | 
|  | 1910 | help display | 
|  | 1911 | Note: gdb's online help is very good, use it. | 
|  | 1912 |  | 
|  | 1913 |  | 
|  | 1914 | Assembly | 
|  | 1915 | -------- | 
|  | 1916 | info registers: displays registers other than floating point. | 
|  | 1917 | info all-registers: displays floating points as well. | 
| Matt LaPlante | fff9289 | 2006-10-03 22:47:42 +0200 | [diff] [blame] | 1918 | disassemble: disassembles | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1919 | e.g. | 
|  | 1920 | disassemble without parameters will disassemble the current function | 
|  | 1921 | disassemble $pc,$pc+10 ( gdb 7.0 & later need the comma, older versions took a space instead ) | 
|  | 1922 |  | 
|  | 1923 | Viewing & modifying variables | 
|  | 1924 | ----------------------------- | 
|  | 1925 | print or p: displays variable or register | 
|  | 1926 | e.g. p/x $sp will display the stack pointer | 
|  | 1927 |  | 
|  | 1928 | display: prints variable or register each time program stops | 
|  | 1929 | e.g. | 
|  | 1930 | display/x $pc will display the program counter | 
|  | 1931 | display argc | 
|  | 1932 |  | 
|  | 1933 | undisplay: undoes displays | 
|  | 1934 |  | 
|  | 1935 | info breakpoints: shows all current breakpoints | 
|  | 1936 |  | 
| Matt LaPlante | fff9289 | 2006-10-03 22:47:42 +0200 | [diff] [blame] | 1937 | info stack: shows stack back trace ( if this doesn't work too well, I'll show you the | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1938 | stacktrace by hand below ). | 
|  | 1939 |  | 
|  | 1940 | info locals: displays local variables. | 
|  | 1941 |  | 
|  | 1942 | info args: display current procedure arguments. | 
|  | 1943 |  | 
|  | 1944 | set args: will set argc & argv each time the victim program is invoked. | 
|  | 1945 |  | 
|  | 1946 | set <variable>=value: sets a variable or register | 
|  | 1947 | e.g. set argc=100 | 
|  | 1948 | set $pc=0 | 
|  | 1949 |  | 
|  | 1950 |  | 
|  | 1951 |  | 
|  | 1952 | Modifying execution | 
|  | 1953 | ------------------- | 
|  | 1954 | step: steps n lines of sourcecode | 
|  | 1955 | step steps 1 line. | 
|  | 1956 | step 100 steps 100 lines of code. | 
|  | 1957 |  | 
|  | 1958 | next: like step except this will not step into subroutines | 
|  | 1959 |  | 
|  | 1960 | stepi: steps a single machine code instruction. | 
|  | 1961 | e.g. stepi 100 | 
|  | 1962 |  | 
|  | 1963 | nexti: steps a single machine code instruction but will not step into subroutines. | 
|  | 1964 |  | 
|  | 1965 | finish: will run until exit of the current routine | 
|  | 1966 |  | 
|  | 1967 | run: (re)starts a program | 
|  | 1968 |  | 
|  | 1969 | cont: continues a program | 
|  | 1970 |  | 
|  | 1971 | quit: exits gdb. | 
|  | 1972 |  | 
|  | 1973 |  | 
|  | 1974 | breakpoints | 
|  | 1975 | ------------ | 
|  | 1976 |  | 
|  | 1977 | break | 
|  | 1978 | sets a breakpoint | 
|  | 1979 | e.g. | 
|  | 1980 |  | 
|  | 1981 | break main | 
|  | 1982 |  | 
|  | 1983 | break *$pc | 
|  | 1984 |  | 
|  | 1985 | break *0x400618 | 
|  | 1986 |  | 
| Matt LaPlante | 19f5946 | 2009-04-27 15:06:31 +0200 | [diff] [blame] | 1987 | Here's a really useful one for large programs | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1988 | rbr | 
|  | 1989 | Set a breakpoint for all functions matching REGEXP | 
|  | 1990 | e.g. | 
|  | 1991 | rbr 390 | 
|  | 1992 | will set a breakpoint on all functions with 390 in their name. | 
|  | 1993 |  | 
|  | 1994 | info breakpoints | 
|  | 1995 | lists all breakpoints | 
|  | 1996 |  | 
|  | 1997 | delete: delete breakpoint by number or delete them all | 
|  | 1998 | e.g. | 
|  | 1999 | delete 1 will delete the first breakpoint | 
|  | 2000 | delete will delete them all | 
|  | 2001 |  | 
|  | 2002 | watch: This will set a watchpoint ( usually hardware assisted ). | 
|  | 2003 | This will watch a variable till it changes | 
|  | 2004 | e.g. | 
|  | 2005 | watch cnt will watch the variable cnt till it changes. | 
|  | 2006 | As an aside, unfortunately gdb's architecture independent watchpoint code | 
|  | 2007 | is inconsistent & not very good; watchpoints usually work but not always. | 
|  | 2008 |  | 
|  | 2009 | info watchpoints: Display currently active watchpoints | 
|  | 2010 |  | 
|  | 2011 | condition: ( another useful one ) | 
|  | 2012 | Specify breakpoint number N to break only if COND is true. | 
|  | 2013 | Usage is `condition N COND', where N is an integer and COND is an | 
|  | 2014 | expression to be evaluated whenever breakpoint N is reached. | 
|  | 2015 |  | 
|  | 2016 |  | 
|  | 2017 |  | 
|  | 2018 | User defined functions/macros | 
|  | 2019 | ----------------------------- | 
|  | 2020 | define: ( Note this is very very useful, simple & powerful ) | 
|  | 2021 | usage: define <name> <list of commands> end | 
|  | 2022 |  | 
|  | 2023 | examples which you should consider putting into .gdbinit in your home directory | 
|  | 2024 | define d | 
|  | 2025 | stepi | 
|  | 2026 | disassemble $pc,$pc+10 | 
|  | 2027 | end | 
|  | 2028 |  | 
|  | 2029 | define e | 
|  | 2030 | nexti | 
|  | 2031 | disassemble $pc,$pc+10 | 
|  | 2032 | end | 
|  | 2033 |  | 
|  | 2034 |  | 
|  | 2035 | Other hard to classify stuff | 
|  | 2036 | ---------------------------- | 
|  | 2037 | signal n: | 
|  | 2038 | sends the victim program a signal. | 
|  | 2039 | e.g. signal 3 will send a SIGQUIT. | 
|  | 2040 |  | 
|  | 2041 | info signals: | 
|  | 2042 | shows what gdb does when the victim receives certain signals. | 
|  | 2043 |  | 
|  | 2044 | list: | 
|  | 2045 | e.g. | 
|  | 2046 | list lists current function source | 
| Matt LaPlante | 6c28f2c | 2006-10-03 22:46:31 +0200 | [diff] [blame] | 2047 | list 1,10 list first 10 lines of current file. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2048 | list test.c:1,10 | 
|  | 2049 |  | 
|  | 2050 |  | 
|  | 2051 | directory: | 
|  | 2052 | Adds directories to be searched for source if gdb cannot find the source. | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 2053 | (note it is a bit sensitive about slashes) | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2054 | e.g. To add the root of the filesystem to the searchpath do | 
|  | 2055 | directory // | 
|  | 2056 |  | 
|  | 2057 |  | 
|  | 2058 | call <function> | 
|  | 2059 | This calls a function in the victim program, this is pretty powerful | 
|  | 2060 | e.g. | 
|  | 2061 | (gdb) call printf("hello world") | 
|  | 2062 | outputs: | 
|  | 2063 | $1 = 11 | 
|  | 2064 |  | 
|  | 2065 | You might now be thinking that the line above didn't work; stdout is buffered, so something extra has to be done. | 
|  | 2066 | (gdb) call fflush(stdout) | 
|  | 2067 | hello world$2 = 0 | 
|  | 2068 | As an aside the debugger also calls malloc & free under the hood | 
|  | 2069 | to make space for the "hello world" string. | 
|  | 2070 |  | 
|  | 2071 |  | 
|  | 2072 |  | 
|  | 2073 | hints | 
|  | 2074 | ----- | 
|  | 2075 | 1) command completion works just like bash | 
|  | 2076 | ( if you are a bad typist like me this really helps ) | 
|  | 2077 | e.g. hit br <TAB> & cursor up & down :-). | 
|  | 2078 |  | 
|  | 2079 | 2) if you have a debugging problem that takes a few steps to recreate, | 
|  | 2080 | put the steps into a file called .gdbinit in your current working directory. | 
|  | 2081 | If you have defined a few extra useful user defined commands, put these in | 
|  | 2082 | the .gdbinit in your home directory & they will be read each time gdb is launched. | 
|  | 2083 |  | 
|  | 2084 | A typical .gdbinit file might be. | 
|  | 2085 | break main | 
|  | 2086 | run | 
|  | 2087 | break runtime_exception | 
|  | 2088 | cont | 
|  | 2089 |  | 
|  | 2090 |  | 
|  | 2091 | stack chaining in gdb by hand | 
|  | 2092 | ----------------------------- | 
|  | 2093 | This is done using the same trick described for VM | 
|  | 2094 | p/x (*($sp+56))&0x7fffffff get the first backchain. | 
|  | 2095 |  | 
|  | 2096 | For z/Architecture | 
|  | 2097 | replace 56 with 112 & drop the &0x7fffffff | 
|  | 2098 | in the commands here, & do nasty casts to longs like the following, | 
|  | 2099 | as gdb unfortunately treats the dereferenced values as ints, which | 
|  | 2100 | messes up everything. | 
|  | 2101 | e.g. here is a 3rd backchain dereference | 
|  | 2102 | p/x *(long *)(***(long ***)$sp+112) | 
|  | 2103 |  | 
|  | 2104 |  | 
|  | 2105 | Back on s/390, the first command above outputs | 
|  | 2106 | $5 = 0x528f18 | 
|  | 2107 | on my machine. | 
|  | 2108 | Now you can use | 
|  | 2109 | info symbol (*($sp+56))&0x7fffffff | 
|  | 2110 | you might see something like. | 
|  | 2111 | rl_getc + 36 in section .text  telling you what is located at address 0x528f18 | 
|  | 2112 | Now do. | 
|  | 2113 | p/x (*(*$sp+56))&0x7fffffff | 
|  | 2114 | This outputs | 
|  | 2115 | $6 = 0x528ed0 | 
|  | 2116 | Now do. | 
|  | 2117 | info symbol (*(*$sp+56))&0x7fffffff | 
|  | 2118 | rl_read_key + 180 in section .text | 
|  | 2119 | now do | 
|  | 2120 | p/x (*(**$sp+56))&0x7fffffff | 
|  | 2121 | & so on. | 
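
The chain of dereferences above is just a loop, which the following Python sketch spells out against a fake memory image. It assumes the s/390 layout used in the commands above ( backchain in the first word of a frame, saved return address at offset 56, top bit masked off ). The stack addresses & the third return address are invented for illustration, & read_word stands in for whatever reads a word of the victim's memory.

```python
R14_SLOT = 56              # offset of the saved return address ( 112 on z/Architecture )
ADDRESS_MASK = 0x7FFFFFFF  # drops the addressing mode bit ( not needed on z/Architecture )


def walk_backchain(read_word, sp, r14_slot=R14_SLOT, mask=ADDRESS_MASK, max_frames=64):
    """Return the saved return addresses found by following the backchain from sp."""
    addresses = []
    frame = sp
    while frame != 0 and len(addresses) < max_frames:
        addresses.append(read_word(frame + r14_slot) & mask)
        frame = read_word(frame)  # the backchain is the first word of the frame
    return addresses


# Hypothetical stack contents: three frames, the last one with a zero backchain.
FAKE_MEMORY = {
    0x7FFFF000: 0x7FFFF060, 0x7FFFF000 + 56: 0x80528F18,
    0x7FFFF060: 0x7FFFF0C0, 0x7FFFF060 + 56: 0x80528ED0,
    0x7FFFF0C0: 0x00000000, 0x7FFFF0C0 + 56: 0x805167E6,
}

chain = walk_backchain(FAKE_MEMORY.__getitem__, sp=0x7FFFF000)
print([hex(address) for address in chain])
```

Each address in the result is what you would then feed to info symbol. The max_frames guard is there because a corrupt backchain can loop forever.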
|  | 2122 |  | 
|  | 2123 | Disassembling instructions without debug info | 
|  | 2124 | --------------------------------------------- | 
| Matt LaPlante | 6c28f2c | 2006-10-03 22:46:31 +0200 | [diff] [blame] | 2125 | gdb typically complains if there is a lack of debugging | 
|  | 2126 | symbols in the disassemble command with | 
|  | 2127 | "No function contains specified address." To get around | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2128 | this do | 
|  | 2129 | x/<number lines to disassemble>xi <address> | 
|  | 2130 | e.g. | 
|  | 2131 | x/20xi 0x400730 | 
|  | 2132 |  | 
|  | 2133 |  | 
|  | 2134 |  | 
|  | 2135 | Note: Remember gdb has history just like bash; you don't need to retype the | 
|  | 2136 | whole line, just use the up & down arrows. | 
|  | 2137 |  | 
|  | 2138 |  | 
|  | 2139 |  | 
|  | 2140 | For more info | 
|  | 2141 | ------------- | 
|  | 2142 | From your linuxbox do | 
|  | 2143 | man gdb or info gdb. | 
|  | 2144 |  | 
|  | 2145 | core dumps | 
|  | 2146 | ---------- | 
|  | 2147 | What is a core dump ? | 
|  | 2148 | A core dump is a file generated by the kernel ( if allowed ) which contains the registers | 
|  | 2149 | & all active pages of the program which has crashed. | 
|  | 2150 | From this file gdb will allow you to look at the registers, stack trace & memory of the | 
|  | 2151 | program as if it just crashed on your system. It is usually called core & created in the | 
|  | 2152 | current working directory. | 
|  | 2153 | This is very useful in that a customer can mail a core dump to a technical support department | 
|  | 2154 | & the technical support department can reconstruct what happened, | 
| Nicolas Kaiser | 2254f5a | 2006-12-04 15:40:23 +0100 | [diff] [blame] | 2155 | provided they have an identical copy of this program with debugging symbols compiled in & | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2156 | the source base of this build is available. | 
|  | 2157 | In short it is far more useful than something like a crash log could ever hope to be. | 
|  | 2158 |  | 
|  | 2159 | In theory all that is missing to restart a core dumped program is a kernel patch which | 
|  | 2160 | will do the following. | 
|  | 2161 | 1) Make a new kernel task structure | 
|  | 2162 | 2) Reload all the dumped pages back into the kernel's memory management structures. | 
|  | 2163 | 3) Do the required clock fixups | 
|  | 2164 | 4) Get all files & network connections for the process back into an identical state ( really difficult ). | 
|  | 2165 | 5) A few more difficult things I haven't thought of. | 
|  | 2166 |  | 
|  | 2167 |  | 
|  | 2168 |  | 
|  | 2169 | Why have I never seen one ? | 
|  | 2170 | Probably because you haven't used the command | 
|  | 2171 | ulimit -c unlimited in bash | 
|  | 2172 | to allow core dumps. Now do | 
|  | 2173 | ulimit -a | 
|  | 2174 | to verify that the limit was accepted. | 
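
The same limit can be inspected & raised from inside a program. Below is a minimal Python sketch using the standard resource module. It raises the soft core size limit to the hard limit, which matches ulimit -c unlimited only when the hard limit is itself unlimited. The helper names are hypothetical.

```python
import resource


def allow_core_dumps():
    """Raise the soft core size limit as far as the hard limit allows."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit the way ulimit would, but in bytes rather than blocks."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


soft, hard = allow_core_dumps()
print("core file size:", describe(soft), "( hard limit:", describe(hard), ")")
```

The change applies to the calling process & anything it starts afterwards, not to the shell it was launched from.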
|  | 2175 |  | 
|  | 2176 | A sample core dump | 
|  | 2177 | To create this I'm going to do | 
|  | 2178 | ulimit -c unlimited | 
|  | 2179 | gdb | 
|  | 2180 | to launch gdb ( my victim app. ). Now be bad & do the following from another | 
|  | 2181 | telnet/xterm session to the same machine | 
|  | 2182 | ps -aux | grep gdb | 
|  | 2183 | kill -SIGSEGV <gdb's pid> | 
|  | 2184 | or alternatively use killall -SIGSEGV gdb if you have the killall command. | 
|  | 2185 | Now look at the core dump. | 
| Paolo Ornati | 670e9f3 | 2006-10-03 22:57:56 +0200 | [diff] [blame] | 2186 | ./gdb ./gdb core | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2187 | Displays the following | 
|  | 2188 | GNU gdb 4.18 | 
|  | 2189 | Copyright 1998 Free Software Foundation, Inc. | 
|  | 2190 | GDB is free software, covered by the GNU General Public License, and you are | 
|  | 2191 | welcome to change it and/or distribute copies of it under certain conditions. | 
|  | 2192 | Type "show copying" to see the conditions. | 
|  | 2193 | There is absolutely no warranty for GDB.  Type "show warranty" for details. | 
|  | 2194 | This GDB was configured as "s390-ibm-linux"... | 
|  | 2195 | Core was generated by `./gdb'. | 
|  | 2196 | Program terminated with signal 11, Segmentation fault. | 
|  | 2197 | Reading symbols from /usr/lib/libncurses.so.4...done. | 
|  | 2198 | Reading symbols from /lib/libm.so.6...done. | 
|  | 2199 | Reading symbols from /lib/libc.so.6...done. | 
|  | 2200 | Reading symbols from /lib/ld-linux.so.2...done. | 
|  | 2201 | #0  0x40126d1a in read () from /lib/libc.so.6 | 
|  | 2202 | Setting up the environment for debugging gdb. | 
|  | 2203 | Breakpoint 1 at 0x4dc6f8: file utils.c, line 471. | 
|  | 2204 | Breakpoint 2 at 0x4d87a4: file top.c, line 2609. | 
|  | 2205 | (top-gdb) info stack | 
|  | 2206 | #0  0x40126d1a in read () from /lib/libc.so.6 | 
|  | 2207 | #1  0x528f26 in rl_getc (stream=0x7ffffde8) at input.c:402 | 
|  | 2208 | #2  0x528ed0 in rl_read_key () at input.c:381 | 
|  | 2209 | #3  0x5167e6 in readline_internal_char () at readline.c:454 | 
|  | 2210 | #4  0x5168ee in readline_internal_charloop () at readline.c:507 | 
|  | 2211 | #5  0x51692c in readline_internal () at readline.c:521 | 
| John Anthony Kazos Jr | be2a608 | 2007-05-09 08:50:42 +0200 | [diff] [blame] | 2212 | #6  0x5164fe in readline (prompt=0x7ffff810 "\177ÿøx\177ÿ÷Ø\177ÿøxÀ") | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2213 | at readline.c:349 | 
| Matt LaPlante | 19f5946 | 2009-04-27 15:06:31 +0200 | [diff] [blame] | 2214 | #7  0x4d7a8a in command_line_input (prompt=0x564420 "(gdb) ", repeat=1, | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2215 | annotation_suffix=0x4d6b44 "prompt") at top.c:2091 | 
|  | 2216 | #8  0x4d6cf0 in command_loop () at top.c:1345 | 
|  | 2217 | #9  0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635 | 
|  | 2218 |  | 
|  | 2219 |  | 
|  | 2220 | LDD | 
|  | 2221 | === | 
|  | 2222 | This is a program which lists the shared libraries which a program or library needs. | 
|  | 2223 | Note you also get the relocations of the shared library text segments, which | 
|  | 2224 | help when using objdump --source. | 
|  | 2225 | e.g. | 
|  | 2226 | ldd ./gdb | 
|  | 2227 | outputs | 
|  | 2228 | libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000) | 
|  | 2229 | libm.so.6 => /lib/libm.so.6 (0x4005e000) | 
|  | 2230 | libc.so.6 => /lib/libc.so.6 (0x40084000) | 
|  | 2231 | /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) | 
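
The load addresses in brackets are what let you turn a runtime address into an offset inside a library. A minimal Python sketch of that arithmetic follows. It parses the output shown above & uses the address of frame #0 from the core dump backtrace earlier; the function names are hypothetical & it assumes the library's own addresses start at 0.

```python
import re

SAMPLE = """\
libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000)
libm.so.6 => /lib/libm.so.6 (0x4005e000)
libc.so.6 => /lib/libc.so.6 (0x40084000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
"""

LDD_RE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\((0x[0-9a-fA-F]+)\)\s*$")


def parse_ldd(text):
    """Map each library name to (path, load address)."""
    libs = {}
    for line in text.splitlines():
        m = LDD_RE.match(line)
        if m:
            name, path, base = m.groups()
            libs[name] = (path, int(base, 16))
    return libs


def library_offset(libs, name, address):
    """Return (path, offset of address from the library's load address)."""
    path, base = libs[name]
    return path, address - base


libs = parse_ldd(SAMPLE)
path, offset = library_offset(libs, "libc.so.6", 0x40126D1A)
print(f"{path} + {offset:#x}")
```

The offset is the address to look for in the objdump listing of that library, under the assumption stated above.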
|  | 2232 |  | 
|  | 2233 |  | 
|  | 2234 | Debugging shared libraries | 
|  | 2235 | ========================== | 
|  | 2236 | Most programs use shared libraries, however it can be very painful | 
|  | 2237 | when you single step by instruction into a function like printf for the | 
|  | 2238 | first time & you end up in functions like _dl_runtime_resolve. This is | 
|  | 2239 | ld.so doing lazy binding. Lazy binding is a concept in ELF where | 
|  | 2240 | shared library functions are not resolved to their addresses until they are | 
|  | 2241 | first called, great for startup time but a pain to debug. | 
|  | 2242 | To get around this either relink the program -static, or exit gdb, type | 
|  | 2243 | export LD_BIND_NOW=true ( this will stop lazy binding ) & restart gdb'ing | 
|  | 2244 | the program in question. | 
|  | 2245 |  | 
|  | 2246 |  | 
|  | 2247 |  | 
|  | 2248 | Debugging modules | 
|  | 2249 | ================= | 
|  | 2250 | As modules are dynamically loaded into the kernel their address can be | 
|  | 2251 | anywhere. To get around this use the -m option with insmod to emit a load | 
|  | 2252 | map, which can be piped into a file if required. | 
|  | 2253 |  | 
|  | 2254 | The proc file system | 
|  | 2255 | ==================== | 
|  | 2256 | What is it ? | 
|  | 2257 | It is a filesystem created by the kernel, with files whose contents are generated on demand | 
|  | 2258 | by the kernel when read, or which can be used to modify kernel parameters. | 
|  | 2259 | It is a powerful concept. | 
|  | 2260 |  | 
|  | 2261 | e.g. | 
|  | 2262 |  | 
|  | 2263 | cat /proc/sys/net/ipv4/ip_forward | 
|  | 2264 | On my machine outputs | 
|  | 2265 | 0 | 
|  | 2266 | telling me ip_forwarding is not on. To switch it on I can do | 
|  | 2267 | echo 1 >  /proc/sys/net/ipv4/ip_forward | 
|  | 2268 | cat it again | 
|  | 2269 | cat /proc/sys/net/ipv4/ip_forward | 
|  | 2270 | On my machine now outputs | 
|  | 2271 | 1 | 
|  | 2272 | IP forwarding is on. | 
|  | 2273 | There is a lot of useful info in here best found by going in & having a look around, | 
|  | 2274 | so I'll take you through some entries I consider important. | 
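
The cat & echo pair above amounts to reading & writing a one line text file, so a program can do the same. Here is a minimal Python sketch with hypothetical helper names. It runs against a scratch file so that it changes nothing on your system; pointing it at /proc/sys/net/ipv4/ip_forward instead would need root to write.

```python
import os
import tempfile
from pathlib import Path


def read_flag(path):
    """Return True if the proc style file at path holds a non-zero integer."""
    return int(Path(path).read_text().strip()) != 0


def write_flag(path, enabled):
    """Write 1 or 0 plus a newline, as echo 1 > file would."""
    Path(path).write_text("1\n" if enabled else "0\n")


# Demonstrate on a scratch file standing in for the real proc entry.
with tempfile.TemporaryDirectory() as scratch:
    demo = os.path.join(scratch, "ip_forward")
    write_flag(demo, False)
    before = read_flag(demo)
    write_flag(demo, True)
    after = read_flag(demo)

print("before:", before, "after:", after)
```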
|  | 2275 |  | 
| Sylvestre Ledru | f65e51d | 2011-04-04 15:04:46 -0700 | [diff] [blame] | 2276 | All the processes running on the machine have their own entry defined by | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2277 | /proc/<pid> | 
|  | 2278 | So let's have a look at the init process | 
|  | 2279 | cd /proc/1 | 
|  | 2280 |  | 
|  | 2281 | cat cmdline | 
|  | 2282 | emits | 
|  | 2283 | init [2] | 
|  | 2284 |  | 
|  | 2285 | cd /proc/1/fd | 
|  | 2286 | This contains numerical entries for all the open files; | 
|  | 2287 | some of these you can cat e.g. stdout (1) | 
|  | 2288 |  | 
|  | 2289 | cat /proc/29/maps | 
|  | 2290 | on my machine emits | 
|  | 2291 |  | 
|  | 2292 | 00400000-00478000 r-xp 00000000 5f:00 4103       /bin/bash | 
|  | 2293 | 00478000-0047e000 rw-p 00077000 5f:00 4103       /bin/bash | 
|  | 2294 | 0047e000-00492000 rwxp 00000000 00:00 0 | 
|  | 2295 | 40000000-40015000 r-xp 00000000 5f:00 14382      /lib/ld-2.1.2.so | 
|  | 2296 | 40015000-40016000 rw-p 00014000 5f:00 14382      /lib/ld-2.1.2.so | 
|  | 2297 | 40016000-40017000 rwxp 00000000 00:00 0 | 
|  | 2298 | 40017000-40018000 rw-p 00000000 00:00 0 | 
|  | 2299 | 40018000-4001b000 r-xp 00000000 5f:00 14435      /lib/libtermcap.so.2.0.8 | 
|  | 2300 | 4001b000-4001c000 rw-p 00002000 5f:00 14435      /lib/libtermcap.so.2.0.8 | 
|  | 2301 | 4001c000-4010d000 r-xp 00000000 5f:00 14387      /lib/libc-2.1.2.so | 
|  | 2302 | 4010d000-40111000 rw-p 000f0000 5f:00 14387      /lib/libc-2.1.2.so | 
|  | 2303 | 40111000-40114000 rw-p 00000000 00:00 0 | 
|  | 2304 | 40114000-4011e000 r-xp 00000000 5f:00 14408      /lib/libnss_files-2.1.2.so | 
|  | 2305 | 4011e000-4011f000 rw-p 00009000 5f:00 14408      /lib/libnss_files-2.1.2.so | 
|  | 2306 | 7fffd000-80000000 rwxp ffffe000 00:00 0 | 
|  | 2307 |  | 
|  | 2308 |  | 
|  | 2309 | This shows us the shared libraries the process (bash here) uses, where they are in memory | 
|  | 2310 | & the memory access permissions for each virtual memory area. | 
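The file names in the last column can be pulled out of a maps listing with a few lines of shell. This is only a sketch: mapped_files is a hypothetical helper name, not part of any tool mentioned here, & it assumes the six column layout shown above with no spaces in the path names.

```shell
# mapped_files is a hypothetical helper, not a standard command.
# It prints each distinct file mapped into a process, given a
# maps-format file such as /proc/<pid>/maps.
mapped_files() {
    # Column 6 holds the backing file; anonymous areas have none.
    awk 'NF >= 6 && $6 ~ /^\// { print $6 }' "$1" | sort -u
}

# Example: show what the current shell has mapped (Linux only).
if [ -r "/proc/$$/maps" ]; then
    mapped_files "/proc/$$/maps"
fi
```

Run against the listing above it would print the bash binary & each shared library once, however many areas each one occupies.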
|  | 2311 |  | 
|  | 2312 | /proc/1/cwd is a softlink to the current working directory. | 
|  | 2313 | /proc/1/root is the root of the filesystem for this process. | 
|  | 2314 |  | 
|  | 2315 | /proc/1/mem is the process's memory, which you | 
|  | 2316 | can read & write like a file. | 
|  | 2317 | strace uses this sometimes as it is a bit faster than the | 
| Matt LaPlante | 2fe0ae7 | 2006-10-03 22:50:39 +0200 | [diff] [blame] | 2318 | rather inefficient ptrace interface for peeking at DATA. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2319 |  | 
|  | 2320 |  | 
|  | 2321 | cat status | 
|  | 2322 |  | 
|  | 2323 | Name:   init | 
|  | 2324 | State:  S (sleeping) | 
|  | 2325 | Pid:    1 | 
|  | 2326 | PPid:   0 | 
|  | 2327 | Uid:    0       0       0       0 | 
|  | 2328 | Gid:    0       0       0       0 | 
|  | 2329 | Groups: | 
|  | 2330 | VmSize:      408 kB | 
|  | 2331 | VmLck:         0 kB | 
|  | 2332 | VmRSS:       208 kB | 
|  | 2333 | VmData:       24 kB | 
|  | 2334 | VmStk:         8 kB | 
|  | 2335 | VmExe:       368 kB | 
|  | 2336 | VmLib:         0 kB | 
|  | 2337 | SigPnd: 0000000000000000 | 
|  | 2338 | SigBlk: 0000000000000000 | 
|  | 2339 | SigIgn: 7fffffffd7f0d8fc | 
|  | 2340 | SigCgt: 00000000280b2603 | 
|  | 2341 | CapInh: 00000000fffffeff | 
|  | 2342 | CapPrm: 00000000ffffffff | 
|  | 2343 | CapEff: 00000000fffffeff | 
|  | 2344 |  | 
|  | 2345 | User PSW:    070de000 80414146 | 
|  | 2346 | task: 004b6000 tss: 004b62d8 ksp: 004b7ca8 pt_regs: 004b7f68 | 
|  | 2347 | User GPRS: | 
|  | 2348 | 00000400  00000000  0000000b  7ffffa90 | 
|  | 2349 | 00000000  00000000  00000000  0045d9f4 | 
|  | 2350 | 0045cafc  7ffffa90  7fffff18  0045cb08 | 
|  | 2351 | 00010400  804039e8  80403af8  7ffff8b0 | 
|  | 2352 | User ACRS: | 
|  | 2353 | 00000000  00000000  00000000  00000000 | 
|  | 2354 | 00000001  00000000  00000000  00000000 | 
|  | 2355 | 00000000  00000000  00000000  00000000 | 
|  | 2356 | 00000000  00000000  00000000  00000000 | 
|  | 2357 | Kernel BackChain  CallChain    BackChain  CallChain | 
|  | 2358 | 004b7ca8   8002bd0c     004b7d18   8002b92c | 
|  | 2359 | 004b7db8   8005cd50     004b7e38   8005d12a | 
|  | 2360 | 004b7f08   80019114 | 
|  | 2361 | This shows, among other things, memory usage, the status of some signals & | 
|  | 2362 | the process's registers from the kernel task structure, | 
|  | 2363 | as well as a backchain which may be useful if a process crashes | 
|  | 2364 | in the kernel for some unknown reason. | 
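When only one or two of these values are wanted, e.g. in a script that watches a process, a single field can be picked out of the status file. A minimal sketch follows; status_field is a hypothetical helper name & it only handles the simple "Key: value" lines, not the multi-line register dumps.

```shell
# status_field is a hypothetical helper: print the value of one
# "Key: value" line from a status-format file.
status_field() {
    # $1 = field name without the colon, $2 = file to read
    awk -v key="$1:" '$1 == key { $1 = ""; sub(/^ +/, ""); print }' "$2"
}

# Example: state & resident set size of init (Linux only).
if [ -r /proc/1/status ]; then
    status_field State /proc/1/status
    status_field VmRSS /proc/1/status
fi
```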
|  | 2365 |  | 
|  | 2366 | Some driver debugging techniques | 
|  | 2367 | ================================ | 
|  | 2368 | debug feature | 
|  | 2369 | ------------- | 
|  | 2370 | Some of our drivers now support a "debug feature" in | 
|  | 2371 | /proc/s390dbf; see s390dbf.txt in the linux/Documentation directory | 
|  | 2372 | for more info. | 
|  | 2373 | e.g. | 
|  | 2374 | to switch on the lcs "debug feature" | 
|  | 2375 | echo 5 > /proc/s390dbf/lcs/level | 
|  | 2376 | & then, after the error has occurred, | 
|  | 2377 | cat /proc/s390dbf/lcs/sprintf >/logfile | 
|  | 2378 | the logfile now contains some information which may help | 
|  | 2379 | tech support resolve a problem in the field. | 
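The two steps can be wrapped in small shell functions so that support staff only need the driver name. This is a sketch under assumptions: dbf_set_level, dbf_save_log & the DBF_ROOT variable are hypothetical names, & the only paths used are the level & sprintf entries shown above.

```shell
# Hypothetical wrappers around the two commands shown above.
# DBF_ROOT may be overridden; it defaults to /proc/s390dbf.

# $1 = driver name (e.g. lcs), $2 = debug level
dbf_set_level() {
    echo "$2" > "${DBF_ROOT:-/proc/s390dbf}/$1/level"
}

# $1 = driver name, $2 = file to save the formatted log in
dbf_save_log() {
    cat "${DBF_ROOT:-/proc/s390dbf}/$1/sprintf" > "$2"
}

# Usage on a machine with the lcs debug feature:
#   dbf_set_level lcs 5
#   ... reproduce the error ...
#   dbf_save_log lcs /logfile
```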
|  | 2380 |  | 
|  | 2381 |  | 
|  | 2382 |  | 
|  | 2383 | high level debugging network drivers | 
|  | 2384 | ------------------------------------ | 
|  | 2385 | ifconfig is quite a useful command; | 
|  | 2386 | it gives the current state of network drivers. | 
|  | 2387 |  | 
|  | 2388 | If you suspect your network device driver is dead, | 
|  | 2389 | one way to check is to type | 
|  | 2390 | ifconfig <network device> | 
|  | 2391 | e.g. tr0 | 
|  | 2392 | You should see something like | 
|  | 2393 | tr0       Link encap:16/4 Mbps Token Ring (New)  HWaddr 00:04:AC:20:8E:48 | 
|  | 2394 | inet addr:9.164.185.132  Bcast:9.164.191.255  Mask:255.255.224.0 | 
|  | 2395 | UP BROADCAST RUNNING MULTICAST  MTU:2000  Metric:1 | 
|  | 2396 | RX packets:246134 errors:0 dropped:0 overruns:0 frame:0 | 
|  | 2397 | TX packets:5 errors:0 dropped:0 overruns:0 carrier:0 | 
|  | 2398 | collisions:0 txqueuelen:100 | 
|  | 2399 |  | 
|  | 2400 | If the device doesn't say UP, | 
|  | 2401 | try | 
|  | 2402 | /etc/rc.d/init.d/network start | 
|  | 2403 | ( this starts the network stack & hopefully calls ifconfig tr0 up ). | 
|  | 2404 | ifconfig reads /proc/net/dev & presents it in a more readable form. | 
|  | 2405 | Now ping the device from a machine in the same subnet. | 
|  | 2406 | If the RX packets & TX packets counts don't increment you probably | 
|  | 2407 | have problems. | 
|  | 2408 | Next | 
|  | 2409 | cat /proc/net/arp | 
|  | 2410 | Do you see any hardware addresses in the cache? If not, you may have problems. | 
|  | 2411 | Next try | 
|  | 2412 | ping -c 5 <broadcast_addr>, i.e. the Bcast field above in the output of | 
|  | 2413 | ifconfig. Do you see any replies from machines other than the local machine? | 
|  | 2414 | If not, you may have problems. Also, if the TX packets count in ifconfig | 
|  | 2415 | hasn't incremented, either you have serious problems in your driver | 
|  | 2416 | ( e.g. the txbusy field of the network device being stuck on ) | 
|  | 2417 | or you may have multiple network devices connected. | 
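Checking whether the counters move can also be done straight from /proc/net/dev. The sketch below is an illustration only: pkt_counts is a hypothetical helper name, & it assumes the layout in which each interface line carries a bytes column followed by a packets column for receive, with the transmit bytes & packets in columns 9 & 10 after the interface name.

```shell
# pkt_counts is a hypothetical helper: print the RX & TX packet
# counters for one interface from a /proc/net/dev style file.
pkt_counts() {
    # $1 = interface name, $2 = file (defaults to /proc/net/dev)
    awk -v ifc="$1" '
        { sub(/:/, " ") }            # "tr0:123" becomes "tr0 123"
        $1 == ifc { print $3, $11 }  # RX packets, TX packets
    ' "${2:-/proc/net/dev}"
}

# Typical use: sample, ping from another machine, sample again.
#   before=$(pkt_counts tr0); sleep 10; after=$(pkt_counts tr0)
#   [ "$before" = "$after" ] && echo "tr0 counters did not move"
```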
|  | 2418 |  | 
|  | 2419 |  | 
|  | 2420 | chandev | 
|  | 2421 | ------- | 
|  | 2422 | There is a new device layer for channel devices, some | 
|  | 2423 | drivers e.g. lcs are registered with this layer. | 
|  | 2424 | If the device uses the channel device layer you'll be | 
|  | 2425 | able to find what interrupts it uses & the current state | 
|  | 2426 | of the device. | 
|  | 2427 | See the manpage chandev.8 & type cat /proc/chandev for more info. | 
|  | 2428 |  | 
|  | 2429 |  | 
|  | 2430 |  | 
|  | 2431 | Starting points for debugging scripting languages etc. | 
|  | 2432 | ====================================================== | 
|  | 2433 |  | 
|  | 2434 | bash/sh | 
|  | 2435 |  | 
|  | 2436 | bash -x <scriptname> | 
|  | 2437 | e.g. bash -x /usr/bin/bashbug | 
|  | 2438 | displays the following lines as it executes them. | 
|  | 2439 | + MACHINE=i586 | 
|  | 2440 | + OS=linux-gnu | 
|  | 2441 | + CC=gcc | 
|  | 2442 | + CFLAGS= -DPROGRAM='bash' -DHOSTTYPE='i586' -DOSTYPE='linux-gnu' -DMACHTYPE='i586-pc-linux-gnu' -DSHELL -DHAVE_CONFIG_H   -I. -I. -I./lib -O2 -pipe | 
|  | 2443 | + RELEASE=2.01 | 
|  | 2444 | + PATCHLEVEL=1 | 
|  | 2445 | + RELSTATUS=release | 
|  | 2446 | + MACHTYPE=i586-pc-linux-gnu | 
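If tracing the whole script is too noisy, the same trace can be switched on & off from inside the script with set -x & set +x. A minimal sketch; traced_region & the variables it sets are made up for the example.

```shell
# traced_region is a hypothetical function used only to show
# how set -x / set +x limit tracing to a suspect part of a script.
traced_region() {
    set -x            # start echoing each command to stderr
    MACHINE=i586
    OS=linux-gnu
    set +x            # stop tracing again
}

echo "not traced"
traced_region
echo "not traced either"
```

Only the commands between set -x & set +x are echoed with the leading + sign.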
|  | 2447 |  | 
| Matt LaPlante | 2fe0ae7 | 2006-10-03 22:50:39 +0200 | [diff] [blame] | 2448 | perl -d <scriptname> runs the perl script in a fully interactive debugger | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2449 | (like gdb). | 
|  | 2450 | Type 'h' in the debugger for help. | 
|  | 2451 |  | 
|  | 2452 | For debugging java type | 
|  | 2453 | jdb <filename>, another fully interactive gdb style debugger, | 
|  | 2454 | & type ? in the debugger for help. | 
|  | 2455 |  | 
|  | 2456 |  | 
|  | 2457 |  | 
|  | 2458 | Dumptool & Lcrash ( lkcd ) | 
|  | 2459 | ========================== | 
|  | 2460 | Michael Holzheu & others here at IBM have a fairly mature port of | 
|  | 2461 | SGI's lcrash tool which allows one to look at kernel structures in a | 
|  | 2462 | running kernel. | 
|  | 2463 |  | 
|  | 2464 | It also complements a tool called dumptool which dumps all the kernel's | 
|  | 2465 | memory pages & registers to either a tape or a disk. | 
|  | 2466 | This can be used by tech support or an ambitious end user to do | 
|  | 2467 | post mortem debugging of a machine, much like using gdb on core dumps. | 
|  | 2468 |  | 
|  | 2469 | How to use this tool in detail is explained | 
|  | 2470 | in other documentation supplied by IBM with the patches, on the | 
|  | 2471 | lcrash homepage http://oss.sgi.com/projects/lkcd/ & in the lcrash manpage. | 
|  | 2472 |  | 
|  | 2473 | How they work | 
|  | 2474 | ------------- | 
|  | 2475 | Lcrash is a perfectly normal program; however, it requires 2 | 
|  | 2476 | additional files, Kerntypes which is built using a patch to the | 
|  | 2477 | linux kernel sources in the linux root directory & the System.map. | 
|  | 2478 |  | 
| Paolo Ornati | 670e9f3 | 2006-10-03 22:57:56 +0200 | [diff] [blame] | 2479 | Kerntypes is an object file whose sole purpose in life | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2480 | is to provide stabs debug info to lcrash. To do this, | 
|  | 2481 | Kerntypes is built from kerntypes.c, which just includes the most commonly | 
|  | 2482 | referenced header files used when debugging; lcrash can then read the | 
|  | 2483 | .stabs section of this file. | 
|  | 2484 |  | 
|  | 2485 | When debugging a live system it uses /dev/mem; | 
|  | 2486 | alternatively, for post mortem debugging, it uses the data | 
|  | 2487 | collected by dumptool. | 
|  | 2488 |  | 
|  | 2489 |  | 
|  | 2490 |  | 
|  | 2491 | SysRq | 
|  | 2492 | ===== | 
|  | 2493 | This is now supported by linux for s/390 & z/Architecture. | 
|  | 2494 | To enable it, compile the kernel with | 
|  | 2495 | Kernel Hacking -> Magic SysRq Key Enabled | 
|  | 2496 | echo "1" > /proc/sys/kernel/sysrq | 
|  | 2497 | also type | 
|  | 2498 | echo "8" >/proc/sys/kernel/printk | 
|  | 2499 | to make printk output go to the console. | 
|  | 2500 | On 390 all commands are prefixed with | 
|  | 2501 | ^- | 
|  | 2502 | e.g. | 
|  | 2503 | ^-t will show tasks. | 
|  | 2504 | ^-? or some unknown command will display help. | 
|  | 2505 | The sysrq key reading is very picky ( I have to type the keys in an | 
|  | 2506 | xterm session & paste them into the x3270 console ) | 
|  | 2507 | & it may be wise to predefine the keys as described in the VM hints above. | 
|  | 2508 |  | 
|  | 2509 | This is particularly useful for syncing disks, unmounting & rebooting | 
|  | 2510 | if the machine gets partially hung. | 
|  | 2511 |  | 
|  | 2512 | Read Documentation/sysrq.txt for more info. | 
|  | 2513 |  | 
|  | 2514 | References: | 
|  | 2515 | =========== | 
|  | 2516 | Enterprise Systems Architecture Reference Summary | 
|  | 2517 | Enterprise Systems Architecture Principles of Operation | 
|  | 2518 | Hartmut Penner's s390 stack frame sheet. | 
|  | 2519 | IBM Mainframe Channel Attachment a technology brief from a CISCO webpage | 
|  | 2520 | Various bits of man & info pages of Linux. | 
|  | 2521 | Linux & GDB source. | 
|  | 2522 | Various info & man pages. | 
|  | 2523 | CMS Help on tracing commands. | 
|  | 2524 | Linux for s/390 Elf Application Binary Interface | 
|  | 2525 | Linux for z/Series Elf Application Binary Interface ( Both Highly Recommended ) | 
|  | 2526 | z/Architecture Principles of Operation SA22-7832-00 | 
|  | 2527 | Enterprise Systems Architecture/390 Reference Summary SA22-7209-01 & the | 
|  | 2528 | Enterprise Systems Architecture/390 Principles of Operation SA22-7201-05 | 
|  | 2529 |  | 
|  | 2530 | Special Thanks | 
|  | 2531 | ============== | 
|  | 2532 | Special thanks to Neale Ferguson who maintains a much | 
|  | 2533 | prettier HTML version of this page at | 
| Justin P. Mattock | 0ea6e61 | 2010-07-23 20:51:24 -0700 | [diff] [blame] | 2534 | http://linuxvm.org/penguinvm/ | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2535 | & to Bob Grainger, Stefan Bader & others for reporting bugs. | 