 | 1 | Paravirt_ops on IA64 | 
 | 2 | ==================== | 
 | 3 |                           21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp> | 
 | 4 |  | 
 | 5 |  | 
 | 6 | Introduction | 
 | 7 | ------------ | 
 | 8 | This document is intended to help maintain paravirt_ops/IA64 and to | 
 | 9 | encourage people to use it. | 
 | 10 |  | 
 | 11 | paravirt_ops (pv_ops for short) is the virtualization support mechanism | 
 | 12 | of the Linux kernel on x86. Several approaches to virtualization support | 
 | 13 | were proposed, and paravirt_ops is the one that was adopted. | 
 | 14 | IA64 now also has several virtualization technologies, such as | 
 | 15 | kvm/IA64, xen/IA64 and many academic IA64 hypervisors, so it is | 
 | 16 | worthwhile to add a generic virtualization infrastructure to | 
 | 17 | Linux/IA64. | 
 | 18 |  | 
 | 19 |  | 
 | 20 | What is paravirt_ops? | 
 | 21 | --------------------- | 
 | 22 | It was developed on x86 to provide virtualization support via an API, | 
 | 23 | not an ABI. It allows each hypervisor to override, at the API level, the | 
 | 24 | operations that matter to it, and it allows a single kernel binary to | 
 | 25 | run on all supported execution environments, including native hardware. | 
 | 26 | Essentially, paravirt_ops is a set of function pointers which represent | 
 | 27 | operations corresponding to low level sensitive instructions and high | 
 | 28 | level functionality in various areas. One significant difference from | 
 | 29 | an ordinary function pointer table is that it allows optimization by | 
 | 30 | binary patching, because some of these operations are very performance | 
 | 31 | sensitive and the overhead of an indirect call is not negligible. | 
 | 32 | With binary patching, an indirect C function call can be transformed into a | 
 | 33 | direct C function call or in-place execution to eliminate the overhead. | 
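
The function pointer table idea described above can be sketched in plain C.
This is a minimal illustration only: the structure, the function names
(demo_cpu_ops, demo_get_psr, hv_get_psr, ...) and the fake register are
assumptions made up for this example and are not the kernel's actual
definitions. Binary patching is not shown; every call here stays indirect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of low level operations (illustrative names only). */
struct demo_cpu_ops {
	unsigned long (*get_psr)(void);
	void (*set_psr)(unsigned long val);
};

unsigned long demo_fake_psr;	/* stands in for a hardware register */
unsigned long demo_hv_calls;	/* counts simulated hypervisor calls */

/* Native implementations: touch the "register" directly. */
unsigned long native_get_psr(void) { return demo_fake_psr; }
void native_set_psr(unsigned long val) { demo_fake_psr = val; }

/* Hypervisor implementations: same API, different mechanism underneath. */
unsigned long hv_get_psr(void) { demo_hv_calls++; return demo_fake_psr; }
void hv_set_psr(unsigned long val) { demo_hv_calls++; demo_fake_psr = val; }

/* Statically initialized to the native ops, so it is usable immediately. */
struct demo_cpu_ops demo_cpu_ops = {
	.get_psr = native_get_psr,
	.set_psr = native_set_psr,
};

/* The rest of the code calls only these wrappers (the API level). */
unsigned long demo_get_psr(void)
{
	assert(demo_cpu_ops.get_psr != NULL);
	return demo_cpu_ops.get_psr();
}

void demo_set_psr(unsigned long val)
{
	assert(demo_cpu_ops.set_psr != NULL);
	demo_cpu_ops.set_psr(val);
}

/* A hypervisor port overrides the operations it cares about. */
void demo_install_hv_ops(void)
{
	demo_cpu_ops.get_psr = hv_get_psr;
	demo_cpu_ops.set_psr = hv_set_psr;
}
```

Callers of demo_get_psr() and demo_set_psr() are unchanged whether or not
demo_install_hv_ops() has run, which is the sense in which one binary can
serve both native and virtualized environments.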
 | 34 |  | 
 | 35 | Thus, the operations of paravirt_ops fall into three categories. | 
 | 36 | - simple indirect call | 
 | 37 |   These operations correspond to high level functionality, so the | 
 | 38 |   overhead of an indirect call isn't very important. | 
 | 39 |  | 
 | 40 | - indirect call which allows optimization by binary patching | 
 | 41 |   Usually these operations correspond to low level instructions. They | 
 | 42 |   are called frequently and are performance critical, so the call | 
 | 43 |   overhead matters a great deal. | 
 | 44 |  | 
 | 45 | - a set of macros for hand written assembly code | 
 | 46 |   Hand written assembly code (.S files) also needs paravirtualization | 
 | 47 |   because it includes sensitive instructions or because some of its | 
 | 48 |   code paths are very performance critical. | 
 | 49 |  | 
 | 50 |  | 
 | 51 | The relation to the IA64 machine vector | 
 | 52 | --------------------------------------- | 
 | 53 | Linux/IA64 has the IA64 machine vector functionality, which allows the | 
 | 54 | kernel to switch implementations (e.g. initialization, ipi, dma api...) | 
 | 55 | depending on the platform it is running on. | 
 | 56 | Some implementations can be replaced very easily by defining a new | 
 | 57 | machine vector, so another approach to virtualization support would be | 
 | 58 | to enhance the machine vector functionality. | 
 | 59 | But the paravirt_ops approach was taken because | 
 | 60 | - virtualization support needs wider coverage than the machine vector | 
 | 61 |   provides, e.g. low level instruction paravirtualization. It must be | 
 | 62 |   initialized very early, before platform detection. | 
 | 63 |  | 
 | 64 | - virtualization support needs more functionality, like binary patching. | 
 | 65 |   The calling overhead is probably not very large compared to the | 
 | 66 |   emulation overhead of virtualization. However in the native case, the | 
 | 67 |   overhead should be eliminated completely. | 
 | 68 |   A single kernel binary should run in each environment including native, | 
 | 69 |   and the overhead of paravirt_ops in the native environment should be as | 
 | 70 |   small as possible. | 
 | 71 |  | 
 | 72 | - for full virtualization technology, e.g. KVM/IA64 or | 
 | 73 |   Xen/IA64 HVM domain, the result would be | 
 | 74 |   (the emulated platform machine vector, probably dig) + (pv_ops). | 
 | 75 |   This means that the virtualization support layer should be under | 
 | 76 |   the machine vector layer. | 
 | 77 |  | 
 | 78 | It might be better to move some function pointers from | 
 | 79 | paravirt_ops to the machine vector. In fact, the Xen domU case uses both | 
 | 80 | pv_ops and the machine vector. | 
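
The layering argued for above, with pv_ops sitting underneath the machine
vector, can be sketched as follows. Everything here is a hypothetical
illustration: struct demo_machvec, struct demo_pv_ops, the register array
and the hypercall counter are invented for this sketch and do not mirror the
real machine vector or pv_ops definitions.

```c
#include <assert.h>

/* Lower layer: hypothetical low level operations (pv_ops-like). */
struct demo_pv_ops {
	const char *name;
	void (*write_reg)(int reg, unsigned long val);
};

/* Upper layer: hypothetical platform hooks (machine-vector-like). */
struct demo_machvec {
	const char *name;
	void (*send_ipi)(int cpu, int vector);
};

#define DEMO_NR_REGS 4
unsigned long demo_regs[DEMO_NR_REGS];	/* stands in for hardware state */
int demo_hypercalls;			/* counts simulated hypercalls */

void native_write_reg(int reg, unsigned long val)
{
	assert(reg >= 0 && reg < DEMO_NR_REGS);
	demo_regs[reg] = val;
}

void hv_write_reg(int reg, unsigned long val)
{
	assert(reg >= 0 && reg < DEMO_NR_REGS);
	demo_hypercalls++;		/* pretend to ask the hypervisor */
	demo_regs[reg] = val;
}

struct demo_pv_ops demo_pv_ops = { "native", native_write_reg };

/* The platform code is written once, on top of the lower layer. */
void dig_send_ipi(int cpu, int vector)
{
	demo_pv_ops.write_reg(cpu, (unsigned long)vector);
}

struct demo_machvec demo_machvec = { "dig", dig_send_ipi };

/* A guest keeps the emulated platform's vector and swaps only pv_ops. */
void demo_enter_guest(void)
{
	demo_pv_ops.name = "hv";
	demo_pv_ops.write_reg = hv_write_reg;
}
```

After demo_enter_guest() the machine vector is still the "dig" one, but its
send_ipi hook now reaches the hardware state through the hypervisor path,
which corresponds to the (machine vector) + (pv_ops) combination above.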
 | 81 |  | 
 | 82 |  | 
 | 83 | IA64 paravirt_ops | 
 | 84 | ----------------- | 
 | 85 | This section discusses the concrete paravirt_ops. | 
 | 86 | Because of the architectural differences between ia64 and x86, the | 
 | 87 | resulting set of functions is very different from the x86 pv_ops. | 
 | 88 |  | 
 | 89 | - C function pointer tables | 
 | 90 | They are not very performance critical, so a simple C indirect | 
 | 91 | function call is acceptable. The following structures are defined at | 
 | 92 | present. For details see linux/include/asm-ia64/paravirt.h | 
 | 93 |   - struct pv_info | 
 | 94 |     This structure describes the execution environment. | 
 | 95 |   - struct pv_init_ops | 
 | 96 |     This structure describes the various initialization hooks. | 
 | 97 |   - struct pv_iosapic_ops | 
 | 98 |     This structure describes hooks to iosapic operations. | 
 | 99 |   - struct pv_irq_ops | 
 | 100 |     This structure describes hooks to irq related operations. | 
 | 101 |   - struct pv_time_ops | 
 | 102 |     This structure describes hooks to steal time accounting. | 
 | 103 |  | 
 | 104 | - a set of indirect calls which need optimization | 
 | 105 | Currently this class of functions corresponds to a subset of IA64 | 
 | 106 | intrinsics. At present the optimization by binary patching isn't | 
 | 107 | implemented. | 
 | 108 | struct pv_cpu_ops is defined. For details see | 
 | 109 | linux/include/asm-ia64/paravirt_privop.h | 
 | 110 | Mostly they correspond to ia64 intrinsics 1-to-1. | 
 | 111 | Caveat: For now they are defined as C indirect function pointers, but in | 
 | 112 | order to support binary patch optimization, they will be reimplemented | 
 | 113 | using GCC extended inline assembly code. | 
 | 114 |  | 
 | 115 | - a set of macros for hand written assembly code (.S files) | 
 | 116 | For maintainability, the approach taken for .S files is a single source | 
 | 117 | compiled multiple times with different macro definitions. | 
 | 118 | Each pv_ops instance must define those macros to compile. | 
 | 119 | The important thing here is that sensitive, but non-privileged | 
 | 120 | instructions must be paravirtualized and that some privileged | 
 | 121 | instructions also need paravirtualization for reasonable performance. | 
 | 122 | Developers who modify .S files must be aware of that. At present | 
 | 123 | a simple checker is implemented to detect paravirtualization breakage, | 
 | 124 | but it doesn't cover all cases. | 
 | 125 |  | 
 | 126 | Sometimes this set of macros is called pv_cpu_asm_op, but there is no | 
 | 127 | corresponding structure in the source code. | 
 | 128 | Those macros mostly correspond 1:1 to a subset of privileged | 
 | 129 | instructions. See linux/include/asm-ia64/native/inst.h. | 
 | 130 | Some functions written in assembly also need to be overridden, so | 
 | 131 | each pv_ops instance has to define some macros for them. Again see | 
 | 132 | linux/include/asm-ia64/native/inst.h. | 
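
The "single source, compiled multiple times with different macro
definitions" approach can be imitated in C with the preprocessor. This is
only a sketch under stated assumptions: the real code is assembly, and the
DEMO_*, NATIVE_* and HV_* macros, the bit position and the two state
variables below are hypothetical stand-ins, not the contents of inst.h.

```c
#include <assert.h>

#define DEMO_PSR_I (1UL << 14)	/* illustrative interrupt-enable bit */

unsigned long demo_hw_psr = DEMO_PSR_I;     /* stands in for the register */
unsigned long demo_shared_psr = DEMO_PSR_I; /* stands in for guest-visible
					       state shared with a hypervisor */

/* "native" macro set: each macro maps to the operation itself. */
#define NATIVE_MOV_FROM_PSR()	(demo_hw_psr)
#define NATIVE_RSM_PSR_I()	(demo_hw_psr &= ~DEMO_PSR_I)

/* "hv" macro set: the same operations, redirected to shared state. */
#define HV_MOV_FROM_PSR()	(demo_shared_psr)
#define HV_RSM_PSR_I()		(demo_shared_psr &= ~DEMO_PSR_I)

/*
 * The single shared source. It never names a sensitive operation
 * directly; it only uses the macros handed to it.
 */
#define DEMO_DISABLE_IRQ_BODY(fn, MOV_FROM_PSR, RSM_PSR_I)	\
	unsigned long fn(void)					\
	{							\
		unsigned long old = MOV_FROM_PSR();		\
		RSM_PSR_I();					\
		return old;					\
	}

/* "Compile" the same body once per macro set. */
DEMO_DISABLE_IRQ_BODY(native_disable_irq, NATIVE_MOV_FROM_PSR, NATIVE_RSM_PSR_I)
DEMO_DISABLE_IRQ_BODY(hv_disable_irq, HV_MOV_FROM_PSR, HV_RSM_PSR_I)

/* Small self-check: each variant only touches its own state. */
void demo_selftest(void)
{
	demo_hw_psr = DEMO_PSR_I;
	demo_shared_psr = DEMO_PSR_I;
	assert(native_disable_irq() == DEMO_PSR_I);
	assert(demo_hw_psr == 0 && demo_shared_psr == DEMO_PSR_I);
	assert(hv_disable_irq() == DEMO_PSR_I);
	assert(demo_shared_psr == 0);
}
```

If someone edits the shared body and writes the operation directly instead
of going through a macro, only the native variant keeps working, which is
the kind of breakage the checker mentioned above is meant to catch.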
 | 133 |  | 
 | 134 |  | 
 | 135 | Those structures must be initialized very early, before start_kernel, | 
 | 136 | probably in head.S using multiple entry points or some other trick. | 
 | 137 | For the native implementation see linux/arch/ia64/kernel/paravirt.c. |