|  | Paravirt_ops on IA64 | 
|  | ==================== | 
|  | 21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp> | 
|  |  | 
|  |  | 
|  | Introduction | 
|  | ------------ | 
|  | The aim of this documentation is to help with maintainability and/or to | 
|  | encourage people to use paravirt_ops/IA64. | 
|  |  | 
|  | paravirt_ops (pv_ops in short) is a way for virtualization support of | 
|  | Linux kernel on x86. Several ways for virtualization support were | 
|  | proposed, paravirt_ops is the winner. | 
|  | On the other hand, now there are also several IA64 virtualization | 
|  | technologies like kvm/IA64, xen/IA64 and many other academic IA64 | 
|  | hypervisors so that it is good to add generic virtualization | 
|  | infrastructure on Linux/IA64. | 
|  |  | 
|  |  | 
|  | What is paravirt_ops? | 
|  | --------------------- | 
|  | It has been developed on x86 as virtualization support via API, not ABI. | 
|  | It allows each hypervisor to override operations which are important for | 
|  | hypervisors at API level. And it allows a single kernel binary to run on | 
|  | all supported execution environments including native machine. | 
|  | Essentially paravirt_ops is a set of function pointers which represent | 
|  | operations corresponding to low level sensitive instructions and high | 
|  | level functionalities in various area. But one significant difference | 
|  | from usual function pointer table is that it allows optimization with | 
|  | binary patch. It is because some of these operations are very | 
|  | performance sensitive and indirect call overhead is not negligible. | 
|  | With binary patch, indirect C function call can be transformed into | 
|  | direct C function call or in-place execution to eliminate the overhead. | 
|  |  | 
|  | Thus, operations of paravirt_ops are classified into three categories. | 
|  | - simple indirect call | 
|  | These operations correspond to high level functionality so that the | 
|  | overhead of indirect call isn't very important. | 
|  |  | 
|  | - indirect call which allows optimization with binary patch | 
|  | Usually these operations correspond to low level instructions. They | 
|  | are called frequently and performance critical. So the overhead is | 
|  | very important. | 
|  |  | 
|  | - a set of macros for hand written assembly code | 
|  | Hand written assembly codes (.S files) also need paravirtualization | 
|  | because they include sensitive instructions or some of code paths in | 
|  | them are very performance critical. | 
|  |  | 
|  |  | 
|  | The relation to the IA64 machine vector | 
|  | --------------------------------------- | 
|  | Linux/IA64 has the IA64 machine vector functionality which allows the | 
|  | kernel to switch implementations (e.g. initialization, ipi, dma api...) | 
|  | depending on executing platform. | 
|  | We can replace some implementations very easily defining a new machine | 
|  | vector. Thus another approach for virtualization support would be | 
|  | enhancing the machine vector functionality. | 
|  | But paravirt_ops approach was taken because | 
|  | - virtualization support needs wider support than machine vector does. | 
|  | e.g. low level instruction paravirtualization. It must be | 
|  | initialized very early before platform detection. | 
|  |  | 
|  | - virtualization support needs more functionality like binary patch. | 
|  | Probably the calling overhead might not be very large compared to the | 
|  | emulation overhead of virtualization. However in the native case, the | 
|  | overhead should be eliminated completely. | 
|  | A single kernel binary should run on each environment including native, | 
|  | and the overhead of paravirt_ops on native environment should be as | 
|  | small as possible. | 
|  |  | 
|  | - for full virtualization technology, e.g. KVM/IA64 or | 
|  | Xen/IA64 HVM domain, the result would be | 
|  | (the emulated platform machine vector. probably dig) + (pv_ops). | 
|  | This means that the virtualization support layer should be under | 
|  | the machine vector layer. | 
|  |  | 
|  | Possibly it might be better to move some function pointers from | 
|  | paravirt_ops to machine vector. In fact, Xen domU case utilizes both | 
|  | pv_ops and machine vector. | 
|  |  | 
|  |  | 
|  | IA64 paravirt_ops | 
|  | ----------------- | 
|  | In this section, the concrete paravirt_ops will be discussed. | 
|  | Because of the architecture difference between ia64 and x86, the | 
|  | resulting set of functions is very different from x86 pv_ops. | 
|  |  | 
|  | - C function pointer tables | 
|  | They are not very performance critical so that simple C indirect | 
|  | function call is acceptable. The following structures are defined at | 
|  | this moment. For details see linux/include/asm-ia64/paravirt.h | 
|  | - struct pv_info | 
|  | This structure describes the execution environment. | 
|  | - struct pv_init_ops | 
|  | This structure describes the various initialization hooks. | 
|  | - struct pv_iosapic_ops | 
|  | This structure describes hooks to iosapic operations. | 
|  | - struct pv_irq_ops | 
|  | This structure describes hooks to irq related operations | 
|  | - struct pv_time_op | 
|  | This structure describes hooks to steal time accounting. | 
|  |  | 
|  | - a set of indirect calls which need optimization | 
|  | Currently this class of functions correspond to a subset of IA64 | 
|  | intrinsics. At this moment the optimization with binary patch isn't | 
|  | implemented yet. | 
|  | struct pv_cpu_op is defined. For details see | 
|  | linux/include/asm-ia64/paravirt_privop.h | 
|  | Mostly they correspond to ia64 intrinsics 1-to-1. | 
|  | Caveat: Now they are defined as C indirect function pointers, but in | 
|  | order to support binary patch optimization, they will be changed | 
|  | using GCC extended inline assembly code. | 
|  |  | 
|  | - a set of macros for hand written assembly code (.S files) | 
|  | For maintenance purpose, the taken approach for .S files is single | 
|  | source code and compile multiple times with different macros definitions. | 
|  | Each pv_ops instance must define those macros to compile. | 
|  | The important thing here is that sensitive, but non-privileged | 
|  | instructions must be paravirtualized and that some privileged | 
|  | instructions also need paravirtualization for reasonable performance. | 
|  | Developers who modify .S files must be aware of that. At this moment | 
|  | an easy checker is implemented to detect paravirtualization breakage. | 
|  | But it doesn't cover all the cases. | 
|  |  | 
|  | Sometimes this set of macros is called pv_cpu_asm_op. But there is no | 
|  | corresponding structure in the source code. | 
|  | Those macros mostly 1:1 correspond to a subset of privileged | 
|  | instructions. See linux/include/asm-ia64/native/inst.h. | 
|  | And some functions written in assembly also need to be overrided so | 
|  | that each pv_ops instance have to define some macros. Again see | 
|  | linux/include/asm-ia64/native/inst.h. | 
|  |  | 
|  |  | 
|  | Those structures must be initialized very early before start_kernel. | 
|  | Probably initialized in head.S using multi entry point or some other trick. | 
|  | For native case implementation see linux/arch/ia64/kernel/paravirt.c. |