x86: fold pda into percpu area on SMP
[ Based on original patch from Christoph Lameter and Mike Travis. ]
Currently pdas and percpu areas are allocated separately. %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.
Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40. To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.
After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda. This means that setup_per_cpu_areas() need
to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
already has modified it when control reaches setup_per_cpu_areas().
This patch also removes now unnecessary get_local_pda() and its call
sites.
A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index f50280d..962f21f 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -5,6 +5,7 @@
#define LOAD_OFFSET __START_KERNEL_map
#include <asm-generic/vmlinux.lds.h>
+#include <asm/asm-offsets.h>
#include <asm/page.h>
#undef i386 /* in case the preprocessor is a 32bit one */
@@ -215,10 +216,11 @@
/*
* percpu offsets are zero-based on SMP. PERCPU_VADDR() changes the
* output PHDR, so the next output section - __data_nosave - should
- * switch it back to data.init.
+ * switch it back to data.init. Also, pda should be at the head of
+ * percpu area. Preallocate it.
*/
. = ALIGN(PAGE_SIZE);
- PERCPU_VADDR(0, :percpu)
+ PERCPU_VADDR_PREALLOC(0, :percpu, pda_size)
#else
PERCPU(PAGE_SIZE)
#endif