sched: improve sched_clock() performance in scheduler-intense workloads

native_read_tsc() overhead accounts for 20% of the system overhead:

 659567 system_call      41222.9375
 686796 schedule           435.7843
 718382 __switch_to        665.1685
 823875 switch_mm         4526.7857
1883122 native_read_tsc  55385.9412
9761990 total                2.8468

This is in large part due to the rdtsc_barrier() that is done before and
after reading the TSC.

But sched_clock() is not a precise clock in the GTOD sense, so using such
barriers is completely pointless. So remove the barriers and only use
them in vget_cycles().

This improves lat_ctx performance by about 5%.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit: 0d12cdd5f883f508d33b85c1bae98fa28987c8c7
author: Ingo Molnar <mingo@elte.hu> Sat Nov 08 16:19:55 2008 +0100
committer: Ingo Molnar <mingo@elte.hu> Sat Nov 08 16:48:19 2008 +0100
tree: e07bb1f9ef49062fbd9817fe41cab66964bedf03
parent: 52c642f33b14bfa1b00ef2b68296effb34a573f3
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 38ae163..9cd83a8 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h

@@ -34,6 +34,8 @@
 
 static __always_inline cycles_t vget_cycles(void)
 {
+	cycles_t cycles;
+
 	/*
 	 * We only do VDSOs on TSC capable CPUs, so this shouldnt
 	 * access boot_cpu_data (which is not VDSO-safe):
@@ -42,7 +44,11 @@
 	if (!cpu_has_tsc)
 		return 0;
 #endif
-	return (cycles_t)__native_read_tsc();
+	rdtsc_barrier();
+	cycles = (cycles_t)__native_read_tsc();
+	rdtsc_barrier();
+
+	return cycles;
 }
 
 extern void tsc_init(void);
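
The trade-off described in the commit message can be sketched in user space. This is an illustration only, not kernel code: the function names `read_tsc_relaxed()` and `read_tsc_ordered()` are hypothetical, and hard-coding LFENCE as the barrier is an assumption (the kernel's rdtsc_barrier() chooses its fence instruction according to the CPU). It assumes an x86 target with SSE2 and GCC or Clang.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc() and _mm_lfence(); x86 with GCC or Clang */

/*
 * Relaxed read (hypothetical name): no fences, so the CPU may execute
 * RDTSC out of order relative to nearby instructions. Cheap, and good
 * enough for a clock that does not need GTOD-grade precision.
 */
static inline uint64_t read_tsc_relaxed(void)
{
	return __rdtsc();
}

/*
 * Ordered read (hypothetical name): a fence before and after RDTSC keeps
 * the read from being reordered with surrounding instructions, at the
 * cost of stalling the pipeline on every call.
 */
static inline uint64_t read_tsc_ordered(void)
{
	uint64_t cycles;

	_mm_lfence();
	cycles = __rdtsc();
	_mm_lfence();

	return cycles;
}
```

Timing a tight loop of each variant should show the cost of the fences, which is the overhead this change removes from the sched_clock() path while keeping it in vget_cycles().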