drm/radeon/kms: Schedule host path read cache flush through the ring V2

The R300 family will hard lockup if the host path read cache flush is
done through MMIO to HOST_PATH_CNTL, but scheduling the same flush
through the ring seems harmless. This patch removes the hdp_flush
callback and adds a flush after each fence emission, which means a
flush after each IB schedule. Thus we should get the same behavior
without the hard lockup.
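
The scheduled flush is two PACKET0 writes to HOST_PATH_CNTL: the first
sets RADEON_HDP_READ_BUFFER_INVALIDATE on top of the cached
HOST_PATH_CNTL value, the second writes the cached value back to clear
the bit again. A sketch of the sequence, as emitted in the r300 fence
path in the diff below:

    radeon_ring_write(rdev, PACKET0(RADEON_HOST_PATH_CNTL, 0));
    radeon_ring_write(rdev, rdev->config.r300.hdp_cntl |
                            RADEON_HDP_READ_BUFFER_INVALIDATE);
    radeon_ring_write(rdev, PACKET0(RADEON_HOST_PATH_CNTL, 0));
    radeon_ring_write(rdev, rdev->config.r300.hdp_cntl);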

Tested on the R100, R200, R300, R400, R500, R600 and R700 families.

V2: Adjust fence counts in r600_blit_prepare_copy()

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index 3f2cc9e..b8623b7 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -36,7 +36,15 @@
 #include "rv350d.h"
 #include "r300_reg_safe.h"
 
-/* This files gather functions specifics to: r300,r350,rv350,rv370,rv380 */
+/* This file gathers functions specific to: r300,r350,rv350,rv370,rv380
+ *
+ * GPU Errata:
+ * - HOST_PATH_CNTL: the r300 family seems to dislike writes to
+ *   HOST_PATH_CNTL using MMIO to flush the host path read cache; this
+ *   leads to a HARDLOCKUP. However, scheduling such a write through the
+ *   ring seems harmless. I suspect the CP read collides with the flush
+ *   somehow, or maybe the MC, hard to tell. (Jerome Glisse)
+ */
 
 /*
  * rv370,rv380 PCIE GART
@@ -178,6 +186,11 @@
 	/* Wait until IDLE & CLEAN */
 	radeon_ring_write(rdev, PACKET0(0x1720, 0));
 	radeon_ring_write(rdev, (1 << 17) | (1 << 16)  | (1 << 9));
+	radeon_ring_write(rdev, PACKET0(RADEON_HOST_PATH_CNTL, 0));
+	radeon_ring_write(rdev, rdev->config.r300.hdp_cntl |
+				RADEON_HDP_READ_BUFFER_INVALIDATE);
+	radeon_ring_write(rdev, PACKET0(RADEON_HOST_PATH_CNTL, 0));
+	radeon_ring_write(rdev, rdev->config.r300.hdp_cntl);
 	/* Emit fence sequence & fire IRQ */
 	radeon_ring_write(rdev, PACKET0(rdev->fence_drv.scratch_reg, 0));
 	radeon_ring_write(rdev, fence->seq);
@@ -1258,6 +1271,7 @@
 	}
 	/* Enable IRQ */
 	r100_irq_set(rdev);
+	rdev->config.r300.hdp_cntl = RREG32(RADEON_HOST_PATH_CNTL);
 	/* 1M ring buffer */
 	r = r100_cp_init(rdev, 1024 * 1024);
 	if (r) {