drm/radeon/kms: Schedule host path read cache flush through the ring V2

The R300 family will hard lock up if the host path read cache
flush is done through MMIO to HOST_PATH_CNTL, but scheduling the
same flush through the ring appears to be harmless. This patch
removes the hdp_flush callback and adds a flush after each fence
emission, which means a flush after each IB schedule. Thus we
should get the same behavior without the hard lockup.
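
In other words, on r600 the direct MMIO flush is replaced by the
same register write queued on the CP ring as part of the fence
emit. A sketch of the two forms (both fragments are taken from the
diff below):

    /* before: direct MMIO write of the HDP flush register */
    WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);

    /* after: same flush scheduled through the ring at fence emission */
    radeon_ring_write(rdev, PACKET0(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0));
    radeon_ring_write(rdev, 1);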

Tested on the R100, R200, R300, R400, R500, R600 and R700 families.

V2: Adjust fence counts in r600_blit_prepare_copy()

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 921926f..e2f43c1 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1388,11 +1388,6 @@
 	(void)RREG32(PCIE_PORT_DATA);
 }
 
-void r600_hdp_flush(struct radeon_device *rdev)
-{
-	WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);
-}
-
 /*
  * CP & Ring
  */
@@ -1789,6 +1784,8 @@
 	radeon_ring_write(rdev, PACKET3(PACKET3_SET_CONFIG_REG, 1));
 	radeon_ring_write(rdev, ((rdev->fence_drv.scratch_reg - PACKET3_SET_CONFIG_REG_OFFSET) >> 2));
 	radeon_ring_write(rdev, fence->seq);
+	radeon_ring_write(rdev, PACKET0(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0));
+	radeon_ring_write(rdev, 1);
 	/* CP_INTERRUPT packet 3 no longer exists, use packet 0 */
 	radeon_ring_write(rdev, PACKET0(CP_INT_STATUS, 0));
 	radeon_ring_write(rdev, RB_INT_STAT);