mac80211: use call_rcu() on sta deletion

mac80211 calls synchronize_rcu() on sta deletion,
which increase the roaming time significantly.

Convert it into a call_rcu() mechanism, in order
to avoid blocking. Since some of the cleanup
functions might sleep, schedule from the call_rcu
callback a new work that will do the actual cleanup.

In order to make sure the cleanup occurs before
the interface went down, flush local->workqueue
on ieee80211_do_stop().

Signed-off-by: Yoni Divinsky <yoni.divinsky@ti.com>
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index d747da5..6f8a73c 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -793,11 +793,20 @@
 		flush_work(&sdata->work);
 		/*
 		 * When we get here, the interface is marked down.
-		 * Call synchronize_rcu() to wait for the RX path
+		 * Call rcu_barrier() to wait both for the RX path
 		 * should it be using the interface and enqueuing
-		 * frames at this very time on another CPU.
+		 * frames at this very time on another CPU, and
+		 * for the sta free call_rcu callbacks.
 		 */
-		synchronize_rcu();
+		rcu_barrier();
+
+		/*
+		 * free_sta_rcu() enqueues a work for the actual
+		 * sta cleanup, so we need to flush it while
+		 * sdata is still valid.
+		 */
+		flush_workqueue(local->workqueue);
+
 		skb_queue_purge(&sdata->skb_queue);
 
 		/*