ceph: reset osd after relevant messages timed out
This simplifies the process of timing out messages. We
keep lru of current messages that are in flight. If a
timeout has passed, we reset the osd connection, so that
messages will be retransmitted. This is a failsafe in case
we hit some sort of problem sending out message to the OSD.
Normally, we'll get notification via an updated osdmap if
there are problems.
If a request is older than the keepalive timeout, send a
keepalive to ensure we detect any breaks in the TCP connection.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 6a778f2..02c0ddc 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -62,6 +62,7 @@
int max_readdir; /* max readdir size */
int congestion_kb; /* max readdir size */
int osd_timeout;
+ int osd_keepalive_timeout;
char *snapdir_name; /* default ".snap" */
char *name;
char *secret;
@@ -72,6 +73,8 @@
* defaults
*/
#define CEPH_MOUNT_TIMEOUT_DEFAULT 60
+#define CEPH_OSD_TIMEOUT_DEFAULT 60 /* seconds */
+#define CEPH_OSD_KEEPALIVE_DEFAULT 5
#define CEPH_OSD_IDLE_TTL_DEFAULT 60
#define CEPH_MOUNT_RSIZE_DEFAULT (512*1024) /* readahead */