DCCP protocol
=============

Contents
========

- Introduction
- Missing features
- Socket options
- Sysctl variables
- IOCTLS
- Notes

Introduction
============

Datagram Congestion Control Protocol (DCCP) is an unreliable,
connection-oriented protocol designed to solve issues present in UDP and TCP,
particularly for real-time and multimedia (streaming) traffic.
It divides into a base protocol (RFC 4340) and pluggable congestion control
modules called CCIDs (congestion control IDs). Like pluggable TCP congestion
control, at least one CCID needs to be enabled in order for the protocol to
function properly. In the Linux implementation, this is the TCP-like CCID2
(RFC 4341). Additional CCIDs, such as the TCP-friendly CCID3 (RFC 4342), are
optional.
For a brief introduction to CCIDs and suggestions for choosing a CCID to match
given applications, see section 10 of RFC 4340.

DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol
is at http://www.ietf.org/html.charters/dccp-charter.html

Missing features
================

The Linux DCCP implementation does not currently support all the features that
are specified in RFCs 4340 to 4342.

The known bugs are listed at:
	http://linux-net.osdl.org/index.php/TODO#DCCP

For more up-to-date versions of the DCCP implementation, please consider using
the experimental DCCP test tree; instructions for checking this out are at:
http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree


Socket options
==============

DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
the socket will fall back to 0 (which means that no meaningful service code
is present). On active sockets this is set before connect(); specifying more
than one code has no effect (all subsequent service codes are ignored). The
case is different for passive sockets, where multiple service codes (up to 32)
can be set before calling bind().

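As an illustration of the active-socket case, the following is a minimal
sketch rather than a reference implementation. The helper name
dccp_set_service() is hypothetical, the fallback constant values are assumed
to match <linux/dccp.h> and <linux/socket.h>, and the service code is assumed
to be passed in network byte order.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stdint.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that lack these; the values are assumed
 * to match <linux/socket.h> and <linux/dccp.h>. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_SERVICE
#define DCCP_SOCKOPT_SERVICE 2
#endif

/* Hypothetical helper: set a single service code on a DCCP socket.
 * Intended to be called before connect() on an active socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()). */
int dccp_set_service(int fd, uint32_t service_code)
{
	/* Assumption: the kernel expects the code in network byte order. */
	uint32_t code = htonl(service_code);

	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE,
			  &code, sizeof(code));
}
```

The socket itself would typically come from
socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP), followed by dccp_set_service() and
then connect().
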
DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
size (application payload size) in bytes (see RFC 4340, section 14).

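Reading the current maximum packet size could look roughly like this. The
helper name dccp_get_cur_mps() is hypothetical, the option value is assumed to
be returned as an int, and the fallback constant values are assumed to match
the kernel headers.

```c
#include <sys/socket.h>

/* Fallbacks; values assumed to match <linux/socket.h> and <linux/dccp.h>. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_GET_CUR_MPS
#define DCCP_SOCKOPT_GET_CUR_MPS 5
#endif

/* Hypothetical helper: return the current maximum packet size
 * (application payload size) in bytes, or -1 on error. */
int dccp_get_cur_mps(int fd)
{
	int mps = 0;
	socklen_t len = sizeof(mps);

	if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_GET_CUR_MPS, &mps, &len) < 0)
		return -1;
	return mps;
}
```

An application might call this after connect() or accept() and keep each
datagram it sends at or below the returned size.
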
DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
timewait state when closing the connection (RFC 4340, 8.3). The usual case is
that the closing server sends a CloseReq, whereupon the client holds timewait
state. When this boolean socket option is on, the server sends a Close instead
and will enter TIMEWAIT. This option must be set after accept() returns.

DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
always cover the entire packet and that only fully covered application data is
accepted by the receiver. Hence, when using this feature on the sender, it must
also be enabled at the receiver, with a suitable choice of CsCov.

DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
	range 0..15 are acceptable. The default setting is 0 (full coverage);
	values in the range 1..15 indicate partial coverage.
DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
	sets a threshold, where again values 0..15 are acceptable. The default
	of 0 means that all packets with a partial coverage will be discarded.
	Values in the range 1..15 indicate that packets with at least that
	coverage value are also acceptable. The higher the number, the more
	restrictive this setting (see RFC 4340, sec. 9.2.1). Partial coverage
	settings are inherited by the child socket after accept().

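To make the CsCov values more concrete, here is a small sketch. Both helper
names are hypothetical, and the fallback constant values are assumed to match
the kernel headers. The byte calculation follows RFC 4340, sec. 9.2, where a
CsCov of 1..15 covers the header, the options and the first (CsCov - 1) * 4
bytes of application data.

```c
#include <sys/socket.h>

/* Fallbacks; values assumed to match <linux/socket.h> and <linux/dccp.h>. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_SEND_CSCOV
#define DCCP_SOCKOPT_SEND_CSCOV 10
#endif
#ifndef DCCP_SOCKOPT_RECV_CSCOV
#define DCCP_SOCKOPT_RECV_CSCOV 11
#endif

/* Hypothetical helper: number of application data bytes covered by the
 * checksum for a given CsCov value (RFC 4340, sec. 9.2).
 * Returns -1 for CsCov 0 (the entire packet is covered) and -2 for
 * values outside 0..15. */
int dccp_cscov_payload_bytes(int cscov)
{
	if (cscov < 0 || cscov > 15)
		return -2;
	if (cscov == 0)
		return -1;
	return (cscov - 1) * 4;
}

/* Hypothetical helper: set the sender coverage (DCCP_SOCKOPT_SEND_CSCOV)
 * or the receiver threshold (DCCP_SOCKOPT_RECV_CSCOV).
 * Returns 0 on success, -1 on error. */
int dccp_set_cscov(int fd, int optname, int cscov)
{
	return setsockopt(fd, SOL_DCCP, optname, &cscov, sizeof(cscov));
}
```

For example, a sender using CsCov 2 protects only the first 4 bytes of payload;
the receiver would then need a DCCP_SOCKOPT_RECV_CSCOV threshold in the range
1..2 for such packets to be accepted.
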
The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.
DCCP_SOCKOPT_CCID_RX_INFO
	Returns a `struct tfrc_rx_info' in optval; the optval buffer must hold,
	and optlen must be set to, at least sizeof(struct tfrc_rx_info).
DCCP_SOCKOPT_CCID_TX_INFO
	Returns a `struct tfrc_tx_info' in optval; the optval buffer must hold,
	and optlen must be set to, at least sizeof(struct tfrc_tx_info).

On unidirectional connections it is useful to close the unused half-connection
via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs.

Sysctl variables
================
Several DCCP default parameters can be managed by the following sysctls
(sysctl net.dccp.default or /proc/sys/net/dccp/default):

request_retries
	The number of active connection initiation retries (the number of
	Requests minus one) before timing out. In addition, it also governs
	the behaviour of the other, passive side: this variable also sets
	the number of times DCCP repeats sending a Response when the initial
	handshake does not progress from RESPOND to OPEN (i.e. when no Ack
	is received after the initial Request).  This value should be greater
	than 0; a value below 10 is suggested. Analogue of tcp_syn_retries.

retries1
	How often a DCCP Response is retransmitted until the listening DCCP
	side considers its connecting peer dead. Analogue of tcp_retries1.

retries2
	The number of times a general DCCP packet is retransmitted. This is
	relevant for retransmitted acknowledgments and feature negotiation;
	data packets are never retransmitted. Analogue of tcp_retries2.

send_ndp = 1
	Whether or not to send NDP count options (sec. 7.7.2).

send_ackvec = 1
	Whether or not to send Ack Vector options (sec. 11.5).

ack_ratio = 2
	The default Ack Ratio (sec. 11.3) to use.

tx_ccid = 2
	Default CCID for the sender-receiver half-connection.

rx_ccid = 2
	Default CCID for the receiver-sender half-connection.

seq_window = 100
	The initial sequence window (sec. 7.5.2).

tx_qlen = 5
	The size of the transmit buffer in packets. A value of 0 corresponds
	to an unbounded transmit buffer.

sync_ratelimit = 125 ms
	The timeout between subsequent DCCP-Sync packets sent in response to
	sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit
	of this parameter is milliseconds; a value of 0 disables rate-limiting.

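These defaults can also be inspected from a program by reading the
corresponding file under /proc/sys/net/dccp/default. The sketch below assumes
each entry holds a single integer; the helper name dccp_read_sysctl() is
hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer-valued entry below
 * /proc/sys/net/dccp/default, e.g. "tx_qlen" or "seq_window".
 * Returns 0 and stores the value in *out, or -1 if the entry cannot
 * be opened or parsed (for instance when DCCP is not available). */
int dccp_read_sysctl(const char *name, long *out)
{
	char path[128];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path),
		     "/proc/sys/net/dccp/default/%s", name);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;	/* name too long for the path buffer */

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Changing a value works the same way in reverse (writing the file as root), or
from the shell with sysctl -w net.dccp.default.<name>=<value>.
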
IOCTLS
======
FIONREAD
	Works as in udp(7): returns in the `int' argument pointer the size of
	the next pending datagram in bytes, or 0 when no datagram is pending.

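A minimal sketch of querying the next datagram size with FIONREAD follows; the
helper name dccp_next_datagram_size() is hypothetical.

```c
#include <sys/ioctl.h>

/* Hypothetical helper: size in bytes of the next pending datagram,
 * 0 when no datagram is pending, or -1 if the ioctl fails. */
int dccp_next_datagram_size(int fd)
{
	int pending = 0;

	if (ioctl(fd, FIONREAD, &pending) < 0)
		return -1;
	return pending;
}
```

A receiver could use the result to size its buffer before calling recv(),
so that a datagram is not truncated.
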
Notes
=====

At present, DCCP does not traverse NAT successfully on many boxes. This is
because the checksum covers the pseudo-header, as in TCP and UDP. Linux NAT
support for DCCP has been added.