======================
RxRPC NETWORK PROTOCOL
======================

The RxRPC protocol driver provides a reliable two-phase transport on top of UDP
that can be used to perform RxRPC remote operations.  This is done over sockets
of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and
receive data, aborts and errors.

Contents of this document:

 (*) Overview.

 (*) RxRPC protocol summary.

 (*) AF_RXRPC driver model.

 (*) Control messages.

 (*) Socket options.

 (*) Security.

 (*) Example client usage.

 (*) Example server usage.

 (*) AF_RXRPC kernel interface.


========
OVERVIEW
========

RxRPC is a two-layer protocol.  There is a session layer which provides
reliable virtual connections using UDP over IPv4 (or IPv6) as the transport
layer, but implements a real network protocol; and there's the presentation
layer which renders structured data to binary blobs and back again using XDR
(as does SunRPC):

	+-------------+
	| Application |
	+-------------+
	|     XDR     |		Presentation
	+-------------+
	|    RxRPC    |		Session
	+-------------+
	|     UDP     |		Transport
	+-------------+


AF_RXRPC provides:

 (1) Part of an RxRPC facility for both kernel and userspace applications by
     making the session part of it a Linux network protocol (AF_RXRPC).

 (2) A two-phase protocol.  The client transmits a blob (the request) and then
     receives a blob (the reply), and the server receives the request and then
     transmits the reply.

 (3) Retention of the reusable bits of the transport system set up for one call
     to speed up subsequent calls.

 (4) A secure protocol, using the Linux kernel's key retention facility to
     manage security on the client end.  The server end must of necessity be
     more active in security negotiations.

AF_RXRPC does not provide XDR marshalling/presentation facilities.  That is
left to the application.  AF_RXRPC only deals in blobs.  Even the operation ID
is just the first four bytes of the request blob, and as such is beyond the
kernel's interest.
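
Since the kernel never looks inside the blob, pulling the operation ID out of
a request is the application's job.  A minimal sketch, assuming the
application uses standard XDR encoding, in which a 32-bit unsigned integer is
stored big-endian; the helper name rx_request_op_id is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the operation ID from the first four bytes of
 * a request blob, assuming XDR's big-endian 32-bit unsigned encoding.
 * Returns 0 on success or -1 if the blob is shorter than four bytes. */
int rx_request_op_id(const uint8_t *blob, size_t len, uint32_t *op_id)
{
	if (len < 4)
		return -1;
	*op_id = ((uint32_t)blob[0] << 24) | ((uint32_t)blob[1] << 16) |
		 ((uint32_t)blob[2] << 8) | (uint32_t)blob[3];
	return 0;
}
```

The remainder of the blob (from byte 4 onwards) holds the operation's
XDR-encoded arguments, which are likewise left to the application to decode.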


Sockets of AF_RXRPC family are:

 (1) created as type SOCK_DGRAM;

 (2) provided with a protocol of the type of underlying transport they're going
     to use - currently only PF_INET is supported.


The Andrew File System (AFS) is an example of an application that uses this and
that has both kernel (filesystem) and userspace (utility) components.


======================
RXRPC PROTOCOL SUMMARY
======================

An overview of the RxRPC protocol:

 (*) RxRPC sits on top of another networking protocol (UDP is the only option
     currently), and uses this to provide network transport.  UDP ports, for
     example, provide transport endpoints.

 (*) RxRPC supports multiple virtual "connections" from any given transport
     endpoint, thus allowing the endpoints to be shared, even to the same
     remote endpoint.

 (*) Each connection goes to a particular "service".  A connection may not go
     to multiple services.  A service may be considered the RxRPC equivalent of
     a port number.  AF_RXRPC permits multiple services to share an endpoint.

 (*) Client-originating packets are marked, thus a transport endpoint can be
     shared between client and server connections (connections have a
     direction).

 (*) Up to a billion connections may be supported concurrently between one
     local transport endpoint and one service on one remote endpoint.  An RxRPC
     connection is described by seven numbers:

	Local address	}
	Local port	} Transport (UDP) address
	Remote address	}
	Remote port	}
	Direction
	Connection ID
	Service ID

 (*) Each RxRPC operation is a "call".  A connection may make up to four
     billion calls, but only up to four calls may be in progress on a
     connection at any one time.

 (*) Calls are two-phase and asymmetric: the client sends its request data,
     which the service receives; then the service transmits the reply data
     which the client receives.

 (*) The data blobs are of indefinite size; the end of a phase is marked with a
     flag in the packet.  The number of packets of data making up one blob may
     not exceed 4 billion, however, as this would cause the sequence number to
     wrap.

 (*) The first four bytes of the request data are the service operation ID.

 (*) Security is negotiated on a per-connection basis.  The connection is
     initiated by the first data packet on it arriving.  If security is
     requested, the server then issues a "challenge" and then the client
     replies with a "response".  If the response is successful, the security is
     set for the lifetime of that connection, and all subsequent calls made
     upon it use that same security.  In the event that the server lets a
     connection lapse before the client, the security will be renegotiated if
     the client uses the connection again.

 (*) Calls use ACK packets to handle reliability.  Data packets are also
     explicitly sequenced per call.

 (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs.
     A hard-ACK indicates to the far side that all the data received to a point
     has been received and processed; a soft-ACK indicates that the data has
     been received but may yet be discarded and re-requested.  The sender may
     not discard any transmittable packets until they've been hard-ACK'd.

 (*) Reception of a reply data packet implicitly hard-ACKs all the data
     packets that make up the request.

 (*) A call is complete when the request has been sent, the reply has been
     received and the final hard-ACK on the last packet of the reply has
     reached the server.

 (*) A call may be aborted by either end at any time up to its completion.
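
The hard-ACK/soft-ACK rule above decides when a sender may free its transmit
buffers.  The following sketch shows that bookkeeping for a single call.  It
assumes data sequence numbers start at 1 and, as noted above, never wrap
within a phase.  struct tx_state and its functions are hypothetical
illustrations, not part of AF_RXRPC's interface:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call transmit state.  Sequence numbers are assumed to
 * start at 1, so hard_ack == 0 means nothing has been hard-ACK'd yet. */
struct tx_state {
	uint32_t top;      /* highest sequence number transmitted so far */
	uint32_t hard_ack; /* every packet <= hard_ack has been hard-ACK'd */
	uint32_t soft_ack; /* highest packet soft-ACK'd; may be re-requested */
};

/* A soft-ACK records receipt only; it must not release any buffers. */
void tx_soft_ack(struct tx_state *tx, uint32_t seq)
{
	if (seq > tx->soft_ack && seq <= tx->top)
		tx->soft_ack = seq;
}

/* A hard-ACK up to seq releases every packet up to and including it.
 * Returns how many packets became discardable (0 for stale or bogus ACKs). */
uint32_t tx_hard_ack(struct tx_state *tx, uint32_t seq)
{
	uint32_t freed;

	if (seq <= tx->hard_ack || seq > tx->top)
		return 0;
	freed = seq - tx->hard_ack;
	tx->hard_ack = seq;
	return freed;
}

/* On the client, the first reply data packet implicitly hard-ACKs the
 * whole request, so every transmitted request packet can be released. */
uint32_t tx_reply_started(struct tx_state *tx)
{
	return tx_hard_ack(tx, tx->top);
}

/* May the buffer holding packet seq be freed? */
bool tx_may_discard(const struct tx_state *tx, uint32_t seq)
{
	return seq <= tx->hard_ack;
}
```

The soft_ack field is kept only for reporting.  Nothing is freed on its
account, because the receiver is still entitled to discard and re-request
soft-ACK'd data.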


=====================
AF_RXRPC DRIVER MODEL
=====================

About the AF_RXRPC driver:

 (*) The AF_RXRPC protocol transparently uses internal sockets of the transport
     protocol to represent transport endpoints.

 (*) AF_RXRPC sockets map onto RxRPC connection bundles.  Actual RxRPC
     connections are handled transparently.  One client socket may be used to
     make multiple simultaneous calls to the same service.  One server socket
     may handle calls from many clients.

 (*) Additional parallel client connections will be initiated to support extra
     concurrent calls, up to a tunable limit.

 (*) Each connection is retained for a certain amount of time [tunable] after
     the last call currently using it has completed in case a new call is made
     that could reuse it.

 (*) Each internal UDP socket is retained for a certain amount of time
     [tunable] after the last connection using it is discarded, in case a new
     connection is made that could use it.

 (*) A client-side connection is only shared between calls if they have the
     same key struct describing their security (and assuming the calls would
     otherwise share the connection).  Non-secured calls can also share
     connections with each other.

 (*) A server-side connection is shared if the client says it is.

 (*) ACK'ing is handled by the protocol driver automatically, including ping
     replying.

 (*) SO_KEEPALIVE automatically pings the other side to keep the connection
     alive [TODO].

 (*) If an ICMP error is received, all calls affected by that error will be
     aborted with an appropriate network error passed through recvmsg().
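
The client-side sharing rule can be expressed as a predicate.  This is a
simplified sketch, not the driver's actual code: it assumes two calls are
otherwise compatible when they target the same IPv4 peer and service, and it
treats "the same key struct" as pointer identity, with NULL standing for an
unsecured call.  struct rx_call_params and rx_can_share_conn are hypothetical
names:

```c
#include <stdbool.h>
#include <stdint.h>
#include <netinet/in.h>

/* Hypothetical summary of what a client call needs from a connection. */
struct rx_call_params {
	struct sockaddr_in peer;  /* remote transport (UDP) address */
	uint16_t service_id;      /* RxRPC service ID */
	const void *key;          /* security key struct; NULL = unsecured */
};

/* Two calls may share a client connection only if they go to the same
 * peer and service and use the same key struct (or both use no key). */
bool rx_can_share_conn(const struct rx_call_params *a,
		       const struct rx_call_params *b)
{
	return a->peer.sin_addr.s_addr == b->peer.sin_addr.s_addr &&
	       a->peer.sin_port == b->peer.sin_port &&
	       a->service_id == b->service_id &&
	       a->key == b->key;
}
```

Comparing key pointers rather than key contents mirrors the wording above:
two distinct key structs with identical contents would still get separate
connections under this sketch.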


Interaction with the user of the RxRPC socket:

 (*) A socket is made into a server socket by binding an address with a
     non-zero service ID.

 (*) In the client, sending a request is achieved with one or more sendmsgs,
     followed by the reply being received with one or more recvmsgs.

 (*) The first sendmsg for a request to be sent from a client contains a tag to
     be used in all other sendmsgs or recvmsgs associated with that call.  The
     tag is carried in the control data.

 (*) connect() is used to supply a default destination address for a client
     socket.  This may be overridden by supplying an alternate address to the
     first sendmsg() of a call (struct msghdr::msg_name).

 (*) If connect() is called on an unbound client, a random local port will be
     bound before the operation takes place.

 (*) A server socket may also be used to make client calls.  To do this, the
     first sendmsg() of the call must specify the target address.  The server's
     transport endpoint is used to send the packets.

 (*) Once the application has received the last message associated with a call,
     the tag is guaranteed not to be seen again, and so it can be used to pin
     client resources.  A new call can then be initiated with the same tag
     without fear of interference.

 (*) In the server, a request is received with one or more recvmsgs, then the
     reply is transmitted with one or more sendmsgs, and then the final ACK
     is received with a last recvmsg.

 (*) When sending data for a call, sendmsg is given MSG_MORE if there's more
     data to come on that call.

 (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more
     data to come for that call.

 (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg
     to indicate the terminal message for that call.

 (*) A call may be aborted by adding an abort control message to the control
     data.  Issuing an abort terminates the kernel's use of that call's tag.
     Any messages waiting in the receive queue for that call will be discarded.

 (*) Aborts, busy notifications and challenge packets are delivered by recvmsg,
     and control data messages will be set to indicate the context.  Receiving
     an abort or a busy message terminates the kernel's use of that call's tag.

 (*) The control data part of the msghdr struct is used for a number of things:

     (*) The tag of the intended or affected call.

     (*) Sending or receiving errors, aborts and busy notifications.

     (*) Notifications of incoming calls.

     (*) Sending debug requests and receiving debug replies [TODO].

 (*) When the kernel has received and set up an incoming call, it sends a
     message to the server application to let it know there's a new call
     awaiting its acceptance [recvmsg reports a special control message].  The
     server application then uses sendmsg to assign a tag to the new call.
     Once that is done, the first part of the request data will be delivered by
     recvmsg.

 (*) The server application has to provide the server socket with a keyring of
     secret keys corresponding to the security types it permits.  When a secure
     connection is being set up, the kernel looks up the appropriate secret key
     in the keyring and then sends a challenge packet to the client and
     receives a response packet.  The kernel then checks the authorisation of
     the packet and either aborts the connection or sets up the security.

 (*) The name of the key a client will use to secure its communications is
     nominated by a socket option.
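
Because the tag is an unsigned long chosen by the application, and is
guaranteed not to reappear once a call's last message has been received, one
natural pattern is to use the address of the call's own state record as the
tag.  A minimal sketch, assuming a pointer fits in an unsigned long (true on
the usual Linux ABIs); struct my_call and the helper functions are
hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical per-call state pinned by the tag. */
struct my_call {
	uint32_t op_id;      /* operation being performed */
	size_t   reply_len;  /* reply bytes received so far */
};

/* Use the state's address as the RXRPC_USER_CALL_ID tag. */
unsigned long call_to_tag(struct my_call *call)
{
	return (unsigned long)(uintptr_t)call;
}

struct my_call *tag_to_call(unsigned long tag)
{
	return (struct my_call *)(uintptr_t)tag;
}

/* Account for one received message.  Once MSG_EOR marks the terminal
 * message, the tag cannot be seen again, so the state is freed.
 * Returns 1 if the call's state was released, 0 otherwise. */
int call_on_message(unsigned long tag, size_t bytes, int msg_flags)
{
	struct my_call *call = tag_to_call(tag);

	call->reply_len += bytes;
	if (msg_flags & MSG_EOR) {
		free(call);
		return 1;
	}
	return 0;
}
```

This avoids any lookup table mapping tags to calls, at the cost of trusting
the kernel to hand back only tags the application issued.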


Notes on recvmsg:

 (1) If there's a sequence of data messages belonging to a particular call on
     the receive queue, then recvmsg will keep working through them until:

     (a) it meets the end of that call's received data,

     (b) it meets a non-data message,

     (c) it meets a message belonging to a different call, or

     (d) it fills the user buffer.

     If recvmsg is called in blocking mode, it will keep sleeping, awaiting the
     reception of further data, until one of the above four conditions is met.

 (2) MSG_PEEK operates similarly, but will return immediately if it has put any
     data in the buffer rather than sleeping until it can fill the buffer.

 (3) If a data message is only partially consumed in filling a user buffer,
     then the remainder of that message will be left on the front of the queue
     for the next taker.  MSG_TRUNC will never be flagged.

 (4) If there is more data to be had on a call (it hasn't copied the last byte
     of the last data message in that phase yet), then MSG_MORE will be
     flagged.
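
Since one recvmsg() may return only part of a call's data, and messages for
different calls can be interleaved on the socket, an application typically
accumulates each call's data until a read arrives without MSG_MORE.  A sketch
of such a per-call accumulator follows; the fixed 4096-byte buffer and all
names are hypothetical, and a real application would find the right
accumulator by the call's tag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical accumulator for one phase (request or reply) of one call. */
struct rx_phase_buf {
	char   data[4096];
	size_t len;
	bool   complete;  /* set once a read arrives without MSG_MORE */
};

/* Feed the result of one recvmsg() for this call: the bytes copied and
 * the returned msg_flags.  Returns -1 if the phase is already complete
 * or the data would overflow the buffer, 0 otherwise. */
int rx_phase_append(struct rx_phase_buf *p, const void *chunk, size_t n,
		    int msg_flags)
{
	if (p->complete || n > sizeof(p->data) - p->len)
		return -1;
	memcpy(p->data + p->len, chunk, n);
	p->len += n;
	if (!(msg_flags & MSG_MORE))
		p->complete = true;
	return 0;
}
```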


================
CONTROL MESSAGES
================

AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex
calls, to invoke certain actions and to report certain conditions.  These are:

	MESSAGE ID             SRT  DATA         MEANING
	=====================  ===  ===========  ============================
	RXRPC_USER_CALL_ID     sr-  User ID      App's call specifier
	RXRPC_ABORT            srt  Abort code   Abort code to issue/received
	RXRPC_ACK              -rt  n/a          Final ACK received
	RXRPC_NET_ERROR        -rt  error num    Network error on call
	RXRPC_BUSY             -rt  n/a          Call rejected (server busy)
	RXRPC_LOCAL_ERROR      -rt  error num    Local error encountered
	RXRPC_NEW_CALL         -r-  n/a          New call received
	RXRPC_ACCEPT           s--  n/a          Accept new call

	(SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message)

 (*) RXRPC_USER_CALL_ID

     This is used to indicate the application's call ID.  It's an unsigned long
     that the app specifies in the client by attaching it to the first data
     message or in the server by passing it in association with an RXRPC_ACCEPT
     message.  recvmsg() passes it in conjunction with all messages except
     RXRPC_NEW_CALL messages.

 (*) RXRPC_ABORT

     This can be used by an application to abort a call by passing it to
     sendmsg, or it can be delivered by recvmsg to indicate a remote abort was
     received.  Either way, it must be associated with an RXRPC_USER_CALL_ID to
     specify the call affected.  If an abort is being sent, then error EBADSLT
     will be returned if there is no call with that user ID.

 (*) RXRPC_ACK

     This is delivered to a server application to indicate that the final ACK
     of a call was received from the client.  It will be associated with an
     RXRPC_USER_CALL_ID to indicate the call that's now complete.

 (*) RXRPC_NET_ERROR

     This is delivered to an application to indicate that an ICMP error message
     was encountered in the process of trying to talk to the peer.  An
     errno-class integer value will be included in the control message data
     indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
     affected.

 (*) RXRPC_BUSY

     This is delivered to a client application to indicate that a call was
     rejected by the server due to the server being busy.  It will be
     associated with an RXRPC_USER_CALL_ID to indicate the rejected call.

 (*) RXRPC_LOCAL_ERROR

     This is delivered to an application to indicate that a local error was
     encountered and that a call has been aborted because of it.  An
     errno-class integer value will be included in the control message data
     indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
     affected.

 (*) RXRPC_NEW_CALL

     This is delivered to indicate to a server application that a new call has
     arrived and is awaiting acceptance.  No user ID is associated with this,
     as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT.

 (*) RXRPC_ACCEPT

     This is used by a server application to attempt to accept a call and
     assign it a user ID.  It should be associated with an RXRPC_USER_CALL_ID
     to indicate the user ID to be assigned.  If there is no call to be
     accepted (it may have timed out, been aborted, etc.), then sendmsg will
     return error ENODATA.  If the user ID is already in use by another call,
     then error EBADSLT will be returned.
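
On the receiving side, the application walks the control messages returned by
recvmsg() to find the call's tag and any terminal condition.  The following
sketch uses the standard CMSG_* macros.  The numeric values given for
SOL_RXRPC and the RXRPC_* message types are assumptions written to match the
kernel headers (<linux/rxrpc.h>) and should be taken from those headers in
real code; struct rx_event and rx_parse_cmsgs are hypothetical:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272            /* assumed value */
#endif
#ifndef RXRPC_USER_CALL_ID
/* Assumed to match <linux/rxrpc.h>; use the real header in practice. */
#define RXRPC_USER_CALL_ID 1
#define RXRPC_ABORT        2
#define RXRPC_ACK          3
#define RXRPC_NET_ERROR    5
#define RXRPC_BUSY         6
#define RXRPC_LOCAL_ERROR  7
#define RXRPC_NEW_CALL     8
#endif

/* Hypothetical digest of the control data returned by one recvmsg(). */
struct rx_event {
	int           have_id;       /* was RXRPC_USER_CALL_ID present? */
	unsigned long user_call_id;
	int           type;          /* 0 for plain data, else an RXRPC_* type */
	int32_t       value;         /* abort code or errno, when supplied */
};

void rx_parse_cmsgs(struct msghdr *msg, struct rx_event *ev)
{
	struct cmsghdr *c;

	memset(ev, 0, sizeof(*ev));
	for (c = CMSG_FIRSTHDR(msg); c; c = CMSG_NXTHDR(msg, c)) {
		size_t n;

		if (c->cmsg_len < CMSG_LEN(0))
			break;                   /* malformed header */
		if (c->cmsg_level != SOL_RXRPC)
			continue;
		n = c->cmsg_len - CMSG_LEN(0);   /* payload length */

		switch (c->cmsg_type) {
		case RXRPC_USER_CALL_ID:
			if (n >= sizeof(ev->user_call_id)) {
				memcpy(&ev->user_call_id, CMSG_DATA(c),
				       sizeof(ev->user_call_id));
				ev->have_id = 1;
			}
			break;
		case RXRPC_ABORT:
		case RXRPC_NET_ERROR:
		case RXRPC_LOCAL_ERROR:
			/* These carry a 4-byte abort code or errno value. */
			if (n >= sizeof(ev->value))
				memcpy(&ev->value, CMSG_DATA(c), sizeof(ev->value));
			ev->type = c->cmsg_type;
			break;
		default:
			/* RXRPC_ACK, RXRPC_BUSY, RXRPC_NEW_CALL carry no data. */
			ev->type = c->cmsg_type;
			break;
		}
	}
}
```

memcpy() is used instead of dereferencing CMSG_DATA() directly because the
payload is not guaranteed to be suitably aligned for an unsigned long.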


==============
SOCKET OPTIONS
==============

AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:

 (*) RXRPC_SECURITY_KEY

     This is used to specify the description of the key to be used.  The key is
     extracted from the calling process's keyrings with request_key() and
     should be of "rxrpc" type.

     The optval pointer points to the description string, and optlen indicates
     how long the string is, without the NUL terminator.

 (*) RXRPC_SECURITY_KEYRING

     Similar to above but specifies a keyring of server secret keys to use (key
     type "keyring").  See the "Security" section.

 (*) RXRPC_EXCLUSIVE_CONNECTION

     This is used to request that new connections should be used for each call
     made subsequently on this socket.  optval should be NULL and optlen 0.

 (*) RXRPC_MIN_SECURITY_LEVEL

     This is used to specify the minimum security level required for calls on
     this socket.  optval must point to an int containing one of the following
     values:

     (a) RXRPC_SECURITY_PLAIN

         Encrypted checksum only.

     (b) RXRPC_SECURITY_AUTH

         Encrypted checksum plus packet padded and first eight bytes of packet
         encrypted - which includes the actual packet length.

     (c) RXRPC_SECURITY_ENCRYPTED

         Encrypted checksum plus entire packet padded and encrypted, including
         actual packet length.
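
Setting a client's key and minimum security level comes down to two
setsockopt() calls.  A sketch follows; the option numbers are assumptions
meant to mirror <linux/rxrpc.h>, and rx_client_set_security is a hypothetical
helper.  Note that optlen for RXRPC_SECURITY_KEY excludes the NUL terminator:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272                  /* assumed value */
#endif
#ifndef RXRPC_SECURITY_KEY
/* Assumed to match <linux/rxrpc.h>; use the real header in practice. */
#define RXRPC_SECURITY_KEY       1
#define RXRPC_MIN_SECURITY_LEVEL 4
#endif

/* Hypothetical helper: name the client's key and set the minimum security
 * level on an AF_RXRPC socket.  Returns 0 on success or -1 with errno set
 * by the failing setsockopt(). */
int rx_client_set_security(int fd, const char *key_desc, int min_level)
{
	/* optlen is the length of the description without its NUL. */
	if (setsockopt(fd, SOL_RXRPC, RXRPC_SECURITY_KEY,
		       key_desc, strlen(key_desc)) < 0)
		return -1;
	/* optval points to an int holding one of the RXRPC_SECURITY_* levels. */
	return setsockopt(fd, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
			  &min_level, sizeof(min_level));
}
```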


========
SECURITY
========

Currently, only the Kerberos 4 equivalent protocol has been implemented
(security index 2 - rxkad).  This requires the rxkad module to be loaded and,
on the client, tickets of the appropriate type to be obtained from the AFS
kaserver or the Kerberos server and installed as "rxrpc" type keys.  This is
normally done using the klog program.  A simple example klog program can be
found at:

	http://people.redhat.com/~dhowells/rxrpc/klog.c

The payload provided to add_key() on the client should be of the following
form:

	struct rxrpc_key_sec2_v1 {
		uint16_t	security_index;	/* 2 */
		uint16_t	ticket_length;	/* length of ticket[] */
		uint32_t	expiry;		/* time at which expires */
		uint8_t		kvno;		/* key version number */
		uint8_t		__pad[3];
		uint8_t		session_key[8];	/* DES session key */
		uint8_t		ticket[0];	/* the encrypted ticket */
	};

The ticket blob is simply appended to the above structure.
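
As a sketch of building that payload in userspace: the structure below
reproduces the layout above, using a C99 flexible array member in place of
ticket[0], and rx_build_key_payload is a hypothetical helper.  The result
would then be handed to add_key() with key type "rxrpc" (for example via
libkeyutils); that call is left out so the example runs without the library:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as struct rxrpc_key_sec2_v1 above. */
struct rxrpc_key_sec2_v1 {
	uint16_t security_index;   /* 2 */
	uint16_t ticket_length;    /* length of ticket[] */
	uint32_t expiry;           /* time at which expires */
	uint8_t  kvno;             /* key version number */
	uint8_t  __pad[3];
	uint8_t  session_key[8];   /* DES session key */
	uint8_t  ticket[];         /* the encrypted ticket */
};

/* Hypothetical helper: allocate the header with the ticket appended.
 * Returns NULL on allocation failure or if the ticket is too long for the
 * 16-bit length field; *plen receives the total payload length.
 * The caller frees the result. */
struct rxrpc_key_sec2_v1 *rx_build_key_payload(uint32_t expiry, uint8_t kvno,
		const uint8_t session_key[8], const uint8_t *ticket,
		size_t ticket_len, size_t *plen)
{
	struct rxrpc_key_sec2_v1 *p;

	if (ticket_len > UINT16_MAX)
		return NULL;
	p = calloc(1, sizeof(*p) + ticket_len);  /* zeroes __pad too */
	if (!p)
		return NULL;
	p->security_index = 2;
	p->ticket_length = (uint16_t)ticket_len;
	p->expiry = expiry;
	p->kvno = kvno;
	memcpy(p->session_key, session_key, 8);
	if (ticket_len)
		memcpy(p->ticket, ticket, ticket_len);
	*plen = sizeof(*p) + ticket_len;
	return p;
}
```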


For the server, keys of type "rxrpc_s" must be made available.  They have a
description of "<serviceID>:<securityIndex>" (eg: "52:2" for an rxkad key for
the AFS VL service).  When such a key is created, it should be given the
server's secret key as the instantiation data (see the example below).

	add_key("rxrpc_s", "52:2", secret_key, 8, keyring);

A keyring is passed to the server socket by naming it in a sockopt.  The server
socket then looks the server secret keys up in this keyring when secure
incoming connections are made.  This can be seen in an example program that can
be found at:

	http://people.redhat.com/~dhowells/rxrpc/listen.c


====================
EXAMPLE CLIENT USAGE
====================

A client would issue an operation by:

 (1) An RxRPC socket is set up by:

	client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

     The third parameter indicates the protocol family of the transport socket
     used - usually IPv4 but it can also be IPv6 [TODO].

 (2) A local address can optionally be bound:

	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= 0,  /* we're a client */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(struct sockaddr_in),
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7000), /* AFS callback */
		.transport.sin.sin_addr.s_addr	= htonl(INADDR_ANY), /* all local interfaces */
	};
	bind(client, (struct sockaddr *)&srx, sizeof(srx));

     This specifies the local UDP port to be used.  If not given, a random
     non-privileged port will be used.  A UDP port may be shared between
     several unrelated RxRPC sockets.  Security is handled on a
     per-RxRPC-virtual-connection basis.

 (3) The security is set:

	const char *key = "AFS:cambridge.redhat.com";
	setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));

     This issues a request_key() to get the key representing the security
     context.  The minimum security level can be set:

	unsigned int sec = RXRPC_SECURITY_ENCRYPTED;
	setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
		   &sec, sizeof(sec));

 (4) The server to be contacted can then be specified (alternatively this can
     be done through sendmsg):

	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID,
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(struct sockaddr_in),
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7005), /* AFS volume manager */
		.transport.sin.sin_addr		= ...,
	};
	connect(client, (struct sockaddr *)&srx, sizeof(srx));

 (5) The request data should then be posted to the client socket using a series
     of sendmsg() calls, each with the following control message attached:

	RXRPC_USER_CALL_ID	- specifies the user ID for this call

     MSG_MORE should be passed in the flags argument to sendmsg() on all but
     the last part of the request.  Multiple requests may be made
     simultaneously.

     If a call is intended to go to a destination other than the default
     specified through connect(), then msghdr::msg_name should be set on the
     first request message of that call.

 (6) The reply data will then be posted to the client socket for recvmsg() to
     pick up.  MSG_MORE will be flagged by recvmsg() if there's more reply data
     for a particular call to be read.  MSG_EOR will be set on the terminal
     read for a call.

     All data will be delivered with the following control message attached:

	RXRPC_USER_CALL_ID	- specifies the user ID for this call

     If an abort or error occurred, this will be returned in the control data
     buffer instead, and MSG_EOR will be flagged to indicate the end of that
     call.
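
Sending the request can be sketched as a helper that transmits one chunk of
it.  The helper builds the RXRPC_USER_CALL_ID control message with the
standard CMSG_* macros and passes MSG_MORE through sendmsg()'s flags argument,
since POSIX specifies that msg_flags is ignored on send.  The values for
SOL_RXRPC and RXRPC_USER_CALL_ID are assumptions meant to match the kernel
headers, and rx_send_request is hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272         /* assumed value */
#endif
#ifndef RXRPC_USER_CALL_ID
#define RXRPC_USER_CALL_ID 1  /* assumed to match <linux/rxrpc.h> */
#endif

/* Send one chunk of a client request on the call tagged call_id.
 * dest may be NULL to use the connect()ed default; it only matters on
 * the first chunk of a call.  Pass more != 0 on all but the last chunk.
 * Returns the sendmsg() result: bytes sent, or -1 with errno set. */
ssize_t rx_send_request(int fd, unsigned long call_id,
			const struct sockaddr *dest, socklen_t dest_len,
			const void *data, size_t len, int more)
{
	union {
		char buf[CMSG_SPACE(sizeof(unsigned long))];
		struct cmsghdr align;   /* forces suitable alignment */
	} ctrl;
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr msg = {
		.msg_name = (void *)dest,
		.msg_namelen = dest ? dest_len : 0,
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	/* msg_flags is ignored by sendmsg(), so MSG_MORE goes here. */
	return sendmsg(fd, &msg, more ? MSG_MORE : 0);
}
```

A caller would send every chunk but the last with more set, then collect the
reply with recvmsg() until a message arrives flagged with MSG_EOR.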


====================
EXAMPLE SERVER USAGE
====================

A server would be set up to accept operations in the following manner:

 (1) An RxRPC socket is created by:

	server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

     The third parameter indicates the address type of the transport socket
     used - usually IPv4.

 (2) Security is set up if desired by giving the socket a keyring with server
     secret keys in it:

	keyring = add_key("keyring", "AFSkeys", NULL, 0,
			  KEY_SPEC_PROCESS_KEYRING);

	const unsigned char secret_key[8] = {
		0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 };
	add_key("rxrpc_s", "52:2", secret_key, 8, keyring);

	setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7);

     The keyring can be manipulated after it has been given to the socket.  This
     permits the server to add more keys, replace keys, etc. whilst it is live.

 (3) A local address must then be bound:

	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID, /* RxRPC service ID */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(struct sockaddr_in),
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7000), /* AFS callback */
		.transport.sin.sin_addr.s_addr	= htonl(INADDR_ANY), /* all local interfaces */
	};
	bind(server, (struct sockaddr *)&srx, sizeof(srx));

 (4) The server is then set to listen out for incoming calls:

	listen(server, 100);

 (5) The kernel notifies the server of pending incoming connections by sending
     it a message for each.  This is received with recvmsg() on the server
     socket.  It has no data, and has a single dataless control message
     attached:

	RXRPC_NEW_CALL

     The address that can be passed back by recvmsg() at this point should be
     ignored since the call for which the message was posted may have gone by
     the time it is accepted - in which case the first call still on the queue
     will be accepted.

 (6) The server then accepts the new call by issuing a sendmsg() with two
     pieces of control data and no actual data:

	RXRPC_ACCEPT		- indicate connection acceptance
	RXRPC_USER_CALL_ID	- specify user ID for this call

 (7) The first request data packet will then be posted to the server socket for
     recvmsg() to pick up.  At that point, the RxRPC address for the call can
     be read from the address fields in the msghdr struct.

     Subsequent request data will be posted to the server socket for recvmsg()
     to collect as it arrives.  All but the last piece of the request data will
     be delivered with MSG_MORE flagged.

     All data will be delivered with the following control message attached:

	RXRPC_USER_CALL_ID	- specifies the user ID for this call

|  | 635 | (8) The reply data should then be posted to the server socket using a series | 
|  | 636 | of sendmsg() calls, each with the following control messages attached: | 
|  | 637 |  | 
|  | 638 | RXRPC_USER_CALL_ID	- specifies the user ID for this call | 
|  | 639 |  | 
|  | 640 | MSG_MORE should be set in msghdr::msg_flags on all but the last message | 
|  | 641 | for a particular call. | 
|  | 642 |  | 
|  | 643 | (9) The final ACK from the client will be posted for retrieval by recvmsg() | 
|  | 644 | when it is received.  It will take the form of a dataless message with two | 
|  | 645 | control messages attached: | 
|  | 646 |  | 
|  | 647 | RXRPC_USER_CALL_ID	- specifies the user ID for this call | 
|  | 648 | RXRPC_ACK		- indicates final ACK (no data) | 
|  | 649 |  | 
|  | 650 | MSG_EOR will be flagged to indicate that this is the final message for | 
|  | 651 | this call. | 
|  | 652 |  | 
|  | 653 | (10) Up to the point the final packet of reply data is sent, the call can be | 
|  | 654 | aborted by calling sendmsg() with a dataless message with the following | 
|  | 655 | control messages attached: | 
|  | 656 |  | 
|  | 657 | RXRPC_USER_CALL_ID	- specifies the user ID for this call | 
|  | 658 | RXRPC_ABORT		- indicates abort code (4 byte data) | 
|  | 659 |  | 
|  | 660 | Any packets waiting in the socket's receive queue will be discarded if | 
|  | 661 | this is issued. | 
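
As a rough illustration of the accept and abort messages described above, the control data can be assembled with the standard CMSG_* macros from <sys/socket.h>. This is a minimal sketch, not code taken from any AF_RXRPC user: rxrpc_put_cmsg(), build_accept_cmsgs(), build_abort_cmsgs() and send_control_only() are hypothetical helpers, and the numeric fallbacks for SOL_RXRPC and the RXRPC_* constants are assumed to match <linux/rxrpc.h>, which real code should include instead.

```c
#include <string.h>
#include <sys/socket.h>

/* Fallback values, ASSUMED to match <linux/rxrpc.h>; prefer the real header. */
#ifndef SOL_RXRPC
#define SOL_RXRPC		272
#endif
#ifndef RXRPC_USER_CALL_ID
#define RXRPC_USER_CALL_ID	1
#define RXRPC_ABORT		2
#define RXRPC_ACCEPT		9
#endif

/* Hypothetical helper: append one SOL_RXRPC control message at offset *used
 * in buf (which must be aligned for struct cmsghdr).  Returns 0, or -1 if
 * the message would not fit in cap bytes. */
static int rxrpc_put_cmsg(unsigned char *buf, size_t cap, size_t *used,
			  int type, const void *data, size_t len)
{
	struct cmsghdr *cmsg;

	if (*used + CMSG_SPACE(len) > cap)
		return -1;
	cmsg = (struct cmsghdr *)(buf + *used);
	memset(cmsg, 0, CMSG_SPACE(len));	/* clear header and padding */
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = type;
	cmsg->cmsg_len = CMSG_LEN(len);
	if (len)
		memcpy(CMSG_DATA(cmsg), data, len);
	*used += CMSG_SPACE(len);
	return 0;
}

/* Control data for accepting a call: RXRPC_ACCEPT (no payload) plus the
 * user call ID to tie to it.  Returns bytes used, or 0 if cap is too small. */
size_t build_accept_cmsgs(void *buf, size_t cap, unsigned long call_id)
{
	size_t used = 0;

	if (rxrpc_put_cmsg(buf, cap, &used, RXRPC_ACCEPT, NULL, 0) < 0 ||
	    rxrpc_put_cmsg(buf, cap, &used, RXRPC_USER_CALL_ID,
			   &call_id, sizeof(call_id)) < 0)
		return 0;
	return used;
}

/* Control data for aborting a call: the user call ID plus a 4-byte abort
 * code.  Returns bytes used, or 0 if cap is too small. */
size_t build_abort_cmsgs(void *buf, size_t cap, unsigned long call_id,
			 unsigned int abort_code)
{
	size_t used = 0;

	if (rxrpc_put_cmsg(buf, cap, &used, RXRPC_USER_CALL_ID,
			   &call_id, sizeof(call_id)) < 0 ||
	    rxrpc_put_cmsg(buf, cap, &used, RXRPC_ABORT,
			   &abort_code, sizeof(abort_code)) < 0)
		return 0;
	return used;
}

/* Issue a dataless sendmsg() that carries only the given control data. */
ssize_t send_control_only(int sock, void *ctrl, size_t len)
{
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_control = ctrl;
	msg.msg_controllen = len;
	return sendmsg(sock, &msg, 0);
}
```

On receipt of RXRPC_NEW_CALL, a server could then call build_accept_cmsgs(buf, sizeof(buf), id) followed by send_control_only(server, buf, n), where buf is a control buffer aligned for struct cmsghdr and id is whatever user call ID it has chosen for the new call.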

Note that all the communications for a particular service take place through
the one server socket, using control messages on sendmsg() and recvmsg() to
determine the call affected.
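
Since every call for the service shares that one socket, the receiving side has to demultiplex each recvmsg() result by the control messages attached to it. The following is a minimal sketch of that step: struct rx_event and parse_rxrpc_cmsgs() are hypothetical, only the control messages described in this section are decoded, and the fallback constant values are assumed to match <linux/rxrpc.h>.

```c
#include <string.h>
#include <sys/socket.h>

/* Fallback values, ASSUMED to match <linux/rxrpc.h>; prefer the real header. */
#ifndef SOL_RXRPC
#define SOL_RXRPC		272
#endif
#ifndef RXRPC_USER_CALL_ID
#define RXRPC_USER_CALL_ID	1
#define RXRPC_ABORT		2
#define RXRPC_ACK		3
#define RXRPC_NEW_CALL		8
#endif

/* Hypothetical summary of what one recvmsg() result says about its call. */
struct rx_event {
	int has_call_id;		/* RXRPC_USER_CALL_ID was present */
	unsigned long call_id;		/* valid only if has_call_id */
	int new_call;			/* RXRPC_NEW_CALL: call awaits acceptance */
	int final_ack;			/* RXRPC_ACK: final ACK received */
	int aborted;			/* RXRPC_ABORT was present */
	unsigned int abort_code;	/* valid only if aborted */
};

/* Walk the control messages attached to msg after recvmsg() has returned.
 * Returns 0, or -1 if an RxRPC control message is too short. */
int parse_rxrpc_cmsgs(struct msghdr *msg, struct rx_event *ev)
{
	struct cmsghdr *cmsg;

	memset(ev, 0, sizeof(*ev));
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		size_t len;

		if (cmsg->cmsg_level != SOL_RXRPC)
			continue;	/* not ours; skip it */
		if ((size_t)cmsg->cmsg_len < CMSG_LEN(0))
			return -1;
		len = (size_t)cmsg->cmsg_len - CMSG_LEN(0);	/* payload size */

		switch (cmsg->cmsg_type) {
		case RXRPC_USER_CALL_ID:
			if (len < sizeof(ev->call_id))
				return -1;
			memcpy(&ev->call_id, CMSG_DATA(cmsg), sizeof(ev->call_id));
			ev->has_call_id = 1;
			break;
		case RXRPC_ABORT:
			if (len < sizeof(ev->abort_code))
				return -1;
			memcpy(&ev->abort_code, CMSG_DATA(cmsg),
			       sizeof(ev->abort_code));
			ev->aborted = 1;
			break;
		case RXRPC_ACK:
			ev->final_ack = 1;
			break;
		case RXRPC_NEW_CALL:
			ev->new_call = 1;
			break;
		default:
			/* Other notifications are not modelled in this sketch. */
			break;
		}
	}
	return 0;
}
```

After each recvmsg(), a server might accept the pending call if ev.new_call is set, otherwise look up its per-call state by ev.call_id, and discard that state once ev.final_ack or ev.aborted is seen or MSG_EOR marks the final message for the call.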


=========================
AF_RXRPC KERNEL INTERFACE
=========================

The AF_RXRPC module also provides an interface for use by in-kernel utilities
such as the AFS filesystem.  This permits such a utility to:

 (1) Use different keys directly on individual client calls on one socket
     rather than having to open a whole slew of sockets, one for each key it
     might want to use.

 (2) Avoid having RxRPC call request_key() at the point of issue of a call or
     opening of a socket.  Instead the utility is responsible for requesting a
     key at the appropriate point.  AFS, for instance, would do this during VFS
     operations such as open() or unlink().  The key is then handed through
     when the call is initiated.

 (3) Request the use of something other than GFP_KERNEL to allocate memory.

 (4) Avoid the overhead of using the recvmsg() call.  RxRPC messages can be
     intercepted before they get put into the socket Rx queue and the socket
     buffers manipulated directly.

To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
bind an address as appropriate and listen if it's to be a server socket, but
then it passes this to the kernel interface functions.

The kernel interface functions are as follows:

 (*) Begin a new client call.

	struct rxrpc_call *
	rxrpc_kernel_begin_call(struct socket *sock,
				struct sockaddr_rxrpc *srx,
				struct key *key,
				unsigned long user_call_ID,
				gfp_t gfp);

     This allocates the infrastructure to make a new RxRPC call and assigns
     call and connection numbers.  The call will be made on the UDP port that
     the socket is bound to.  The call will go to the destination address of a
     connected client socket unless an alternative is supplied (srx is
     non-NULL).

     If a key is supplied then this will be used to secure the call instead of
     the key bound to the socket with the RXRPC_SECURITY_KEY sockopt.  Calls
     secured in this way will still share connections if at all possible.

     The user_call_ID is equivalent to that supplied to sendmsg() in the
     control data buffer.  It is entirely feasible to use this to point to a
     kernel data structure.

     If this function is successful, an opaque reference to the RxRPC call is
     returned.  The caller now holds a reference on this and it must be
     properly ended.

 (*) End a client call.

	void rxrpc_kernel_end_call(struct rxrpc_call *call);

     This is used to end a previously begun call.  The user_call_ID is expunged
     from AF_RXRPC's knowledge and will not be seen again in association with
     the specified call.

 (*) Send data through a call.

	int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
				   size_t len);

     This is used to supply either the request part of a client call or the
     reply part of a server call.  msg.msg_iovlen and msg.msg_iov specify the
     data buffers to be used.  msg_iov must not be NULL and must point
     exclusively to in-kernel virtual addresses.  msg.msg_flags may be given
     MSG_MORE if there will be subsequent data sends for this call.

     The msg must not specify a destination address, control data or any flags
     other than MSG_MORE.  len is the total amount of data to transmit.

 (*) Abort a call.

	void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code);

     This is used to abort a call if it's still in an abortable state.  The
     abort code specified will be placed in the ABORT message sent.

 (*) Intercept received RxRPC messages.

	typedef void (*rxrpc_interceptor_t)(struct sock *sk,
					    unsigned long user_call_ID,
					    struct sk_buff *skb);

	void
	rxrpc_kernel_intercept_rx_messages(struct socket *sock,
					   rxrpc_interceptor_t interceptor);

     This installs an interceptor function on the specified AF_RXRPC socket.
     All messages that would otherwise wind up in the socket's Rx queue are
     then diverted to this function.  Note that care must be taken to process
     the messages in the right order to maintain DATA message sequentiality.

     The interceptor function itself is provided with the address of the socket
     handling the incoming message, the ID assigned by the kernel utility to
     the call and the socket buffer containing the message.

     The skb->mark field indicates the type of message:

	MARK				MEANING
	===============================	=======================================
	RXRPC_SKB_MARK_DATA		Data message
	RXRPC_SKB_MARK_FINAL_ACK	Final ACK received for an incoming call
	RXRPC_SKB_MARK_BUSY		Client call rejected as server busy
	RXRPC_SKB_MARK_REMOTE_ABORT	Call aborted by peer
	RXRPC_SKB_MARK_NET_ERROR	Network error detected
	RXRPC_SKB_MARK_LOCAL_ERROR	Local error encountered
	RXRPC_SKB_MARK_NEW_CALL		New incoming call awaiting acceptance

     The remote abort message can be probed with rxrpc_kernel_get_abort_code().
     The two error messages can be probed with rxrpc_kernel_get_error_number().
     A new call can be accepted with rxrpc_kernel_accept_call() or rejected
     with rxrpc_kernel_reject_call().

     Data messages can have their contents extracted with the usual bunch of
     socket buffer manipulation functions.  A data message can be determined to
     be the last one in a sequence with rxrpc_kernel_is_data_last().  When a
     data message has been used up, rxrpc_kernel_data_delivered() should be
     called on it.

     Non-data messages should be handed to rxrpc_kernel_free_skb() to dispose
     of.  It is possible to get extra references on all types of message for
     later freeing, but this may pin the state of a call until the message is
     finally freed.

 (*) Accept an incoming call.

	struct rxrpc_call *
	rxrpc_kernel_accept_call(struct socket *sock,
				 unsigned long user_call_ID);

     This is used to accept an incoming call and to assign it a call ID.  This
     function is similar to rxrpc_kernel_begin_call() and calls accepted must
     be ended in the same way.

     If this function is successful, an opaque reference to the RxRPC call is
     returned.  The caller now holds a reference on this and it must be
     properly ended.

 (*) Reject an incoming call.

	int rxrpc_kernel_reject_call(struct socket *sock);

     This is used to reject the first incoming call on the socket's queue with
     a BUSY message.  -ENODATA is returned if there were no incoming calls.
     Other errors may be returned if the call had been aborted (-ECONNABORTED)
     or had timed out (-ETIME).

 (*) Record the delivery of a data message and free it.

	void rxrpc_kernel_data_delivered(struct sk_buff *skb);

     This is used to record a data message as having been delivered and to
     update the ACK state for the call.  The socket buffer will be freed.

 (*) Free a message.

	void rxrpc_kernel_free_skb(struct sk_buff *skb);

     This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC
     socket.

 (*) Determine if a data message is the last one on a call.

	bool rxrpc_kernel_is_data_last(struct sk_buff *skb);

     This is used to determine if a socket buffer holds the last data message
     to be received for a call (true will be returned if it does, false
     if not).

     The data message will be part of the reply on a client call and the
     request on an incoming call.  In the latter case there will be more
     messages, but in the former case there will not.

 (*) Get the abort code from an abort message.

	u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb);

     This is used to extract the abort code from a remote abort message.

 (*) Get the error number from a local or network error message.

	int rxrpc_kernel_get_error_number(struct sk_buff *skb);

     This is used to extract the error number from a message indicating that
     either a local error or a network error occurred.

 (*) Allocate a null key for doing anonymous security.

	struct key *rxrpc_get_null_key(const char *keyname);

     This is used to allocate a null RxRPC key that can be used to indicate
     anonymous security for a particular domain.
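
The rules above for consuming intercepted messages amount to a dispatch on skb->mark. The sketch below models only that decision, in plain C that compiles outside the kernel: the enumerations and handling_for() are hypothetical, the enumerators merely stand in for the RXRPC_SKB_MARK_* values (whose actual numbers are not assumed), and the comments name the kernel helper each result corresponds to.

```c
/* Hypothetical stand-ins for the RXRPC_SKB_MARK_* values in the table. */
enum rx_mark {
	MARK_DATA,		/* RXRPC_SKB_MARK_DATA */
	MARK_FINAL_ACK,		/* RXRPC_SKB_MARK_FINAL_ACK */
	MARK_BUSY,		/* RXRPC_SKB_MARK_BUSY */
	MARK_REMOTE_ABORT,	/* RXRPC_SKB_MARK_REMOTE_ABORT */
	MARK_NET_ERROR,		/* RXRPC_SKB_MARK_NET_ERROR */
	MARK_LOCAL_ERROR,	/* RXRPC_SKB_MARK_LOCAL_ERROR */
	MARK_NEW_CALL,		/* RXRPC_SKB_MARK_NEW_CALL */
};

/* Which documented helper to call next for this kind of message. */
enum rx_followup {
	FOLLOWUP_NONE,			/* the mark alone says it all */
	FOLLOWUP_IS_DATA_LAST,		/* rxrpc_kernel_is_data_last() */
	FOLLOWUP_ABORT_CODE,		/* rxrpc_kernel_get_abort_code() */
	FOLLOWUP_ERROR_NUMBER,		/* rxrpc_kernel_get_error_number() */
	FOLLOWUP_ACCEPT_OR_REJECT,	/* rxrpc_kernel_accept_call() or
					   rxrpc_kernel_reject_call() */
};

/* Which documented helper must finally dispose of the socket buffer. */
enum rx_release {
	RELEASE_DATA_DELIVERED,		/* rxrpc_kernel_data_delivered() */
	RELEASE_FREE_SKB,		/* rxrpc_kernel_free_skb() */
};

struct rx_handling {
	enum rx_followup followup;
	enum rx_release release;
};

/* Map a mark to the handling described for that kind of message. */
struct rx_handling handling_for(enum rx_mark mark)
{
	/* Default: nothing further to extract; free as a non-DATA message. */
	struct rx_handling h = { FOLLOWUP_NONE, RELEASE_FREE_SKB };

	switch (mark) {
	case MARK_DATA:
		/* Consume the payload, then record delivery, which also
		   updates the call's ACK state and frees the buffer. */
		h.followup = FOLLOWUP_IS_DATA_LAST;
		h.release = RELEASE_DATA_DELIVERED;
		break;
	case MARK_REMOTE_ABORT:
		h.followup = FOLLOWUP_ABORT_CODE;
		break;
	case MARK_NET_ERROR:
	case MARK_LOCAL_ERROR:
		h.followup = FOLLOWUP_ERROR_NUMBER;
		break;
	case MARK_NEW_CALL:
		h.followup = FOLLOWUP_ACCEPT_OR_REJECT;
		break;
	case MARK_FINAL_ACK:
	case MARK_BUSY:
		break;
	}
	return h;
}
```

An actual interceptor would switch on skb->mark in the same way and call the named functions directly. The point the model captures is that a DATA buffer is released with rxrpc_kernel_data_delivered(), whereas every other kind of message goes to rxrpc_kernel_free_skb().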