An introduction to the videobuf layer
Jonathan Corbet <corbet@lwn.net>
Current as of 2.6.33

The videobuf layer functions as a sort of glue layer between a V4L2 driver
and user space.  It handles the allocation and management of buffers for
the storage of video frames.  There is a set of functions which can be used
to implement many of the standard POSIX I/O system calls, including read(),
poll(), and, happily, mmap().  Another set of functions can be used to
implement the bulk of the V4L2 ioctl() calls related to streaming I/O,
including buffer allocation, queueing and dequeueing, and streaming
control.  Using videobuf imposes a few design decisions on the driver
author, but the payback comes in the form of reduced code in the driver and
a consistent implementation of the V4L2 user-space API.

Buffer types

Not all video devices use the same kind of buffers.  In fact, there are (at
least) three common variations:

 - Buffers which are scattered in both the physical and (kernel) virtual
   address spaces.  (Almost) all user-space buffers are like this, but it
   makes great sense to allocate kernel-space buffers this way as well when
   it is possible.  Unfortunately, it is not always possible; working with
   this kind of buffer normally requires hardware which can do
   scatter/gather DMA operations.

 - Buffers which are physically scattered, but which are virtually
   contiguous; buffers allocated with vmalloc(), in other words.  These
   buffers are just as hard to use for DMA operations, but they can be
   useful in situations where DMA is not available but virtually-contiguous
   buffers are convenient.

 - Buffers which are physically contiguous.  Allocation of this kind of
   buffer can be unreliable on fragmented systems, but simpler DMA
   controllers cannot deal with anything else.

Videobuf can work with all three types of buffers, but the driver author
must pick one at the outset and design the driver around that decision.

[It's worth noting that there's a fourth kind of buffer: "overlay" buffers
which are located within the system's video memory.  The overlay
functionality is considered to be deprecated for most use, but it still
shows up occasionally in system-on-chip drivers where the performance
benefits merit the use of this technique.  Overlay buffers can be handled
as a form of scattered buffer, but there are very few implementations in
the kernel and a description of this technique is currently beyond the
scope of this document.]

Data structures, callbacks, and initialization

Depending on which type of buffer is being used, the driver should
include one of the following files:

    <media/videobuf-dma-sg.h>        /* Physically scattered */
    <media/videobuf-vmalloc.h>       /* vmalloc() buffers */
    <media/videobuf-dma-contig.h>    /* Physically contiguous */

The driver's data structure describing a V4L2 device should include a
struct videobuf_queue instance for the management of the buffer queue,
along with a list_head for the queue of available buffers.  There will also
need to be an interrupt-safe spinlock which is used to protect (at least)
the queue.

The next step is to write four simple callbacks to help videobuf deal with
the management of buffers:

    struct videobuf_queue_ops {
        int (*buf_setup)(struct videobuf_queue *q,
                         unsigned int *count, unsigned int *size);
        int (*buf_prepare)(struct videobuf_queue *q,
                           struct videobuf_buffer *vb,
                           enum v4l2_field field);
        void (*buf_queue)(struct videobuf_queue *q,
                          struct videobuf_buffer *vb);
        void (*buf_release)(struct videobuf_queue *q,
                            struct videobuf_buffer *vb);
    };

buf_setup() is called early in the I/O process, when streaming is being
initiated; its purpose is to tell videobuf about the I/O stream.  The count
parameter will be a suggested number of buffers to use; the driver should
check it for rationality and adjust it if need be.  As a practical rule, at
least two buffers are needed for proper streaming, and there is
usually a maximum (which cannot exceed 32) which makes sense for each
device.  The size parameter should be set to the expected (maximum) size
for each frame of data.

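As a rough illustration of the sanity checking described above, here is a
minimal sketch of a buf_setup() callback.  The mydrv_ names, the limits, and
the 640x480 16-bit frame geometry are assumptions made up for this example,
and struct videobuf_queue is left as an opaque stand-in so that the logic can
be compiled outside the kernel tree.

```c
#include <stddef.h>

/* Opaque stand-in so the sketch compiles outside the kernel tree. */
struct videobuf_queue;

/* Hypothetical device parameters, chosen only for illustration. */
#define MYDRV_MIN_BUFS  2U      /* practical minimum for streaming */
#define MYDRV_MAX_BUFS  32U     /* ceiling mentioned in the text */
#define MYDRV_WIDTH     640U
#define MYDRV_HEIGHT    480U
#define MYDRV_BPP       2U      /* bytes per pixel of the assumed format */

static int mydrv_buf_setup(struct videobuf_queue *q,
                           unsigned int *count, unsigned int *size)
{
    (void)q;    /* a real driver would look up its device through q */

    /* Largest frame the (hypothetical) device can produce. */
    *size = MYDRV_WIDTH * MYDRV_HEIGHT * MYDRV_BPP;

    /* Treat the requested count as a suggestion and clamp it. */
    if (*count < MYDRV_MIN_BUFS)
        *count = MYDRV_MIN_BUFS;
    if (*count > MYDRV_MAX_BUFS)
        *count = MYDRV_MAX_BUFS;

    return 0;
}
```

A driver with tighter memory constraints would likely pick a smaller
maximum than 32; the clamp is the part that matters.
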
Each buffer (in the form of a struct videobuf_buffer pointer) will be
passed to buf_prepare(), which should set the buffer's size, width, height,
and field fields properly.  If the buffer's state field is
VIDEOBUF_NEEDS_INIT, the driver should pass it to:

    int videobuf_iolock(struct videobuf_queue *q, struct videobuf_buffer *vb,
                        struct v4l2_framebuffer *fbuf);

Among other things, this call will usually allocate memory for the buffer.
Finally, the buf_prepare() function should set the buffer's state to
VIDEOBUF_PREPARED.

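The sequence just described might be sketched as follows.  Everything here
other than the names quoted in the text is an assumption: the structures are
cut-down stand-ins holding only the fields mentioned above, videobuf_iolock()
is a stub that merely counts its calls, and the frame geometry is invented.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel definitions named in the text. */
enum v4l2_field { V4L2_FIELD_ANY, V4L2_FIELD_NONE };
enum videobuf_state { VIDEOBUF_NEEDS_INIT, VIDEOBUF_PREPARED };

struct v4l2_framebuffer;
struct videobuf_queue { void *priv_data; };
struct videobuf_buffer {
    unsigned int width, height;
    unsigned long size;
    enum v4l2_field field;
    enum videobuf_state state;
};

/* Stub with the prototype quoted above; the real call allocates memory. */
static int iolock_calls;
static int videobuf_iolock(struct videobuf_queue *q,
                           struct videobuf_buffer *vb,
                           struct v4l2_framebuffer *fbuf)
{
    (void)q; (void)vb; (void)fbuf;
    iolock_calls++;
    return 0;
}

/* Hypothetical frame geometry: 640x480 at two bytes per pixel. */
#define MYDRV_WIDTH   640U
#define MYDRV_HEIGHT  480U
#define MYDRV_BPP     2U

static int mydrv_buf_prepare(struct videobuf_queue *q,
                             struct videobuf_buffer *vb,
                             enum v4l2_field field)
{
    int ret;

    vb->width = MYDRV_WIDTH;
    vb->height = MYDRV_HEIGHT;
    vb->size = (unsigned long)MYDRV_WIDTH * MYDRV_HEIGHT * MYDRV_BPP;
    vb->field = field;

    /* Only buffers which have never been set up need videobuf_iolock(). */
    if (vb->state == VIDEOBUF_NEEDS_INIT) {
        ret = videobuf_iolock(q, vb, NULL);
        if (ret)
            return ret;
    }

    vb->state = VIDEOBUF_PREPARED;
    return 0;
}
```

Note that the state is only advanced to VIDEOBUF_PREPARED once
videobuf_iolock() has succeeded, so a failed allocation leaves the buffer
eligible for another attempt.
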
When a buffer is queued for I/O, it is passed to buf_queue(), which should
put it onto the driver's list of available buffers and set its state to
VIDEOBUF_QUEUED.  Note that this function is called with the queue spinlock
held; if it tries to acquire it as well things will come to a screeching
halt.  Yes, this is the voice of experience.  Note also that videobuf may
wait on the first buffer in the queue; placing other buffers in front of it
could again gum up the works.  So use list_add_tail() to enqueue buffers.

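To make the ordering requirement concrete, here is a small sketch of a
buf_queue() callback.  The list implementation is a minimal user-space
stand-in for the kernel's <linux/list.h>, the structures carry only the
members needed here, and struct mydrv_dev (reached through the queue's
priv_data pointer) is a hypothetical per-device structure.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly-linked list. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry just before head, which is the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Cut-down stand-ins for the videobuf structures. */
enum videobuf_state { VIDEOBUF_PREPARED, VIDEOBUF_QUEUED };

struct videobuf_buffer {
    struct list_head queue;     /* link on the driver's buffer list */
    enum videobuf_state state;
};

struct videobuf_queue { void *priv_data; };

/* Hypothetical per-device structure holding the list of buffers. */
struct mydrv_dev { struct list_head buffers; };

static void mydrv_buf_queue(struct videobuf_queue *q,
                            struct videobuf_buffer *vb)
{
    struct mydrv_dev *dev = q->priv_data;

    /* No locking here: videobuf already holds the queue spinlock. */
    list_add_tail(&vb->queue, &dev->buffers);
    vb->state = VIDEOBUF_QUEUED;
}
```

Because each new buffer goes to the tail, the buffer that videobuf may be
waiting on stays at the head of the list.
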
Finally, buf_release() is called when a buffer is no longer intended to be
used.  The driver should ensure that there is no I/O active on the buffer,
then pass it to the appropriate free routine(s):

    /* Scatter/gather drivers */
    int videobuf_dma_unmap(struct videobuf_queue *q,
                           struct videobuf_dmabuf *dma);
    int videobuf_dma_free(struct videobuf_dmabuf *dma);

    /* vmalloc drivers */
    void videobuf_vmalloc_free(struct videobuf_buffer *buf);

    /* Contiguous drivers */
    void videobuf_dma_contig_free(struct videobuf_queue *q,
                                  struct videobuf_buffer *buf);

One way to ensure that a buffer is no longer under I/O is to pass it to:

    int videobuf_waiton(struct videobuf_buffer *vb, int non_blocking, int intr);

Here, vb is the buffer, non_blocking indicates whether non-blocking I/O
should be used (it should be zero in the buf_release() case), and intr
controls whether an interruptible wait is used.

File operations

At this point, much of the work is done; much of the rest is slipping
videobuf calls into the implementation of the other driver callbacks.  The
first step is in the open() function, which must initialize the
videobuf queue.  The function to use depends on the type of buffer used:

    void videobuf_queue_sg_init(struct videobuf_queue *q,
                                struct videobuf_queue_ops *ops,
                                struct device *dev,
                                spinlock_t *irqlock,
                                enum v4l2_buf_type type,
                                enum v4l2_field field,
                                unsigned int msize,
                                void *priv);

    void videobuf_queue_vmalloc_init(struct videobuf_queue *q,
                                     struct videobuf_queue_ops *ops,
                                     struct device *dev,
                                     spinlock_t *irqlock,
                                     enum v4l2_buf_type type,
                                     enum v4l2_field field,
                                     unsigned int msize,
                                     void *priv);

    void videobuf_queue_dma_contig_init(struct videobuf_queue *q,
                                        struct videobuf_queue_ops *ops,
                                        struct device *dev,
                                        spinlock_t *irqlock,
                                        enum v4l2_buf_type type,
                                        enum v4l2_field field,
                                        unsigned int msize,
                                        void *priv);

In each case, the parameters are the same: q is the queue structure for the
device, ops is the set of callbacks as described above, dev is the device
structure for this video device, irqlock is an interrupt-safe spinlock to
protect access to the data structures, type is the buffer type used by the
device (cameras will use V4L2_BUF_TYPE_VIDEO_CAPTURE, for example), field
describes which field is being captured (often V4L2_FIELD_NONE for
progressive devices), msize is the size of any containing structure used
around struct videobuf_buffer, and priv is a private data pointer which
shows up in the priv_data field of struct videobuf_queue.  Note that these
are void functions which, evidently, are immune to failure.

V4L2 capture drivers can be written to support either of two APIs: the
read() system call and the rather more complicated streaming mechanism.  As
a general rule, it is necessary to support both to ensure that all
applications have a chance of working with the device.  Videobuf makes it
easy to do that with the same code.  To implement read(), the driver need
only make a call to one of:

    ssize_t videobuf_read_one(struct videobuf_queue *q,
                              char __user *data, size_t count,
                              loff_t *ppos, int nonblocking);

    ssize_t videobuf_read_stream(struct videobuf_queue *q,
                                 char __user *data, size_t count,
                                 loff_t *ppos, int vbihack, int nonblocking);

Either one of these functions will read frame data into data, returning the
amount actually read; the difference is that videobuf_read_one() will only
read a single frame, while videobuf_read_stream() will read multiple frames
if they are needed to satisfy the count requested by the application.  A
typical driver read() implementation will start the capture engine, call
one of the above functions, then stop the engine before returning (though a
smarter implementation might leave the engine running for a little while in
anticipation of another read() call happening in the near future).

The poll() function can usually be implemented with a direct call to:

    unsigned int videobuf_poll_stream(struct file *file,
                                      struct videobuf_queue *q,
                                      poll_table *wait);

Note that the actual wait queue eventually used will be the one associated
with the first available buffer.

When streaming I/O is done to kernel-space buffers, the driver must support
the mmap() system call to enable user space to access the data.  In many
V4L2 drivers, the often-complex mmap() implementation simplifies to a
single call to:

    int videobuf_mmap_mapper(struct videobuf_queue *q,
                             struct vm_area_struct *vma);

Everything else is handled by the videobuf code.

The release() function requires two separate videobuf calls:

    void videobuf_stop(struct videobuf_queue *q);
    int videobuf_mmap_free(struct videobuf_queue *q);

The call to videobuf_stop() terminates any I/O in progress - though it is
still up to the driver to stop the capture engine.  The call to
videobuf_mmap_free() will ensure that all buffers have been unmapped; if
so, they will all be passed to the buf_release() callback.  If buffers
remain mapped, videobuf_mmap_free() returns an error code instead.  The
purpose is clearly to cause the closing of the file descriptor to fail if
buffers are still mapped, but every driver in the 2.6.32 kernel cheerfully
ignores its return value.

ioctl() operations

The V4L2 API includes a very long list of driver callbacks to respond to
the many ioctl() commands made available to user space.  A number of these
- those associated with streaming I/O - turn almost directly into videobuf
calls.  The relevant helper functions are:

    int videobuf_reqbufs(struct videobuf_queue *q,
                         struct v4l2_requestbuffers *req);
    int videobuf_querybuf(struct videobuf_queue *q, struct v4l2_buffer *b);
    int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b);
    int videobuf_dqbuf(struct videobuf_queue *q, struct v4l2_buffer *b,
                       int nonblocking);
    int videobuf_streamon(struct videobuf_queue *q);
    int videobuf_streamoff(struct videobuf_queue *q);

So, for example, a VIDIOC_REQBUFS call turns into a call to the driver's
vidioc_reqbufs() callback which, in turn, usually only needs to locate the
proper struct videobuf_queue pointer and pass it to videobuf_reqbufs().
These support functions can replace a great deal of buffer management
boilerplate in a lot of V4L2 drivers.

The vidioc_streamon() and vidioc_streamoff() functions will be a bit more
complex, of course, since they will also need to deal with starting and
stopping the capture engine.

Buffer allocation

Thus far, we have talked about buffers, but have not looked at how they are
allocated.  The scatter/gather case is the most complex on this front.  The
driver can leave buffer allocation entirely up to the
videobuf layer; in this case, buffers will be allocated as anonymous
user-space pages and will be very scattered indeed.  If the application is
using user-space buffers, no allocation is needed; the videobuf layer will
take care of calling get_user_pages() and filling in the scatterlist array.

If the driver needs to do its own memory allocation, it should be done in
the vidioc_reqbufs() function, *after* calling videobuf_reqbufs().  The
first step is a call to:

    struct videobuf_dmabuf *videobuf_to_dma(struct videobuf_buffer *buf);

The returned videobuf_dmabuf structure (defined in
<media/videobuf-dma-sg.h>) includes a couple of relevant fields:

    struct scatterlist  *sglist;
    int                 sglen;

The driver must allocate an appropriately-sized scatterlist array and
populate it with pointers to the pieces of the allocated buffer; sglen
should be set to the length of the array.

Drivers using the vmalloc() method need not (and cannot) concern themselves
with buffer allocation at all; videobuf will handle those details.  The
same is normally true of contiguous-DMA drivers as well; videobuf will
allocate the buffers (with dma_alloc_coherent()) when it sees fit.  That
means that these drivers may be trying to do high-order allocations at any
time, an operation which is not always guaranteed to work.  Some drivers
play tricks by allocating DMA space at system boot time; videobuf does not
currently play well with those drivers.

As of 2.6.31, contiguous-DMA drivers can work with a user-supplied buffer,
as long as that buffer is physically contiguous.  Normal user-space
allocations will not meet that criterion, but buffers obtained from other
kernel drivers, or those contained within huge pages, will work with these
drivers.

Filling the buffers

The final part of a videobuf implementation has no direct callback - it's
the portion of the code which actually puts frame data into the buffers,
usually in response to interrupts from the device.  For all types of
drivers, this process works approximately as follows:

 - Obtain the next available buffer and make sure that somebody is actually
   waiting for it.

 - Get a pointer to the memory and put video data there.

 - Mark the buffer as done and wake up the process waiting for it.

The first step above is done by looking at the driver-managed list_head
structure - the one which is filled in the buf_queue() callback.  Because
starting the engine and enqueueing buffers are done in separate steps, it's
possible for the engine to be running without any buffers available - in the
vmalloc() case especially.  So the driver should be prepared for the list
to be empty.  It is equally possible that nobody is yet interested in the
buffer; the driver should not remove it from the list or fill it until a
process is waiting on it.  That test can be done by examining the buffer's
done field (a wait_queue_head_t structure) with waitqueue_active().

A buffer's state should be set to VIDEOBUF_ACTIVE before being mapped for
DMA; that ensures that the videobuf layer will not try to do anything with
it while the device is transferring data.

For scatter/gather drivers, the needed memory pointers will be found in the
scatterlist structure described above.  Drivers using the vmalloc() method
can get a memory pointer with:

    void *videobuf_to_vmalloc(struct videobuf_buffer *buf);

For contiguous DMA drivers, the function to use is:

    dma_addr_t videobuf_to_dma_contig(struct videobuf_buffer *buf);

The contiguous DMA API goes out of its way to hide the kernel-space address
of the DMA buffer from drivers.

The final step is to set the size field of the relevant videobuf_buffer
structure to the actual size of the captured image, set state to
VIDEOBUF_DONE, then call wake_up() on the done queue.  At this point, the
buffer is owned by the videobuf layer and the driver should not touch it
again.

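The whole sequence, for a vmalloc()-style driver, can be modeled roughly as
below.  This is a user-space sketch of the control flow only, under several
assumptions: the wait queue is reduced to a pair of counters, the list is a
single next pointer rather than a list_head, videobuf_to_vmalloc() is a stub
returning a small embedded array, and mydrv_frame_ready() is a hypothetical
function which a real driver would call from its interrupt handler with the
queue spinlock held.

```c
#include <stddef.h>
#include <string.h>

enum videobuf_state { VIDEOBUF_QUEUED, VIDEOBUF_ACTIVE, VIDEOBUF_DONE };

/* Stand-in for wait_queue_head_t: just counts sleepers and wakeups. */
typedef struct { int sleepers; int wakeups; } wait_queue_head_t;

static int waitqueue_active(wait_queue_head_t *wq)
{
    return wq->sleepers > 0;
}

static void wake_up(wait_queue_head_t *wq)
{
    wq->wakeups++;
}

struct videobuf_buffer {
    struct videobuf_buffer *next;   /* stand-in for the list_head link */
    unsigned long size;
    enum videobuf_state state;
    wait_queue_head_t done;
    unsigned char mem[16];          /* stand-in for the vmalloc() area */
};

/* Stub: the real helper returns the buffer's vmalloc() address. */
static void *videobuf_to_vmalloc(struct videobuf_buffer *buf)
{
    return buf->mem;
}

/* Hypothetical device: head of the list filled in by buf_queue(). */
struct mydrv_dev { struct videobuf_buffer *head; };

/* Returns 1 if the frame was delivered, 0 if it had to be dropped. */
static int mydrv_frame_ready(struct mydrv_dev *dev,
                             const void *frame, size_t len)
{
    struct videobuf_buffer *vb = dev->head;

    if (!vb)
        return 0;               /* engine running, but list is empty */
    if (!waitqueue_active(&vb->done))
        return 0;               /* nobody is waiting; leave it queued */
    if (len > sizeof(vb->mem))
        return 0;               /* frame would not fit in this model */

    dev->head = vb->next;       /* take the buffer off the list */
    vb->state = VIDEOBUF_ACTIVE;
    memcpy(videobuf_to_vmalloc(vb), frame, len);

    vb->size = len;             /* actual size of the captured image */
    vb->state = VIDEOBUF_DONE;
    wake_up(&vb->done);         /* videobuf owns the buffer from here */
    return 1;
}
```

The two early returns correspond to the empty-list and nobody-waiting cases
discussed above; in both, the incoming frame is simply dropped and the
buffer, if any, stays where it is.
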
Developers who are interested in more information can go into the relevant
header files; there are a few low-level functions declared there which have
not been talked about here.  Also worthwhile is the vivi driver
(drivers/media/video/vivi.c), which is maintained as an example of how V4L2
drivers should be written.  Vivi only uses the vmalloc() API, but it's good
enough to get started with.  Note also that all of these calls are exported
GPL-only, so they will not be available to non-GPL kernel modules.