Devfs (Device File System) FAQ


Linux Devfs (Device File System) FAQ
Richard Gooch
20-AUG-2002

-----------------------------------------------------------------------------

NOTE: the master copy of this document is available online at:

http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html
and looks much better than the text version distributed with the
kernel sources. A mirror site is available at:

http://www.ras.ucalgary.ca/~rgooch/linux/docs/devfs.html

There is also an optional daemon that may be used with devfs. You can
find out more about it at:

http://www.atnf.csiro.au/~rgooch/linux/

A mailing list is available which you may subscribe to. Send email
to majordomo@oss.sgi.com with the following line in the body of the
message:
subscribe devfs
To unsubscribe, send the message body:
unsubscribe devfs
instead. The list is archived at

http://oss.sgi.com/projects/devfs/archive/.

-----------------------------------------------------------------------------

Contents


What is it?

Why do it?

Who else does it?

How it works

Operational issues (essential reading)

Instructions for the impatient
Permissions persistence across reboots
Dealing with drivers without devfs support
All the way with Devfs
Other Issues
Kernel Naming Scheme
Devfsd Naming Scheme
Old Compatibility Names
SCSI Host Probing Issues

Device drivers currently ported

Allocation of Device Numbers

Questions and Answers

Making things work
Alternatives to devfs
What I don't like about devfs
How to report bugs
Strange kernel messages
Compilation problems with devfsd

Other resources

Translations of this document

-----------------------------------------------------------------------------


What is it?

Devfs is an alternative to "real" character and block special devices
on your root filesystem. Kernel device drivers can register devices by
name rather than by major and minor numbers. These devices will appear
in devfs automatically, with whatever default ownership and protection
the driver specified. A daemon (devfsd) can be used to override these
defaults. Devfs has been in the kernel since 2.3.46.

NOTE that devfs is entirely optional. If you prefer the old
disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the
default). In this case, nothing will change. ALSO NOTE that if you do
enable devfs, the defaults are such that full compatibility is
maintained with the old device names.

There are two aspects to devfs: one is the underlying device
namespace, which is a namespace just like any mounted filesystem. The
other aspect is the filesystem code which provides a view of the
device namespace. I make the distinction because devfs can be mounted
many times, with each mount showing the same device namespace. Changes
made are global to all mounted devfs filesystems. Also, because the
devfs namespace exists without any devfs mounts, you can easily mount
the root filesystem by referring to an entry in the devfs namespace.

The cost of devfs is a small increase in kernel code size and memory
usage: about 7 pages of code (some of that in __init sections) plus 72
bytes for each entry in the namespace. A modest system has only a
couple of hundred device entries, so this costs a few more
pages. Compare this with the suggestion to put /dev on a ramdisc.

On a typical machine, the cost is under 0.2 percent. On a modest
system with 64 MBytes of RAM, the cost is under 0.1 percent. The
accusations of "bloatware" levelled at devfs are not justified.
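As a rough check of these percentages, the sketch below recomputes the overhead from the figures in the text. It assumes 4 kByte pages and takes 200 entries as "a couple of hundred"; both are assumptions rather than measurements, and the function names are illustrative only.

```c
#include <assert.h>

/* Assumed: 4 kByte pages. The other two constants come from the text
 * ("about 7 pages of code", "72 bytes for each entry"). */
#define PAGE_BYTES      4096UL
#define CODE_PAGES      7UL
#define BYTES_PER_ENTRY 72UL

/* Total devfs overhead in bytes for a namespace with `entries` entries. */
static unsigned long devfs_cost_bytes(unsigned long entries)
{
    return CODE_PAGES * PAGE_BYTES + entries * BYTES_PER_ENTRY;
}

/* The same overhead expressed as a percentage of installed RAM. */
static double devfs_cost_percent(unsigned long entries,
                                 unsigned long ram_bytes)
{
    assert(ram_bytes > 0);
    return 100.0 * (double)devfs_cost_bytes(entries) / (double)ram_bytes;
}
```

With those assumptions, devfs_cost_bytes(200) is 43072 bytes, and devfs_cost_percent(200, 64UL << 20) comes to about 0.064, consistent with "under 0.1 percent".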

-----------------------------------------------------------------------------


Why do it?

There are several problems that devfs addresses. Some of these
problems are more serious than others (depending on your point of
view), and some can be solved without devfs. However, the totality of
these problems really calls out for devfs.

The choice is between a patchwork of inefficient user-space solutions,
which are complex and likely to be fragile, and a simple, efficient
and robust devfs.

There have been many counter-proposals to devfs, all seeking to
provide some of the benefits without actually implementing devfs. So
far there has been an absence of code, and no proposed alternative has
been able to provide all the features that devfs does. Further,
alternative proposals require far more complexity in user-space (and
still deliver less functionality than devfs). Some people have the
mantra of reducing "kernel bloat", but don't consider the effects on
user-space.

A good solution limits the total complexity of kernel-space and
user-space.


Major&minor allocation

The existing scheme requires the allocation of major and minor device
numbers for each and every device. This means that a central
co-ordinating authority is required to issue these device numbers
(unless you're developing a "private" device driver), in order to
preserve uniqueness. Devfs shifts the burden to a namespace. This may
not seem like a huge benefit, but actually it is. Since driver authors
will naturally choose a device name which reflects the functionality
of the device, there is far less potential for namespace conflict.
Solving this requires a kernel change.

/dev management

Because you currently access devices through device nodes, these must
be created by the system administrator. For standard devices you can
usually find a MAKEDEV programme which creates all (hundreds!) of
these nodes. This means that changes in the kernel must be reflected
by changes in the MAKEDEV programme, or else the system administrator
must create device nodes by hand.

The basic problem is that there are two separate databases of
major and minor numbers. One is in the kernel and one is in /dev (or
in a MAKEDEV programme, if you want to look at it that way). This is
duplication of information, which is not good practice.
Solving this requires a kernel change.

/dev growth

A typical /dev has over 1200 nodes! Most of these devices simply don't
exist because the hardware is not available. A huge /dev increases the
time to access devices (I'm just referring to the dentry lookup times
and the time taken to read inodes off disc: the next subsection shows
some more horrors).

To see how big /dev can grow, consider SCSI devices:

host        6 bits  (say up to 64 hosts on a really big machine)
channel     4 bits  (say up to 16 SCSI buses per host)
id          4 bits
lun         3 bits
partition   6 bits
TOTAL      23 bits

This requires 8 Mega (8*1024*1024) inodes if we want to store all
possible device nodes. Even if we scrap everything but id and
partition, and assume a single host adapter with a single SCSI bus and
only one logical unit per SCSI target (id), that's still 10 bits or
1024 inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so
that's 256 kBytes of inode storage on disc (assuming real inodes take
a similar amount of space as VFS inodes). This is actually not so bad,
because disc is cheap these days. Embedded systems would care about
256 kBytes of /dev inodes, but you could argue that embedded systems
would have hand-tuned /dev directories. I've had to do just that on my
embedded systems, but I would rather just leave it to devfs.
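The inode counts follow directly from the bit widths in the table. This sketch (with hypothetical helper names) reproduces them, taking the 256-byte inode size from the text.

```c
#include <assert.h>

/* Field widths from the SCSI example above. */
enum { HOST_BITS = 6, CHANNEL_BITS = 4, ID_BITS = 4, LUN_BITS = 3,
       PART_BITS = 6 };

/* Number of distinct device nodes addressable with `bits` bits. */
static unsigned long nodes_for_bits(unsigned bits)
{
    assert(bits < 32);  /* keep the shift well defined for unsigned long */
    return 1UL << bits;
}

/* On-disc inode storage for every conceivable node, assuming each inode
 * occupies `inode_bytes` (about 256 for a 2.1.78 VFS inode). */
static unsigned long inode_storage_bytes(unsigned bits,
                                         unsigned long inode_bytes)
{
    return nodes_for_bits(bits) * inode_bytes;
}
```

The full 23-bit space gives 8*1024*1024 nodes, which at 256 bytes each would be 2 GBytes of inodes; the reduced id-plus-partition case gives 1024 nodes and 256 kBytes.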

Another issue is the time taken to look up an inode when first
referenced. Not only does this take time in scanning through a list in
memory, but also the seek times to read the inodes off disc.
This could be solved in user-space using a clever programme which
scanned the kernel logs and deleted /dev entries which are not
available and created them when they were available. This programme
would need to be run every time a new module was loaded, which would
slow things down a lot.

There is an existing programme called scsidev which will automatically
create device nodes for SCSI devices. It can do this by scanning files
in /proc/scsi. Unfortunately, extending this idea to other device
nodes would require significant modifications to existing drivers (so
that they too would provide information in /proc). This is a
non-trivial change (I should know: devfs has had to do something
similar). Once you go to this much effort, you may as well use devfs
itself (which also provides this information). Furthermore, such a
system would likely be implemented in an ad-hoc fashion, as different
drivers will provide their information in different ways.

Devfs is much cleaner, because it (naturally) has a uniform mechanism
to provide this information: the device nodes themselves!


Node to driver file_operations translation

There is an important difference between the way disc-based character
and block nodes and devfs entries make the connection between an entry
in /dev and the actual device driver.

With the current 8 bit major and minor numbers the connection between
disc-based c&b nodes and per-major drivers is done through a
fixed-length table of 128 entries. The various filesystem types set
the inode operations for c&b nodes to {chr,blk}dev_inode_operations,
so when a device is opened a few quick levels of indirection bring us
to the driver file_operations.

For miscellaneous character devices a second step is required: there
is a scan for the driver entry with the same minor number as the file
that was opened, and the appropriate minor open method is called. This
scanning is done *every time* you open a device node. Potentially, you
may be searching through dozens of misc. entries before you find your
open method. While not an enormous performance overhead, this does
seem pointless.
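To make the two lookup paths concrete, here is a simplified model. The structure names are hypothetical stand-ins rather than the kernel's own; only the 128-entry table size comes from the text. The per-major path is a single array index, while the misc path walks a list on every open, so its cost grows with the number of registered misc drivers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct file_operations (hypothetical). */
struct file_ops { const char *driver; };

/* Per-major lookup: the major number indexes a fixed table directly. */
#define MAX_MAJOR 128
static const struct file_ops *major_table[MAX_MAJOR];

static const struct file_ops *lookup_major(unsigned major)
{
    return major < MAX_MAJOR ? major_table[major] : NULL;
}

/* Misc devices share one major; each open walks a list by minor. */
struct misc_dev {
    unsigned minor;
    const struct file_ops *fops;
    const struct misc_dev *next;
};

/* Returns the matching driver (or NULL) and reports how many list
 * entries were examined, which grows linearly with the list length. */
static const struct file_ops *lookup_misc(const struct misc_dev *list,
                                          unsigned minor, unsigned *steps)
{
    *steps = 0;
    for (; list != NULL; list = list->next) {
        ++*steps;
        if (list->minor == minor)
            return list->fops;
    }
    return NULL;
}
```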

Linux *must* move beyond the 8 bit major and minor barrier,
somehow. If we simply increase each to 16 bits, then the indexing
scheme used for major driver lookup becomes untenable, because the
major tables (one each for character and block devices) would need to
be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit
systems). So we would have to use a scheme like that used for
miscellaneous character devices, which means the search time goes up
linearly with the average number of major device drivers on your
system. Not all "devices" are hardware; some are higher-level drivers
like KGI, so you can get more "devices" without adding hardware.
You can improve this by creating an ordered (balanced:-)
binary tree, in which case your search time becomes log(N).
Alternatively, you can use hashing to speed up the search.
But why do that search at all if you don't have to? Once again, it
seems pointless.
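The table sizes quoted above can be reproduced by assuming each major-table slot holds two pointers (a name and a file_operations pointer); that slot layout is an assumption. The sketch also shows a log(N) alternative, using binary search over an array sorted by major number in place of a balanced tree. All names here are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Bytes for a directly indexed major table with 2^major_bits slots,
 * assuming each slot holds two pointers of `pointer_bytes` each. */
static unsigned long major_table_bytes(unsigned major_bits,
                                       unsigned long pointer_bytes)
{
    assert(major_bits < 32);
    return (1UL << major_bits) * 2UL * pointer_bytes;
}

/* Binary search over an array kept sorted by major number:
 * O(log N) comparisons per lookup, the same bound as a balanced tree. */
struct major_entry {
    unsigned major;
    const char *name;
};

static int cmp_major(const void *key, const void *elem)
{
    unsigned k = *(const unsigned *)key;
    unsigned m = ((const struct major_entry *)elem)->major;
    return (k > m) - (k < m);
}

static const struct major_entry *find_major(const struct major_entry *tab,
                                            size_t n, unsigned major)
{
    return bsearch(&major, tab, n, sizeof *tab, cmp_major);
}
```

With 4-byte pointers (x86) major_table_bytes(16, 4) is 512 kBytes, and with 8-byte pointers it is 1 MByte, matching the figures in the text.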

Note that devfs doesn't use the major&minor system. For devfs
entries, the connection is made when you look up the /dev entry. When
devfs_register() is called, an entry holding the name and the
file_operations is appended to an internal table. If the dentry cache
doesn't have the /dev entry already, this internal table is scanned to
get the file_operations, and an inode is created. If the dentry cache
already has the entry, there is *no lookup time* (other than the
dentry scan itself, but we can't avoid that anyway, and besides Linux
dentries cream other OSes which don't have them:-). Furthermore, the
number of node entries in a devfs is only the number of available
device entries, not the number of *conceivable* entries. Even if you
remove unnecessary entries in a disc-based /dev, the number of
conceivable entries remains the same: you just limit yourself in
order to save space.

Devfs provides a fast connection between a VFS node and the device
driver, in a scalable way.
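A minimal model of this connection, assuming a fixed-size table and a one-slot cache standing in for the dentry cache. model_register() and model_lookup() are hypothetical names; the real devfs_register() takes more arguments than shown here. The scan counter shows that a cached name costs no table scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not the kernel's data structures. */
struct file_ops { const char *driver; };

struct devfs_entry {
    char name[32];
    const struct file_ops *fops;
};

#define TABLE_MAX 64
static struct devfs_entry table[TABLE_MAX];
static size_t table_len;

/* One-slot cache standing in for the dentry cache. */
static const struct devfs_entry *cached;
static unsigned table_scans;  /* how often the table had to be scanned */

/* Stand-in for devfs_register(): append (name, fops) to the table. */
static int model_register(const char *name, const struct file_ops *fops)
{
    if (table_len == TABLE_MAX || strlen(name) >= sizeof table[0].name)
        return -1;
    strcpy(table[table_len].name, name);
    table[table_len].fops = fops;
    table_len++;
    return 0;
}

/* A cache hit costs no table scan; a miss scans the table once and
 * caches the result. NULL models a negative lookup. */
static const struct file_ops *model_lookup(const char *name)
{
    assert(name != NULL);
    if (cached != NULL && strcmp(cached->name, name) == 0)
        return cached->fops;
    table_scans++;
    for (size_t i = 0; i < table_len; i++) {
        if (strcmp(table[i].name, name) == 0) {
            cached = &table[i];
            return table[i].fops;
        }
    }
    return NULL;
}
```

Note that the table only ever holds registered (available) entries, so its length tracks real devices rather than conceivable ones.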

/dev as a system administration tool

Right now /dev contains a list of conceivable devices, most of which I
don't have. Devfs only shows those devices available on my
system. This means that listing /dev is a handy way of checking what
devices are available.

Major&minor size

Existing major and minor numbers are limited to 8 bits each. This is
now a limiting factor for some drivers, particularly the SCSI disc
driver, which consumes a single major number. Only 16 discs are
supported, and each disc may have only 15 partitions. Maybe this isn't
a problem for you, but some of us are building huge Linux systems with
disc arrays. With devfs an arbitrary pointer can be associated with
each device entry, which can be used to give an effective 32 bit
device identifier (i.e. that's like having a 32 bit minor
number). Since this is private to the kernel, there are no C library
compatibility issues which you would have with increasing major and
minor number sizes. See the section on "Allocation of Device Numbers"
for details on maintaining compatibility with userspace.

Solving this requires a kernel change.

Since this was written, the kernel has been modified so that the SCSI
disc driver has more major numbers allocated to it and now supports up
to 128 discs. Since these major numbers are non-contiguous (a result
of unplanned expansion), the implementation is a little more
cumbersome than the original.
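The disc limits follow from splitting an 8 bit minor into a disc number and a partition number. The sketch assumes 16 minors per disc, with minor 0 of each group meaning the whole disc; eight majors of 16 discs each then give the 128 discs mentioned above (eight is inferred from 128/16, not stated in the text). The mapping is illustrative and returns an index into the driver's non-contiguous list of majors rather than real major numbers.

```c
#include <assert.h>

/* Assumptions: 8 bit minors, 16 minors per disc; minor 0 of each group
 * addresses the whole disc and minors 1..15 its partitions. */
#define MINOR_BITS      8u
#define MINORS_PER_DISC 16u

static unsigned discs_per_major(void)
{
    return (1u << MINOR_BITS) / MINORS_PER_DISC;
}

/* Map (disc, partition) to an index into the driver's list of majors
 * and a minor within that major. partition 0 means the whole disc. */
static void disc_to_dev(unsigned disc, unsigned partition,
                        unsigned *major_index, unsigned *minor)
{
    assert(partition < MINORS_PER_DISC);
    *major_index = disc / discs_per_major();
    *minor = (disc % discs_per_major()) * MINORS_PER_DISC + partition;
}
```

For example, partition 3 of disc 17 lands on the second major in the list (index 1) with minor 19.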

Just as people found ways around the impending address-space
limitations of IPv4, they find ways around these limitations. In the
long run, however, solutions like IPv6 or devfs can't be put off
forever.

Read-only root filesystem

Having your device nodes on the root filesystem means that you can't
operate properly with a read-only root filesystem. This is because you
want to change ownerships and protections of tty devices. Existing
practice prevents you from using a CD-ROM as your root filesystem for
a *real* system. Sure, you can boot off a CD-ROM, but you can't change
tty ownerships, so it's only good for installing.

Also, you can't use a shared NFS root filesystem for a cluster of
discless Linux machines (having tty ownerships changed on a common
/dev is not good). Nor can you embed your root filesystem in a
ROM-FS.

You can get around this by creating a RAMDISC at boot time, making
an ext2 filesystem in it, mounting it somewhere and copying the
contents of /dev into it, then unmounting it and mounting it over
/dev.

Devfs is a cleaner way of solving this.

Non-Unix root filesystem

Non-Unix filesystems (such as NTFS) can't be used for a root
filesystem because they variously don't support character and block
special files or symbolic links. You can't have a separate disc-based
or RAMDISC-based filesystem mounted on /dev because you need device
nodes before you can mount these. Devfs can be mounted without any
device nodes. Devlinks won't work because symlinks aren't supported.
An alternative solution is to use initrd to mount a RAMDISC initial
root filesystem (which is populated with a minimal set of device
nodes), and then construct a new /dev in another RAMDISC, and finally
switch to your non-Unix root filesystem. This requires clever boot
scripts and a fragile and conceptually complex boot procedure.

Devfs solves this in a robust and conceptually simple way.

PTY security

Current pseudo-tty (pty) devices are owned by root and read-writable
by everyone. The user of a pty-pair cannot change
ownership/protections without being suid-root.

This could be solved with a secure user-space daemon which runs as
root and does the actual creation of pty-pairs. Such a daemon would
require modification to *every* programme that wants to use this new
mechanism. It also slows down creation of pty-pairs.

An alternative is to create a new open_pty() syscall which does much
the same thing as the user-space daemon. Once again, this requires
modifications to pty-handling programmes.

The devfs solution allows a device driver to "tag" certain device
files so that when an unopened device is opened, the ownerships are
changed to the current euid and egid of the opening process, and the
protections are changed to the default registered by the driver. When
the device is closed, ownership is set back to root and protections
are set back to read-write for everybody. No programme need be
changed. The devpts filesystem provides this auto-ownership feature
for Unix98 ptys. It doesn't support old-style pty devices, nor does it
have all the other features of devfs.
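The auto-ownership rule can be modelled as a small state machine driven by an open count. This is a sketch under assumptions: the structure and function names are hypothetical, and resetting the group to root on last close is assumed (the text only says "ownership is set back to root").

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical model of a devfs entry tagged for auto-ownership. */
struct auto_owner_node {
    uid_t uid;
    gid_t gid;
    mode_t mode;
    mode_t driver_mode;  /* protections registered by the driver */
    unsigned open_count;
};

/* First open of an unopened device: hand it to the opening process. */
static void node_open(struct auto_owner_node *n, uid_t euid, gid_t egid)
{
    if (n->open_count++ == 0) {
        n->uid = euid;
        n->gid = egid;
        n->mode = n->driver_mode;
    }
}

/* Last close: back to root (uid and, by assumption, gid 0) and
 * read-write for everybody. */
static void node_close(struct auto_owner_node *n)
{
    assert(n->open_count > 0);
    if (--n->open_count == 0) {
        n->uid = 0;
        n->gid = 0;
        n->mode = 0666;
    }
}
```

For example, a node registered with driver_mode 0620 becomes owned by the first opener with mode 0620, ignores later opens while still held, and reverts to root with mode 0666 when the last user closes it.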

Intelligent device management

Devfs implements a simple yet powerful protocol for communication with
a device management daemon (devfsd) which runs in user space. It is
possible to send a message (either synchronously or asynchronously) to
devfsd on any event, such as registration/unregistration of device
entries, opening and closing devices, looking up inodes, scanning
directories and more. This has many possibilities. Some of these are
already implemented. See:

http://www.atnf.csiro.au/~rgooch/linux/

Device entry registration events can be used by devfsd to change
permissions of newly-created device nodes. This is one mechanism to
control device permissions.

Device entry registration/unregistration events can be used to run
programmes or scripts. This can be used to provide automatic mounting
of filesystems when new media is inserted into a block device drive.

Asynchronous device open and close events can be used to implement
clever permissions management. For example, the default permissions on
/dev/dsp do not allow everybody to read from the device. This is
sensible, as you don't want some remote user recording what you say at
your console. However, the console user is also prevented from
recording. This behaviour is not desirable. With asynchronous device
open and close events, you can have devfsd run a programme or script
when console devices are opened to change the ownerships for *other*
device nodes (such as /dev/dsp). On closure, you can run a different
script to restore permissions. An advantage of this scheme over
modifying the C library tty handling is that this works even if your
programme crashes (how many times have you seen the utmp database with
lingering entries for non-existent logins?).

Synchronous device open events can be used to perform intelligent
device access protections. Before the device driver open() method is
called, the daemon must first validate the open attempt by running an
external programme or script. This is far more flexible than access
control lists, as access can be determined on the basis of other
system conditions instead of just the UID and GID.

Inode lookup events can be used to authenticate module autoload
requests. Instead of using kmod directly, the event is sent to
devfsd which can implement an arbitrary authentication before loading
the module itself.

Inode lookup events can also be used to construct arbitrary
namespaces, without having to resort to populating devfs with symlinks
to devices that don't exist.
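The split between synchronous and asynchronous events can be sketched as a dispatcher. The event names, message layout and example policy are all hypothetical (the real devfsd protocol is defined by devfsd and the kernel headers, not by this sketch). The point is that only a synchronous event's verdict can deny an operation, while an asynchronous event is a notification whose verdict is ignored. Treating lookups as synchronous is an assumption based on the module-autoload use described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event types; the real devfsd protocol differs. */
enum devfs_event { EV_REGISTER, EV_UNREGISTER, EV_OPEN_ASYNC,
                   EV_CLOSE_ASYNC, EV_OPEN_SYNC, EV_LOOKUP };

struct event_msg {
    enum devfs_event type;
    const char *devname;
    unsigned uid;  /* uid of the process that caused the event */
};

typedef int (*event_handler)(const struct event_msg *msg);

/* Synchronous events wait for the daemon's verdict (0 allow, -1 deny). */
static int is_synchronous(enum devfs_event type)
{
    return type == EV_OPEN_SYNC || type == EV_LOOKUP;
}

/* Asynchronous events still run the handler, but its verdict is ignored. */
static int deliver(const struct event_msg *msg, event_handler handler)
{
    assert(msg != NULL && handler != NULL);
    int verdict = handler(msg);
    return is_synchronous(msg->type) ? verdict : 0;
}

/* Example policy: only root or the console user may open "dsp". */
static unsigned console_uid = 1000;  /* assumed console user */

static int console_policy(const struct event_msg *msg)
{
    if (msg->type == EV_OPEN_SYNC && strcmp(msg->devname, "dsp") == 0)
        return (msg->uid == 0 || msg->uid == console_uid) ? 0 : -1;
    return 0;
}
```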

Speculative Device Scanning

Consider an application (like cdparanoia) that wants to find all
CD-ROM devices on the system (SCSI, IDE and other types), whether or
not their respective modules are loaded. The application must
speculatively open certain device nodes (such as /dev/sr0 for the SCSI
CD-ROMs) in order to make sure the module is loaded. This requires
that all Linux distributions follow the standard device naming scheme
(last time I looked, RedHat did things differently). Devfs solves the
naming problem.

The same application also wants to see which devices are actually
available on the system. With the existing system it needs to read the
/dev directory and speculatively open each /dev/sr* device to
determine if the device exists or not. With a large /dev this is an
inefficient operation, especially if there are many /dev/sr* nodes. A
solution like scsidev could reduce the number of /dev/sr* entries (but
of course that also requires all that inefficient directory scanning).

With devfs, the application can open the /dev/sr directory
(which triggers the module autoloading if required), and proceed to
read /dev/sr. Since only the available devices will have
entries, there are no inefficiencies in directory scanning or device
openings.
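With devfs, finding the available devices reduces to listing a directory. A minimal POSIX sketch follows; the function name is hypothetical, and on a system without devfs /dev/sr will usually not exist, in which case it returns -1.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Count the entries of a directory such as /dev/sr, ignoring "." and
 * "..". Under devfs only existing devices appear, so this count is the
 * number of available devices and no speculative open() is needed.
 * Returns -1 if the directory can't be opened. */
static int count_devices(const char *dirpath)
{
    assert(dirpath != NULL);
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;
    int count = 0;
    const struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_devices("/dev/sr") once replaces the loop of speculative opens over every /dev/sr* node.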

-----------------------------------------------------------------------------

Who else does it?

FreeBSD has a devfs implementation. Solaris and AIX each have a
pseudo-devfs (something akin to scsidev but for all devices, with some
unspecified kernel support). BeOS, Plan9 and QNX also have one. SGI's
IRIX 6.4 and above also have a device filesystem.

While we shouldn't just automatically do something because others do
it, we should not ignore the work of others either. FreeBSD has a lot
of competent people working on it, so their opinion should not be
blithely ignored.

-----------------------------------------------------------------------------


How it works

Registering device entries

For every entry (device node) in a devfs-based /dev a driver must call
devfs_register(). This adds the name of the device entry, the
file_operations structure pointer and a few other things to an
internal table. Device entries may be added and removed at any
time. When a device entry is registered, it automagically appears in
any mounted devfs.

Inode lookup

When a lookup is performed on an entry for which there is no driver
information, devfs will attempt to call devfsd. If still no driver
information can be found, then a negative dentry is yielded and the
next-stage operation will be called by the VFS (such as the create()
or mknod() inode methods). If driver information can be found, an
inode is created (if one does not exist already) and all is well.

Manually creating device nodes

The mknod() method allows you to create an ordinary named pipe in the
devfs, or you can create a character or block special inode if one
does not already exist. You may wish to create a character or block
special inode so that you can set permissions and ownership. Later, if
a device driver registers an entry with the same name, the
permissions, ownership and times are retained. This is how you can set
the protections on a device even before the driver is loaded. Once you
create an inode it appears in the directory listing.
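The retention rule can be sketched as follows. The entry structure and function names are hypothetical and track only permission bits and owner (times, which the text says are also retained, are omitted): a node created with mknod() keeps its settings when the driver registers later, while a node with no prior inode takes the driver's defaults.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of one devfs entry (permission bits only). */
struct entry {
    mode_t mode;
    uid_t uid;
    const void *fops;  /* NULL until a driver registers */
    int has_inode;     /* set by mknod() or by registration */
};

/* mknod() ahead of the driver: the inode exists (and is listed) with
 * the administrator's permissions, but has no driver yet. */
static void entry_mknod(struct entry *e, mode_t mode, uid_t uid)
{
    e->mode = mode;
    e->uid = uid;
    e->has_inode = 1;
}

/* Driver registration: attach the driver; keep existing permissions and
 * ownership if an inode was already created, else use driver defaults. */
static void entry_register(struct entry *e, const void *fops,
                           mode_t default_mode, uid_t default_uid)
{
    assert(fops != NULL);
    e->fops = fops;
    if (!e->has_inode) {
        e->mode = default_mode;
        e->uid = default_uid;
        e->has_inode = 1;
    }
}
```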

Unregistering device entries

A device driver calls devfs_unregister() to unregister an entry.

Chroot() gaols

2.2.x kernels

The semantics of inode creation are different when devfs is mounted
with the "explicit" option. In this mode, when a device entry is
registered, it will not appear until you use mknod() to create the
device. It doesn't matter whether you mknod() before or after the
device is registered with devfs_register(). The purpose of this
behaviour is to support chroot(2) gaols, where you want to mount a
minimal devfs inside the gaol. Only the devices you specifically want
to be available (through your mknod() setup) will be accessible.

2.4.x kernels

As of kernel 2.3.99, the VFS has had the ability to rebind parts of
the global filesystem namespace into another part of the namespace.
This now works even at the leaf-node level, which means that
individual files and device nodes may be bound into other parts of the
namespace. This is like making links, but better, because it works
across filesystems (unlike hard links) and works through chroot()
gaols (unlike symbolic links).

Because of these improvements to the VFS, the multi-mount capability
in devfs is no longer needed. The administrator may create a minimal
device tree inside a chroot(2) gaol by using VFS bindings. As this
provides most of the features of the devfs multi-mount capability, I
removed the multi-mount support code (after issuing an RFC). This
yielded code size reductions and simplifications.

If you want to construct a minimal chroot() gaol, the following
command should suffice:

mount --bind /dev/null /gaol/dev/null

Repeat for other device nodes you want to expose. Simple!

-----------------------------------------------------------------------------


Operational issues


Instructions for the impatient

Nobody likes reading documentation. People just want to get in there
and play. So this section tells you quickly the steps you need to take
to run with devfs mounted over /dev. Skip these steps and you will end
up with a nearly unbootable system. Subsequent sections describe the
issues in more detail, and discuss non-essential configuration
options.

Devfsd
OK, if you're reading this, I assume you want to play with
devfs. First you should ensure that /usr/src/linux contains a
recent kernel source tree. Then you need to compile devfsd, the device
management daemon, available at

http://www.atnf.csiro.au/~rgooch/linux/.

Because the kernel has a naming scheme which is quite different from
the old naming scheme, you need to install devfsd so that software and
configuration files that use the old naming scheme will not break.

Compile and install devfsd. You will be provided with a default
configuration file /etc/devfsd.conf which will provide
compatibility symlinks for the old naming scheme. Don't change this
config file unless you know what you're doing. Even if you think you
do know what you're doing, don't change it until you've followed all
the steps below and booted a devfs-enabled system and verified that it
works.

Now edit your main system boot script so that devfsd is started at the
very beginning (before any filesystem checks). /etc/rc.d/rc.sysinit is
often the main boot script on systems with SysV-style boot scripts. On
systems with BSD-style boot scripts it is often /etc/rc. Also check
/sbin/rc.

NOTE that the line you put into the boot script should be exactly:

/sbin/devfsd /dev

DO NOT use some special daemon-launching programme, otherwise the boot
script may not wait for devfsd to finish initialising.

| 607 | System Libraries |
| 608 | There may still be some problems because of broken software making |
| 609 | assumptions about device names. In particular, some software does not |
| 610 | handle devices which are symbolic links. If you are running a libc 5 |
| 611 | based system, install libc 5.4.44 (if you have libc 5.4.46, go back to |
| 612 | libc 5.4.44, which is actually correct). If you are running a glibc |
| 613 | based system, make sure you have glibc 2.1.3 or later. |
| 614 | |
| 615 | /etc/securetty |
| 616 | PAM (Pluggable Authentication Modules) is supposed to be a flexible |
| 617 | mechanism for providing better user authentication and access to |
| 618 | services. Unfortunately, it's also fragile, complex and undocumented |
| 619 | (check out RedHat 6.1, and probably other distributions as well). PAM |
| 620 | has problems with symbolic links. Append the following lines to your |
| 621 | /etc/securetty file: |
| 622 | |
| 623 | vc/1 |
| 624 | vc/2 |
| 625 | vc/3 |
| 626 | vc/4 |
| 627 | vc/5 |
| 628 | vc/6 |
| 629 | vc/7 |
| 630 | vc/8 |
| 631 | |
| 632 | This will not weaken security. If you have a version of util-linux |
| 633 | earlier than 2.10.h, please upgrade to 2.10.h or later. If you |
| 634 | absolutely cannot upgrade, then also append the following lines to |
| 635 | your /etc/securetty file: |
| 636 | |
| 637 | 1 |
| 638 | 2 |
| 639 | 3 |
| 640 | 4 |
| 641 | 5 |
| 642 | 6 |
| 643 | 7 |
| 644 | 8 |
| 645 | |
| 646 | This may potentially weaken security by allowing root logins over the |
| 647 | network (a password is still required, though). However, since there |
| 648 | are problems with dealing with symlinks, I'm suspicious of the level |
| 649 | of security offered in any case. |
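
If you make these changes from a script that may run more than once, you probably don't want duplicate entries. The following minimal sketch uses a hypothetical function, add_vc_securetty, that appends only the vc/1 to vc/8 lines that are not already present. It does not add the bare 1 to 8 entries.

```shell
#!/bin/sh
# Hypothetical helper: append vc/1 ... vc/8 to a securetty-style file,
# skipping entries that are already present, so it is safe to re-run.
add_vc_securetty() {
    file=$1
    i=1
    while [ "$i" -le 8 ]; do
        # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
        if ! grep -qxF "vc/$i" "$file" 2>/dev/null; then
            echo "vc/$i" >> "$file"
        fi
        i=$((i + 1))
    done
}
```

For example, as root: add_vc_securetty /etc/securetty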

XFree86
While not essential, it's probably a good idea to upgrade to XFree86
4.0, as patches went in to make it more devfs-friendly. If you don't,
you'll probably need to apply the following patch to
/etc/security/console.perms so that ordinary users can run
startx. Note that not all distributions have this file (e.g. Debian),
so if it's not present, don't worry about it.

--- /etc/security/console.perms.orig Sat Apr 17 16:26:47 1999
+++ /etc/security/console.perms Fri Feb 25 23:53:55 2000
@@ -14,7 +14,7 @@
 # man 5 console.perms
 
 # file classes -- these are regular expressions
-<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
+<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
 
 # device classes -- these are shell-style globs
 <floppy>=/dev/fd[0-1]*

If the patch does not apply, then replace the line:

<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]

with:

<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]

Disable devpts
I've had a report of devpts mounted on /dev/pts not working
correctly. Since devfs will also manage /dev/pts, there is no
need to mount devpts as well. You should either edit your
/etc/fstab so devpts is not mounted, or disable devpts from
your kernel configuration.
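
If you take the /etc/fstab route, the following minimal sketch (hypothetical function name disable_devpts) comments out active entries whose filesystem type, the third field, is devpts. It writes to standard output, so you can review the result before replacing the real file.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab with every active devpts entry
# commented out. Existing comment lines are passed through unchanged.
disable_devpts() {
    awk '
        $1 ~ /^#/          { print; next }          # already a comment
        $3 == "devpts"     { print "#" $0; next }   # comment out devpts
                           { print }                # everything else as-is
    ' "$1"
}
```

For example: disable_devpts /etc/fstab > /etc/fstab.new, then compare the two files before moving the new one into place.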

Unsupported drivers
Not all drivers have devfs support. If you depend on one of these
drivers, you will need to create a script or tarfile that you can use
at boot time to create device nodes as appropriate. The section
"Dealing with drivers without devfs support" describes this, and the
section "Device drivers currently ported" lists the drivers which
have devfs support.

/dev/mouse
Many distributions configure /dev/mouse to be the mouse device
for XFree86 and GPM. I actually think this is a bad idea, because it
adds another level of indirection. When looking at a config file, if
you see /dev/mouse you're left wondering which mouse
is being referred to. Hence I recommend putting the actual mouse
device (for example /dev/psaux) into your
/etc/X11/XF86Config file (and similarly for the GPM
configuration file).

Alternatively, use the same technique used for unsupported drivers
described above.

The Kernel
Finally, you need to make sure devfs is compiled into your kernel. Set
CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y
using your favourite configuration tool (e.g. make config or make
xconfig), then run make clean and recompile your kernel and
modules. At boot, devfs will be mounted onto /dev.
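
Before running make clean, you may want to confirm that the options actually ended up in the configuration. The following sketch assumes you run it from the top of the kernel source tree, where the configuration tools write .config. The function name check_devfs_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify that a kernel .config sets the three
# options above to "y". Returns non-zero if any of them is not.
check_devfs_config() {
    config=$1
    status=0
    for opt in CONFIG_EXPERIMENTAL CONFIG_DEVFS_FS CONFIG_DEVFS_MOUNT; do
        # -x: the whole line must be exactly OPTION=y
        if grep -qx "$opt=y" "$config"; then
            echo "$opt=y ok"
        else
            echo "$opt is not set to y" >&2
            status=1
        fi
    done
    return $status
}
```

For example: check_devfs_config .config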

If you encounter problems booting (for example if you forgot a
configuration step), you can pass devfs=nomount on the kernel
boot command line. This will prevent the kernel from mounting devfs
onto /dev at boot time.

In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
devfs onto /dev is completely safe, and requires no
configuration changes. One exception to take note of is when
LABEL= directives are used in /etc/fstab. In this
case you will be unable to boot properly. This is because the
mount(8) programme uses /proc/partitions as part of
the volume label search process, and the device names it finds are not
available, because setting CONFIG_DEVFS_FS=y changes the names in
/proc/partitions, irrespective of whether devfs is mounted.
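
To see whether this exception applies to you, list the active fstab entries whose device field starts with LABEL=. The following is a minimal sketch with a hypothetical function name. Comment lines are skipped because their first field starts with #.

```shell
#!/bin/sh
# Hypothetical helper: print fstab entries whose first field (the
# device) is a LABEL= directive.
find_label_entries() {
    awk '$1 ~ /^LABEL=/ { print }' "$1"
}
```

Any output from find_label_entries /etc/fstab means you are in the situation described above.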

Now you've finished all the steps required. You're now ready to boot
your shiny new kernel. Enjoy.

Changing the configuration

OK, you've now booted a devfs-enabled system, and everything works.
Now you may feel like changing the configuration (common targets are
/etc/fstab and /etc/devfsd.conf). Since you have a
system that works, if you make any changes and it doesn't work, you
now know that you only have to restore your configuration files to the
default and it will work again.


Permissions persistence across reboots

If you don't use mknod(2) to create a device file, nor use chmod(2) or
chown(2) to change the ownerships/permissions, the inode ctime will
remain at 0 (the epoch, 00:00 on 1-JAN-1970, GMT). Anything with a
ctime later than this has had its ownership/permissions changed.
Hence, a simple script or programme may be used to tar up all changed
inodes prior to shutdown. Although effective, many consider this
approach a kludge.
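
The following is a sketch of the "find the changed inodes" step. It assumes GNU stat, whose -c %Z format prints the ctime in seconds since the epoch, and device names without embedded newlines. The function name changed_entries is hypothetical. It only lists candidates and does not do the archiving.

```shell
#!/bin/sh
# Hypothetical helper: list every non-directory entry under DIR whose
# ctime is later than the epoch, i.e. whose ownership or permissions
# have been changed since devfs created it. stat -c %Z does not
# follow symlinks, so a link's own ctime is checked.
changed_entries() {
    find "$1" ! -type d -print | while IFS= read -r f; do
        if [ "$(stat -c %Z "$f")" -gt 0 ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

The list could then be fed to GNU tar, for example changed_entries /dev | tar -cf /var/tmp/dev-perms.tar -T - (the archive path is only an example). Keep in mind the tar unlink(2) caveat below.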

A much better approach is to use devfsd to save and restore
permissions. It may be configured to record changes in permissions and
will save them in a database (in fact a directory tree), and restore
these upon boot. This is an efficient method and results in immediate
saving of current permissions (unlike the tar approach, which saves
permissions at some unspecified future time).

The default configuration file supplied with devfsd has config entries
which you may uncomment to enable persistence management.

If you decide to use the tar approach anyway, be aware that tar will
first unlink(2) an inode before creating a new device node. The
unlink(2) has the effect of breaking the connection between a devfs
entry and the device driver. If you use the "devfs=only" boot option,
you lose access to the device driver, requiring you to reload the
module. I consider this a bug in tar (there is no real need to
unlink(2) the inode first).

Alternatively, you can use devfsd to provide more sophisticated
management of device permissions. You can use devfsd to store
permissions for whole groups of devices with a single configuration
entry, rather than the conventional single entry per device.

Permissions database stored in mounted-over /dev

If you wish to save and restore your device permissions into the
disc-based /dev while still mounting devfs onto /dev
you may do so. This requires a 2.4.x kernel (in fact, 2.3.99 or
later), which has the VFS binding facility. You need to do the
following to set this up:

- make sure the kernel does not mount devfs at boot time (for
  example, build without CONFIG_DEVFS_MOUNT, or boot with
  devfs=nomount)

- make sure you have a correct /dev/console entry in your
  root file-system (where your disc-based /dev lives)

- create the /dev-state directory

- add the following lines near the very beginning of your boot
  scripts:

    mount --bind /dev /dev-state
    mount -t devfs none /dev
    devfsd /dev

- add the following lines to your /etc/devfsd.conf file:

    REGISTER ^pt[sy] IGNORE
    CREATE ^pt[sy] IGNORE
    CHANGE ^pt[sy] IGNORE
    DELETE ^pt[sy] IGNORE
    REGISTER .* COPY /dev-state/$devname $devpath
    CREATE .* COPY $devpath /dev-state/$devname
    CHANGE .* COPY $devpath /dev-state/$devname
    DELETE .* CFUNCTION GLOBAL unlink /dev-state/$devname
    RESTORE /dev-state

  Note that the sample devfsd.conf file contains these lines,
  as well as other sample configurations you may find useful. See the
  devfsd distribution.

- reboot.

Permissions database stored in normal directory

If you are using an older kernel which doesn't support VFS binding,
then you won't be able to have the permissions database in a
mounted-over /dev. However, you can still use a regular
directory to store the database. The sample /etc/devfsd.conf
file above may still be used. You will need to create the
/dev-state directory prior to installing devfsd. If you have
old permissions in /dev, then just copy (or move) the device
nodes over to the new directory.
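
The copy step can be sketched as follows. The sketch assumes GNU cp, whose -a option preserves modes, ownership and special files, and that you run it as root so device nodes and owners are reproduced. The function name seed_dev_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the contents of an old /dev into the
# permissions database directory, creating it if necessary.
seed_dev_state() {
    src=$1
    dst=$2
    # "$src/." copies the directory contents (including dot-files),
    # not the directory itself; -a keeps modes, owners and node types.
    mkdir -p "$dst" && cp -a "$src/." "$dst/"
}
```

For example: seed_dev_state /dev /dev-state, run while the disc-based /dev is still visible (that is, before devfs is mounted over it).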

Which method is better?

The best method is to have the permissions database stored in the
mounted-over /dev. This is because you will not need to copy
device nodes over to /dev-state, and because it allows you to
switch between devfs and non-devfs kernels, without requiring you to
copy permissions between /dev-state (for devfs) and
/dev (for non-devfs).


Dealing with drivers without devfs support

Currently, not all device drivers in the kernel have been modified to
use devfs. Device drivers which do not yet have devfs support will not
automagically appear in devfs. The simplest way to create device nodes
for these drivers is to unpack a tarfile containing the required
device nodes. You can do this in your boot scripts. All your drivers
will now work as before.
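
The following is a minimal sketch of the tarfile approach, with hypothetical function names save_nodes and restore_nodes. It assumes a tar that supports -C, and should be run as root so that device nodes and their owners are archived and restored faithfully. Only archive nodes that devfs does not create itself, since tar unlinks an existing entry before recreating it (see "Permissions persistence across reboots").

```shell
#!/bin/sh
# Hypothetical helpers for the tarfile approach.
# save_nodes ARCHIVE DIR NAME...  archives the named entries of DIR
# restore_nodes ARCHIVE DIR        unpacks them into DIR at boot time
save_nodes() {
    archive=$1
    dir=$2
    shift 2
    tar -C "$dir" -cf "$archive" "$@"
}

restore_nodes() {
    archive=$1
    dir=$2
    # -p restores the saved permissions
    tar -C "$dir" -xpf "$archive"
}
```

For example, on a non-devfs system, run save_nodes /etc/dev-extra.tar /dev mydev0, where mydev0 stands in for whatever nodes your unsupported driver needs. Then, in the boot script after devfs is mounted, run restore_nodes /etc/dev-extra.tar /dev.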

Hopefully for most people devfs will have enough support so that they
can mount devfs directly over /dev without losing most functionality
(i.e. losing access to various devices). Since 22-JAN-1998 (devfs
patch version 10) I have been running this way. All the devices I have
are available in devfs, so I don't lose anything.

WARNING: if your configuration requires the old-style device names
(e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure
it to maintain compatibility entries. It is almost certain that you
will require this. Note that the kernel creates a compatibility entry
for the root device, so you don't need initrd.

Note that you no longer need to mount devpts if you use Unix98 PTYs,
as devfs can manage /dev/pts itself. This saves you some RAM, as you
don't need to compile and install devpts. Note that some versions of
glibc have a bug with Unix98 pty handling on devfs systems. Glibc
2.1.3 has the fix; for older versions, contact the glibc maintainers.

Note also that apart from editing /etc/fstab, other things will need
to be changed if you *don't* install devfsd. Some software (like the X
server) hard-wires device names in its source. It really is much
easier to install devfsd so that compatibility entries are created.
You can then slowly migrate your system to using the new device names
(for example, by starting with /etc/fstab), and then limit the
compatibility entries that devfsd creates.

IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD
BEFORE YOU BOOT A DEVFS-ENABLED KERNEL!

Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of
reports back. Many of these are because people are trying to run
without devfsd, and hence some things break. Please just run devfsd if
things break. I want to concentrate on real bugs rather than
misconfiguration problems at the moment. If people are willing to fix
bugs/false assumptions in other code (e.g. glibc, X server) and submit
that to the respective maintainers, that would be great.


All the way with Devfs

The devfs kernel patch creates a rationalised device tree. As stated
above, if you want to keep using the old /dev naming scheme,
you just need to configure devfsd appropriately (see the man
page). People who prefer the old names can ignore this section. For
those of us who like the rationalised names and an uncluttered
/dev, read on.

If you don't run devfsd, or don't enable compatibility entry
management, then you will have to configure your system to use the new
names. For example, you will then need to edit your
/etc/fstab to use the new disc naming scheme. If you want to
be able to boot non-devfs kernels, you will need compatibility
symlinks with the new names in the underlying disc-based /dev,
pointing back to the old-style names, for when you boot a kernel
without devfs.

You can selectively decide which devices you want compatibility
entries for. For example, you may only want compatibility entries for
BSD pseudo-terminal devices (otherwise you'll have to patch your C
library or use Unix98 ptys instead). It's just a matter of putting
the correct regular expression into /etc/devfsd.conf.

There are other choices of naming schemes that you may prefer. For
example, I don't use the kernel-supplied names, because they are too
verbose. A common misconception is that the kernel-supplied names are
meant to be used directly in configuration files. This is not the
case. They are designed to reflect the layout of the devices attached
and to provide easy classification.

If you like the kernel-supplied names, that's fine. If you don't then
you should be using devfsd to construct a namespace more to your
liking. Devfsd has built-in code to construct a namespace that is both
logical and easy to manage. In essence, it creates a convenient
abbreviation of the kernel-supplied namespace.

You are of course free to build your own namespace. Devfsd has all the
infrastructure required to make this easy for you. All you need do is
write a script. You can even write some C code and devfsd can load the
shared object as a callable extension.


Other Issues

The init programme
Another thing to take note of is whether your init programme
creates a Unix socket /dev/telinit. Some versions of init
create /dev/telinit so that the telinit programme can
communicate with the init process. If you have such a system you need
to make sure that devfs is mounted over /dev *before* init
starts. In other words, you can't leave the mounting of devfs to
/etc/rc, since this is executed after init starts. Other
versions of init require a named pipe /dev/initctl
which must exist *before* init starts. Once again, you need to
mount devfs and then create the named pipe *before* init
starts.

The default behaviour now is not to mount devfs onto /dev at
boot time for 2.3.x and later kernels. You can override this with the
"devfs=mount" boot option. This solves any problems with init,
and also prevents the dreaded:

Cannot open initial console

message. For 2.2.x kernels where you need to apply the devfs patch,
the default is to mount.

If you have automatic mounting of devfs onto /dev then you
may need to create /dev/initctl in your boot scripts. The
following lines should suffice:

mknod /dev/initctl p
kill -USR1 1    # tell init that /dev/initctl now exists

Alternatively, if you don't want the kernel to mount devfs onto
/dev then you could use the following procedure as a
guideline for how to get around /dev/initctl problems:

# cd /sbin
# mv init init.real
# cat > init
#! /bin/sh
mount -n -t devfs none /dev
mknod /dev/initctl p
exec /sbin/init.real "$@"
[control-D]
# chmod a+x init

Note that newer versions of init create /dev/initctl
automatically, so you don't have to worry about this.

Module autoloading
You will need to configure devfsd to enable module
autoloading. The following line should be placed in your
/etc/devfsd.conf file:

LOOKUP .* MODLOAD

As of devfsd-v1.3.10, a generic /etc/modules.devfs
configuration file is installed, which is used by the MODLOAD
action. This should be sufficient for most configurations. If you
require further configuration, edit your /etc/modules.conf
file. The way module autoloading works with devfs is:

- a process attempts to look up a device node (e.g. /dev/fred)

- if that device node does not exist, the full pathname is passed to
  devfsd as a string

- devfsd will pass the string to the modprobe programme (provided the
  configuration line shown above is present), and specifies that
  /etc/modules.devfs is the configuration file

- /etc/modules.devfs includes /etc/modules.conf to
  access local configurations

- modprobe will search its configuration files, looking for an alias
  that translates the pathname into a module name

- the resulting module name is then used to load the module.

If you wanted a lookup of /dev/fred to load the
mymod module, you would require the following configuration
line in /etc/modules.conf:

alias /dev/fred mymod

The /etc/modules.devfs configuration file provides many such
aliases for standard device names. If you look closely at this file,
you will note that some modules require multiple alias configuration
lines. This is required to support module autoloading for old and new
device names.
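
The alias lookup in the last two steps can be modelled in a few lines of awk. This is a simplified model for illustration, not how modprobe is implemented. The function name lookup_alias is hypothetical, and real modprobe also handles include directives, wildcards and other configuration keywords.

```shell
#!/bin/sh
# Simplified model: lookup_alias PATHNAME FILE...
# Prints the module named by the first "alias PATHNAME MODULE" line
# found in the given files, or nothing if there is no such alias.
lookup_alias() {
    path=$1
    shift
    awk -v p="$path" '$1 == "alias" && $2 == p { print $3; exit }' "$@"
}
```

For example, with the configuration line above, lookup_alias /dev/fred /etc/modules.conf would print mymod.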

Mounting root off a devfs device
If you wish to mount root off a devfs device when you pass the
"devfs=only" boot option, then you need to pass in the
"root=<device>" option to the kernel when booting. If you use
LILO, then you must have this in lilo.conf:

append = "root=<device>"

Surprised? Yep, so was I. It turns out if you have (as most people
do):

root = <device>

then LILO will determine the device number of <device> and will
write that device number into a special place in the kernel image
before starting the kernel, and the kernel will use that device number
to mount the root filesystem. So, using the "append" variety ensures
that LILO passes the root filesystem device as a string, which devfs
can then use.

Note that this isn't an issue if you don't pass "devfs=only".

TTY issues
The ttyname(3) function in some versions of the C library makes
false assumptions about device entries which are symbolic links. The
tty(1) programme is one that depends on this function. I've
written a patch to libc 5.4.43 which fixes this. This has been
included in libc 5.4.44 and a similar fix is in glibc 2.1.3.


Kernel Naming Scheme

The kernel provides a default naming scheme. This scheme is designed
to make it easy to search for specific devices or device types, and to
view the available devices. Some device types (such as hard discs)
have a directory of entries, making it easy to see what devices of
that class are available. Often, the entries are symbolic links into a
directory tree that reflects the topology of available devices. The
topological tree is useful for finding how your devices are arranged.

Below is a list of the naming schemes for the most common drivers. A
list of reserved device names is available for reference. To have a
new name allocated, please send email to rgooch@atnf.csiro.au. Please
be patient (the maintainer is busy). An alternative name may be
allocated instead of the requested name, at the discretion of the
maintainer.

Disc Devices

All discs, whether SCSI, IDE or whatever, are placed under the
/dev/discs hierarchy:

/dev/discs/disc0    first disc
/dev/discs/disc1    second disc

Each of these entries is a symbolic link to the directory for that
device. The device directory contains:

disc     for the whole disc
part*    for individual partitions

CD-ROM Devices

All CD-ROMs, whether SCSI, IDE or whatever, are placed under the
/dev/cdroms hierarchy:

/dev/cdroms/cdrom0    first CD-ROM
/dev/cdroms/cdrom1    second CD-ROM

Each of these entries is a symbolic link to the real device entry for
that device.

Tape Devices

All tapes, whether SCSI, IDE or whatever, are placed under the
/dev/tapes hierarchy:

/dev/tapes/tape0    first tape
/dev/tapes/tape1    second tape

Each of these entries is a symbolic link to the directory for that
device. The device directory contains:

mt      for mode 0
mtl     for mode 1
mtm     for mode 2
mta     for mode 3
mtn     for mode 0, no rewind
mtln    for mode 1, no rewind
mtmn    for mode 2, no rewind
mtan    for mode 3, no rewind

SCSI Devices

Uniquely identifying any SCSI device requires the following
information:

controller    (host adapter)
bus           (SCSI channel)
target        (SCSI ID)
unit          (Logical Unit Number)

All SCSI devices are placed under /dev/scsi (assuming devfs
is mounted on /dev). Hence, a SCSI device with the following
parameters: c=1,b=2,t=3,u=4 would appear as:

/dev/scsi/host1/bus2/target3/lun4    device directory

Inside this directory, a number of device entries may be created,
depending on which SCSI device-type drivers were installed.
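
Because the directory name is built mechanically from the four numbers, it is easy to compute. The following is a sketch with a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: scsi_devfs_dir CONTROLLER BUS TARGET UNIT
# Prints the kernel-supplied devfs directory for that SCSI device.
scsi_devfs_dir() {
    printf '/dev/scsi/host%s/bus%s/target%s/lun%s\n' "$1" "$2" "$3" "$4"
}
```

For example, "$(scsi_devfs_dir 0 0 1 0)/disc" would name the whole-disc entry for controller 0, bus 0, target 1, unit 0, if the SCSI disc driver is loaded.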

See the section on the disc naming scheme to see what entries the SCSI
disc driver creates.

See the section on the tape naming scheme to see what entries the SCSI
tape driver creates.

The SCSI CD-ROM driver creates:

cd

The SCSI generic driver creates:

generic

IDE Devices

Uniquely identifying any IDE device requires the following
information:

controller
bus           (aka. primary/secondary)
target        (aka. master/slave)
unit

All IDE devices are placed under /dev/ide, and use a similar
naming scheme to the SCSI subsystem.

XT Hard Discs

All XT discs are placed under /dev/xd. The first XT disc has
the directory /dev/xd/disc0.

TTY devices

The tty devices now appear as:

New name               Old name              Device Type
--------               --------              -----------
/dev/tts/{0,1,...}     /dev/ttyS{0,1,...}    Serial ports
/dev/cua/{0,1,...}     /dev/cua{0,1,...}     Call out devices
/dev/vc/0              /dev/tty0             Current virtual console
/dev/vc/{1,2,...}      /dev/tty{1...63}      Virtual consoles
/dev/vcc/{0,1,...}     /dev/vcs{1...63}      Virtual console capture devices
/dev/pty/m{0,1,...}    /dev/ptyp??           PTY masters
/dev/pty/s{0,1,...}    /dev/ttyp??           PTY slaves
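
For scripts that need to translate old names, the following sketch covers the serial, call-out and numbered virtual console rows of the table. The function name tty_old_to_devfs is hypothetical. Names it does not recognise (including PTYs) produce no output.

```shell
#!/bin/sh
# Hypothetical helper: map an old tty name (without /dev/) to its
# devfs name. The ttyS pattern is tested before the plain tty one.
tty_old_to_devfs() {
    case $1 in
        ttyS[0-9]*) echo "/dev/tts/${1#ttyS}" ;;   # serial ports
        cua[0-9]*)  echo "/dev/cua/${1#cua}" ;;    # call-out devices
        tty[0-9]*)  echo "/dev/vc/${1#tty}" ;;     # virtual consoles
    esac
}
```

For example, tty_old_to_devfs ttyS1 prints /dev/tts/1.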

RAMDISCS

The RAMDISCS are placed in their own directory, and are named thus:

/dev/rd/{0,1,2,...}

Meta Devices

The meta devices are placed in their own directory, and are named
thus:

/dev/md/{0,1,2,...}

Floppy discs

Floppy discs are placed in the /dev/floppy directory.

Loop devices

Loop devices are placed in the /dev/loop directory.

Sound devices

Sound devices are placed in the /dev/sound directory
(audio, sequencer, ...).


Devfsd Naming Scheme

Devfsd provides a naming scheme which is a convenient abbreviation of
the kernel-supplied namespace. In some
cases, the kernel-supplied naming scheme is quite convenient, so
devfsd does not provide another naming scheme. The convenience names
that devfsd creates are in fact the same names as the original devfs
kernel patch created (before Linus mandated the Big Name
Change). These are referred to as "new compatibility entries".

In order to configure devfsd to create these convenience names, the
following lines should be placed in your /etc/devfsd.conf:

REGISTER .* MKNEWCOMPAT
UNREGISTER .* RMNEWCOMPAT

This will cause devfsd to create (and destroy) symbolic links which
point to the kernel-supplied names.

SCSI Hard Discs

All SCSI discs are placed under /dev/sd (assuming devfs is
mounted on /dev). Hence, a SCSI disc with the following
parameters: c=1,b=2,t=3,u=4 would appear as:

/dev/sd/c1b2t3u4        for the whole disc
/dev/sd/c1b2t3u4p5      for the 5th partition
/dev/sd/c1b2t3u4p5s6    for the 6th slice in the 5th partition
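
The devfsd names are just the four numbers run together, with optional partition and slice suffixes. The following is a sketch using a hypothetical function name, sd_name.

```shell
#!/bin/sh
# Hypothetical helper: sd_name C B T U [PARTITION [SLICE]]
# Builds a devfsd-style SCSI disc name such as /dev/sd/c1b2t3u4p5s6.
sd_name() {
    name="/dev/sd/c$1b$2t$3u$4"
    if [ -n "$5" ]; then name="${name}p$5"; fi   # partition suffix
    if [ -n "$6" ]; then name="${name}s$6"; fi   # slice suffix
    echo "$name"
}
```

For example, sd_name 0 0 0 0 1 gives /dev/sd/c0b0t0u0p1.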

SCSI Tapes

All SCSI tapes are placed under /dev/st. The naming scheme is
similar to that for SCSI discs. A SCSI tape with the
parameters: c=1,b=2,t=3,u=4 would appear as:

/dev/st/c1b2t3u4m0     for mode 0
/dev/st/c1b2t3u4m1     for mode 1
/dev/st/c1b2t3u4m2     for mode 2
/dev/st/c1b2t3u4m3     for mode 3
/dev/st/c1b2t3u4m0n    for mode 0, no rewind
/dev/st/c1b2t3u4m1n    for mode 1, no rewind
/dev/st/c1b2t3u4m2n    for mode 2, no rewind
/dev/st/c1b2t3u4m3n    for mode 3, no rewind

SCSI CD-ROMs

All SCSI CD-ROMs are placed under /dev/sr. The naming scheme is
similar to that for SCSI discs. A SCSI CD-ROM with the
parameters: c=1,b=2,t=3,u=4 would appear as:

/dev/sr/c1b2t3u4

SCSI Generic Devices

The generic (aka. raw) interface for all SCSI devices is placed under
/dev/sg. The naming scheme is similar to that for SCSI discs. A
SCSI generic device with the parameters: c=1,b=2,t=3,u=4 would appear
as:

/dev/sg/c1b2t3u4

IDE Hard Discs

All IDE discs are placed under /dev/ide/hd, using a similar
convention to SCSI discs. The following mappings exist between the old
and the new names:

/dev/hda    /dev/ide/hd/c0b0t0u0
/dev/hdb    /dev/ide/hd/c0b0t1u0
/dev/hdc    /dev/ide/hd/c0b1t0u0
/dev/hdd    /dev/ide/hd/c0b1t1u0
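
The table covers only hda to hdd on the first controller. The following sketch applies it, assuming partitions get a pN suffix as with SCSI discs (so hda1 becomes .../c0b0t0u0p1). The function name is hypothetical. hde and beyond are rejected rather than guessed, because the table does not define them.

```shell
#!/bin/sh
# Hypothetical helper: ide_old_to_devfs NAME (e.g. hda, hdc2)
# Prints the devfsd name, or returns 1 for names outside the table.
ide_old_to_devfs() {
    case $1 in
        hda*) bt=b0t0 ;;   # primary master
        hdb*) bt=b0t1 ;;   # primary slave
        hdc*) bt=b1t0 ;;   # secondary master
        hdd*) bt=b1t1 ;;   # secondary slave
        *) return 1 ;;
    esac
    part=${1#hd?}          # whatever follows the drive letter
    case $part in
        *[!0-9]*) return 1 ;;   # partition must be all digits
    esac
    name="/dev/ide/hd/c0${bt}u0"
    if [ -n "$part" ]; then name="${name}p$part"; fi
    echo "$name"
}
```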

IDE Tapes

The naming scheme is similar to that for IDE discs. The entries will
appear in the /dev/ide/mt directory.

IDE CD-ROM

The naming scheme is similar to that for IDE discs. The entries will
appear in the /dev/ide/cd directory.

IDE Floppies

The naming scheme is similar to that for IDE discs. The entries will
appear in the /dev/ide/fd directory.

XT Hard Discs

All XT discs are placed under /dev/xd. The first XT disc
would appear as /dev/xd/c0t0.


Old Compatibility Names

The old compatibility names are the legacy device names, such as
/dev/hda, /dev/sda, /dev/rtc and so on.
Devfsd can be configured to create compatibility symlinks so that you
may continue to use the old names in your configuration files and so
that old applications will continue to function correctly.

In order to configure devfsd to create these legacy names, the
following lines should be placed in your /etc/devfsd.conf:

REGISTER .* MKOLDCOMPAT
UNREGISTER .* RMOLDCOMPAT

This will cause devfsd to create (and destroy) symbolic links which
point to the kernel-supplied names.


-----------------------------------------------------------------------------


Device drivers currently ported

- All miscellaneous character devices support devfs (this is done
  transparently through misc_register())

- SCSI discs and generic hard discs

- Character memory devices (null, zero, full and so on)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>

- Loop devices (/dev/loop?)

- TTY devices (console, serial ports, terminals and pseudo-terminals)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>

- SCSI tapes (/dev/scsi and /dev/tapes)

- SCSI CD-ROMs (/dev/scsi and /dev/cdroms)

- SCSI generic devices (/dev/scsi)

- RAMDISCS (/dev/ram?)

- Meta Devices (/dev/md*)

- Floppy discs (/dev/floppy)

- Parallel port printers (/dev/printers)

- Sound devices (/dev/sound)
  Thanks to Eric Dumas <dumas@linux.eu.org> and
  C. Scott Ananian <cananian@alumni.princeton.edu>

- Joysticks (/dev/joysticks)

- Sparc keyboard (/dev/kbd)

- DSP56001 digital signal processor (/dev/dsp56k)

- Apple Desktop Bus (/dev/adb)

- Coda network file system (/dev/cfs*)

- Virtual console capture devices (/dev/vcc)
  Thanks to Dennis Hou <smilax@mindmeld.yi.org>

- Frame buffer devices (/dev/fb)

- Video capture devices (/dev/v4l)


-----------------------------------------------------------------------------

| 1408 | |
Allocation of Device Numbers

Devfs allows you to write a driver which doesn't need to allocate a
device number (major and minor numbers) for the internal operation of
the kernel. However, a number of userspace programmes use the device
number as a unique handle for a device. One example is the find
programme, which uses device numbers to determine whether an inode is
on a different filesystem from another inode. The device number used
is the one for the block device which a filesystem is using. To
preserve compatibility with userspace programmes, block devices using
devfs need to have unique device numbers allocated to them.
Furthermore, POSIX specifies device numbers, so some kind of device
number needs to be presented to userspace.

The simplest option (especially when porting drivers to devfs) is to
keep using the old major and minor numbers. Devfs will take whatever
values are given for major and minor and pass them on to userspace.

Alternatively, devfs can allocate unique device numbers dynamically.
A dynamically allocated device number is a 16 bit number, which leaves
plenty of space for large numbers of discs and partitions. This scheme
can also be used for character devices, in particular the tty devices,
which are currently limited to 256 pseudo-ttys (this limits the total
number of simultaneous xterms and remote logins). Note that
dynamically allocated device numbers are limited to the range
36864-61439 (majors 144-239), in order to avoid any possible conflicts
with existing official allocations.

Please note that using dynamically allocated block device numbers may
break the NFS daemons (both user and kernel mode), which expect the
dev_t for a given device to be constant over the lifetime of remote
mounts.

A final note on this scheme: since it doesn't increase the size of
device numbers, there are no compatibility issues with userspace.

-----------------------------------------------------------------------------


Questions and Answers

Making things work
Alternatives to devfs
What I don't like about devfs
How to report bugs
Strange kernel messages
Compilation problems with devfsd


Making things work

Here are some common questions and answers.


Devfsd doesn't start

- Make sure you have compiled and installed devfsd
- Make sure devfsd is being started from your boot scripts
- Make sure you have configured your kernel to enable devfs (see
  below)
- Make sure devfs is mounted (see below)


Devfsd is not managing all my permissions

Make sure you are capturing the appropriate events. For example,
device entries created by the kernel generate REGISTER events,
but those created by devfsd generate CREATE events.


Devfsd is not capturing all REGISTER events

See the previous entry: you may need to capture CREATE events.


X will not start

Make sure you followed the steps outlined above.


Why don't my network devices appear in devfs?

This is not a bug. Network devices have their own, completely separate
namespace. They are accessed via socket(2) and setsockopt(2) calls,
and thus require no device nodes. I have raised the possibility of
moving network devices into the device namespace, but have had no
response.


How can I test if I have devfs compiled into my kernel?

All filesystems that are built in or currently loaded are listed in
/proc/filesystems. If you see a devfs entry, then you know that devfs
was compiled into your kernel. If you have correctly configured and
rebuilt your kernel, then devfs will be built in. If you think you've
configured it in, but /proc/filesystems doesn't show it, you've made a
mistake. Common mistakes include:

- Using a 2.2.x kernel without applying the devfs patch (if you
  don't know how to patch your kernel, use 2.4.x instead; don't bother
  asking me how to patch)
- Forgetting to set CONFIG_EXPERIMENTAL=y
- Forgetting to set CONFIG_DEVFS_FS=y
- Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs
  to be automatically mounted at boot)
- Editing your .config manually, instead of using make config or
  make xconfig
- Forgetting to run make dep; make clean after changing the
  configuration and before compiling
- Forgetting to compile your kernel and modules
- Forgetting to install your kernel
- Forgetting to install your modules

Please double-check that you've done all these steps before sending in
a bug report.


How can I test if devfs is mounted on /dev?

The device filesystem will always create an entry called ".devfsd",
which is used to communicate with the daemon. Even if the daemon is
not running, this entry will exist. Testing for the existence of this
entry is the approved method of determining if devfs is mounted or
not. Note that the type of entry (i.e. regular file, character device,
named pipe, etc.) may change without notice. Only the existence of the
entry should be relied upon.


When I start devfsd, I see the error:
Error opening file: ".devfsd" No such file or directory?

This means that devfs is not mounted. Make sure you have devfs mounted.


How do I mount devfs?

First make sure you have devfs compiled into your kernel (see
above). Then you will need to do one of the following:

- set CONFIG_DEVFS_MOUNT=y in your kernel config
- pass devfs=mount on the kernel command line (via your boot loader)
- mount devfs manually in your boot scripts with:
  mount -t devfs none /dev


Mount by volume LABEL=<label> doesn't work with devfs

Most probably you are not mounting devfs onto /dev. What happens is
that if your kernel config has CONFIG_DEVFS_FS=y, then the contents of
/proc/partitions will have the devfs names (such as
scsi/host0/bus0/target0/lun0/part1). The contents of /proc/partitions
are used by mount(8) when mounting by volume label. If devfs is not
mounted on /dev, then mount(8) will fail to find devices. The solution
is to make sure that devfs is mounted on /dev. See above for how to do
that.


I have extra or incorrect entries in /dev

You may have stale entries in your dev-state area. Check for a
RESTORE configuration line in your devfsd configuration (typically
/etc/devfsd.conf). If you have this line, check the contents of the
specified directory for stale entries. Remove any entries which are
incorrect, then reboot.


I get "Unable to open initial console" messages at boot

This usually happens when you don't have devfs automounted onto /dev
at boot time, and there is no valid /dev/console entry on your root
filesystem. Create a valid /dev/console device node (a character
device with major 5, minor 1).


Alternatives to devfs

I've attempted to collate all the anti-devfs proposals and explain
their limitations. Under construction.


Why not just pass device create/remove events to a daemon?

Here the suggestion is to develop an API in the kernel so that devices
can register create and remove events, and a daemon listens for those
events. The daemon would then populate/depopulate /dev (which resides
on disc).

This has several limitations:

- it only works for modules loaded and unloaded (or devices inserted
  and removed) after the kernel has finished booting. Without a
  database of events, there is no way the daemon could fully populate
  /dev

- if you add a database to this scheme, the question is then how to
  present that database to user-space. If you make it a list of
  strings with embedded event codes which are passed through a pipe to
  the daemon, then this is only of use to the daemon. I would argue
  that the natural way to present this data is via a filesystem (since
  many of the events will be of a hierarchical nature), such as devfs.
  Presenting the data as a filesystem makes it easy for the user to
  see what is available and also makes it easy to write scripts to
  scan the "database"

- the tight binding between device nodes and drivers is no longer
  possible (requiring the otherwise perfectly avoidable table lookups)

- you cannot catch inode lookup events on /dev, which means that
  module autoloading requires device nodes to be created. This is a
  problem, particularly for drivers where only a few inodes are
  created from a potentially large set

- this technique can't be used when the root FS is mounted read-only


Just implement a better scsidev

This suggestion involves taking the scsidev programme and extending it
to scan for all devices, not just SCSI devices. The scsidev programme
works by scanning /proc/scsi.

Problems:

- the kernel does not currently provide a list of all devices
  available. Not all drivers register entries in /proc or generate
  kernel messages

- there is no uniform mechanism to register devices other than the
  devfs API

- implementing such an API is then the same as the proposal above


Put /dev on a ramdisc

This suggestion involves creating a ramdisc and populating it with
device nodes and then mounting it over /dev.

Problems:

- this doesn't help when mounting the root filesystem, since you
  still need a device node to do that

- if you want to use this technique for the root device node as
  well, you need to use initrd. This complicates the booting sequence
  and makes it significantly harder to administer and configure. The
  initrd is essentially opaque, robbing the system administrator of
  easy configuration

- insufficient information is available to correctly populate the
  ramdisc. So we come back to the proposal above to "solve" this

- a ramdisc-based solution would take more kernel memory, since the
  backing store would be (at best) normal VFS inodes and dentries,
  which take 284 bytes and 112 bytes, respectively, for each entry.
  Compare that to 72 bytes for devfs


Do nothing: there's no problem

Sometimes people can be heard to claim that the existing scheme is
fine. This is what they're ignoring:

- device number size (8 bits each for major and minor) is a real
  limitation, and must be fixed somehow. Systems with large numbers of
  SCSI devices, for example, will continue to consume the remaining
  unallocated major numbers. USB will also need to push beyond the
  8 bit minor limitation

- simply increasing the device number size is insufficient. Apart
  from causing a lot of pain, it doesn't solve the management issues
  of a /dev with thousands or more device nodes

- ignoring the problem of a huge /dev will not make it go away, and
  dismisses the legitimacy of a large number of people who want a
  dynamic /dev

- the standard response then becomes: "write a device management
  daemon", which brings us back to the proposal above


What I don't like about devfs

Here are some common complaints about devfs, and some suggestions and
solutions that may make it more palatable for you. I can't please
everybody, but I do try :-)

I hate the naming scheme

First, remember that no naming scheme will please everybody. You hate
the scheme, others love it. Who's to say who's right and who's wrong?
Ultimately, the person who writes the code gets to choose, and what
exists now is a combination of the choices made by the devfs author
and the kernel maintainer (Linus).

However, not all is lost. If you want to create your own naming
scheme, it is a simple matter to write a standalone script, hack
devfsd, or write a script called by devfsd. You can create whatever
naming scheme you like.

Further, if you want to remove all traces of the devfs naming scheme
from /dev, you can mount devfs elsewhere (say /devfs) and populate
/dev with links into /devfs. This population can be automated using
devfsd if you wish.

You can even use the VFS binding facility (bind mounts) to make the
links, rather than using symbolic links. This way, there is no visible
"destination" as there would be with a symbolic link.

Devfs puts policy into the kernel

There's already policy in the kernel. Device numbers are in fact
policy (why should the kernel dictate what device numbers I use?).
Face it, some policy has to be in the kernel. The real difference
between device names as policy and device numbers as policy is that
no one will use device numbers directly, because device numbers are
devoid of meaning to humans and are ugly. At least with the devfs
device names (even though you can add your own naming scheme), some
people will use the devfs-supplied names directly. This offends some
people :-)

Devfs is bloatware

This is not even remotely true. As shown above, both code and data
size are quite modest.


How to report bugs

If you have (or think you have) a bug with devfs, please follow the
steps below:

- make sure you have enabled debugging output when configuring your
  kernel. You will need to set (at least) the following config options:

  CONFIG_DEVFS_DEBUG=y
  CONFIG_DEBUG_KERNEL=y
  CONFIG_DEBUG_SLAB=y

- please make sure you have the latest devfs patches applied. The
  latest kernel version might not have the latest devfs patches
  applied yet (Linus is very busy)

- save a copy of your complete kernel logs (preferably by using the
  dmesg programme) for later inclusion in your bug report. You may
  need to use the -s switch to increase the internal buffer size so
  you can capture all the boot messages. Don't edit or trim the dmesg
  output

- try booting with devfs=dall passed on the kernel boot command line
  (read the documentation for your bootloader on how to do this), and
  save the result to a file. This may be quite verbose, and it may
  overflow the message buffer, but try to get as much of it as you can

- if you get an Oops, run ksymoops to decode it so that the names of
  the offending functions are provided. A non-decoded Oops is pretty
  useless

- send a copy of your devfsd configuration file(s)

- send the bug report to me first. Don't expect that I will see it if
  you post it to the linux-kernel mailing list. Include all the
  information listed above, plus anything else that you think might be
  relevant. Put the string devfs somewhere in the subject line, so my
  mail filters mark it as urgent

Here is a general guide on how to ask questions in a way that greatly
improves your chances of getting a reply:

http://www.tuxedo.org/~esr/faqs/smart-questions.html

If you have a bug to report, you should also read:

http://www.chiark.greenend.org.uk/~sgtatham/bugs.html


Strange kernel messages

You may see devfs-related messages in your kernel logs. Below are some
messages and what they mean (and what you should do about them, if
anything).

devfs_register(fred): could not append to parent, err: -17

You need to check what the error code means, but usually 17 means
EEXIST. This means that a driver attempted to create an entry fred in
a directory, but there was already an entry with that name. This is
often caused by flawed boot scripts which untar a bunch of inodes into
/dev as a way to restore permissions. This message is harmless, as
the device nodes will still provide access to the driver (unless you
use the devfs=only boot option, which is only for dedicated souls :-).
If you want to get rid of these annoying messages, upgrade to
devfsd-v1.3.20 and use the recommended RESTORE directive to restore
permissions.

devfs_mk_dir(bill): using old entry in dir: c1808724 ""

This is similar to the message above, except that a driver attempted
to create a directory named bill, and the parent directory has an
entry with the same name. In this case, to ensure that drivers
continue to work properly, the old entry is re-used and given to the
driver. In 2.5 kernels, the driver is given a NULL entry, and thus,
under rare circumstances, may not create the required device nodes.
The solution is the same as above.


Compilation problems with devfsd

Usually, you can compile devfsd just by typing make in the source
directory, followed by make install (as root). Sometimes you may have
problems, particularly on broken configurations.

error messages relating to DEVFSD_NOTIFY_DELETE

This happens because you have an ancient set of kernel headers
installed in /usr/include/linux or /usr/src/linux. Install the sources
for kernel 2.4.10 or later. You may need to pass the KERNEL_DIR
variable to make (if you did not install the new kernel sources as
/usr/src/linux), or you may copy the devfs_fs.h file from the kernel
source tree into /usr/include/linux.


-----------------------------------------------------------------------------

Other resources

- Douglas Gilbert has written a useful document at
  http://www.torque.net/sg/devfs_scsi.html which explores the SCSI
  subsystem and how it interacts with devfs

- Douglas Gilbert has written another useful document at
  http://www.torque.net/scsi/SCSI-2.4-HOWTO/ which discusses the Linux
  SCSI subsystem in 2.4

- Johannes Erdfelt has started a discussion paper on Linux and
  hot-swap devices, describing what the requirements are for a
  scalable solution and how and why he's used devfs+devfsd. Note that
  this is an early draft only, available in plain text form at:
  http://johannes.erdfelt.com/hotswap.txt
  Johannes has promised that an HTML version will follow.

- I presented an invited paper at the 2nd Annual Storage Management
  Workshop held in Miami, Florida, U.S.A. in October 2000.


-----------------------------------------------------------------------------

Translations of this document

This document has been translated into other languages.

- The document master (in English) by rgooch@atnf.csiro.au is
  available at
  http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html

- A Korean translation by viatoris@nownuri.net is available at
  http://your.destiny.pe.kr/devfs/devfs.html

-----------------------------------------------------------------------------