| Peter P Waskiewicz Jr | a093bf0 | 2007-06-28 20:45:47 -0700 | [diff] [blame] | 1 |  | 
|  | 2 | HOWTO for multiqueue network device support | 
|  | 3 | =========================================== | 
|  | 4 |  | 
|  | 5 | Section 1: Base driver requirements for implementing multiqueue support | 
|  | 6 | Section 2: Qdisc support for multiqueue devices | 
|  | 7 | Section 3: Brief howto using PRIO and RR for multiqueue devices | 
|  | 8 |  | 
|  | 9 |  | 
|  | 10 | Intro: Kernel support for multiqueue devices | 
|  | 11 | -------------------------------------------- | 
|  | 12 |  | 
|  | 13 | Kernel support for multiqueue devices consists solely of an API that the | 
|  | 14 | netdevice layer presents for base drivers to implement.  This feature is part | 
|  | 15 | of the core networking stack, so all network devices run on the | 
|  | 16 | multiqueue-aware stack.  If a base driver has only one queue, these changes | 
|  | 17 | are transparent to that driver. | 
|  | 18 |  | 
|  | 19 |  | 
|  | 20 | Section 1: Base driver requirements for implementing multiqueue support | 
|  | 21 | ----------------------------------------------------------------------- | 
|  | 22 |  | 
|  | 23 | Base drivers are required to use the new alloc_etherdev_mq() or | 
|  | 24 | alloc_netdev_mq() functions to allocate the subqueues for the device.  The | 
|  | 25 | underlying kernel API will take care of the allocation and deallocation of | 
|  | 26 | the subqueue memory, as well as netdev configuration of where the queues | 
|  | 27 | exist in memory. | 
|  | 28 |  | 
|  | 29 | The base driver will also need to manage the queues as it does the global | 
|  | 30 | netdev->queue_lock today.  Therefore base drivers should use the | 
|  | 31 | netif_{start|stop|wake}_subqueue() functions to manage each queue while the | 
|  | 32 | device is still operational.  netdev->queue_lock is still used when the device | 
|  | 33 | comes online or when it's completely shut down (unregister_netdev(), etc.). | 
|  | 34 |  | 
|  | 35 | Finally, the base driver should indicate that it is a multiqueue device.  The | 
|  | 36 | feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features | 
|  | 37 | bitmap on device initialization.  Below is an example from e1000: | 
|  | 38 |  | 
|  | 39 | #ifdef CONFIG_E1000_MQ | 
|  | 40 |     if ((adapter->hw.mac.type == e1000_82571) || | 
|  | 41 |         (adapter->hw.mac.type == e1000_82572) || | 
|  | 42 |         (adapter->hw.mac.type == e1000_80003es2lan)) | 
|  | 43 |         netdev->features |= NETIF_F_MULTI_QUEUE; | 
|  | 44 | #endif | 
|  | 45 |  | 
|  | 46 |  | 
|  | 47 | Section 2: Qdisc support for multiqueue devices | 
|  | 48 | ----------------------------------------------- | 
|  | 49 |  | 
|  | 50 | Currently two qdiscs support multiqueue devices: a new round-robin qdisc, | 
|  | 51 | sch_rr, and sch_prio.  The qdisc is responsible for classifying skbs into | 
|  | 52 | bands and queues, and stores the queue mapping in skb->queue_mapping. | 
|  | 53 | The base driver should use this field to determine which queue to send the | 
|  | 54 | skb to. | 
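
The driver side of this can be sketched as a small userspace model.  It is
not kernel code: struct model_skb, struct model_dev and every model_*()
helper are hypothetical stand-ins invented for illustration, with
model_stop_subqueue() and model_wake_subqueue() playing the role of
netif_stop_subqueue() and netif_wake_subqueue().  The sketch assumes the
driver picks its Tx ring from queue_mapping and stops only the ring that
filled up.

```c
/* Userspace model only: none of these types or helpers are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TX_QUEUES 4   /* hypothetical hardware queue count */
#define RING_SIZE     2   /* tiny ring so the "full" case is easy to reach */

struct model_skb {
    uint16_t queue_mapping;   /* stands in for skb->queue_mapping */
};

struct model_dev {
    unsigned int ring_fill[NUM_TX_QUEUES]; /* descriptors in use per ring */
    bool stopped[NUM_TX_QUEUES];           /* per-subqueue stopped state */
};

/* Stand-in for netif_stop_subqueue(): stop one queue, leave the rest alone. */
static void model_stop_subqueue(struct model_dev *dev, uint16_t q)
{
    assert(q < NUM_TX_QUEUES);
    dev->stopped[q] = true;
}

/* Stand-in for netif_wake_subqueue(): let the stack use this queue again. */
static void model_wake_subqueue(struct model_dev *dev, uint16_t q)
{
    assert(q < NUM_TX_QUEUES);
    dev->stopped[q] = false;
}

/* Transmit path: returns 0 if queued, -1 if the chosen subqueue is stopped. */
static int model_xmit(struct model_dev *dev, const struct model_skb *skb)
{
    uint16_t q = skb->queue_mapping;   /* queue chosen by the qdisc */

    assert(q < NUM_TX_QUEUES);
    if (dev->stopped[q])
        return -1;

    dev->ring_fill[q]++;
    if (dev->ring_fill[q] == RING_SIZE)
        model_stop_subqueue(dev, q);   /* only this ring is full */
    return 0;
}

/* Tx completion: release one descriptor and wake the queue if it was stopped. */
static void model_tx_clean(struct model_dev *dev, uint16_t q)
{
    assert(q < NUM_TX_QUEUES);
    if (dev->ring_fill[q] > 0)
        dev->ring_fill[q]--;
    if (dev->stopped[q] && dev->ring_fill[q] < RING_SIZE)
        model_wake_subqueue(dev, q);
}
```

In this model, filling the ring for queue 1 stops only queue 1; an skb
mapped to queue 3 is still accepted, which is the point of per-queue flow
control.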
|  | 55 |  | 
|  | 56 | sch_rr has been added for hardware that doesn't want scheduling policies from | 
|  | 57 | software, so it's a straight round-robin qdisc.  It uses the same syntax and | 
|  | 58 | classification priomap that sch_prio uses, so it should be intuitive to | 
|  | 59 | configure for people who've used sch_prio. | 
|  | 60 |  | 
| Peter P Waskiewicz Jr | fdd8a53 | 2007-09-11 11:12:06 +0200 | [diff] [blame] | 61 | In order to utilize the multiqueue features of the qdiscs, the network | 
|  | 62 | device layer needs to enable multiple queue support.  This can be done by | 
|  | 63 | selecting NETDEVICES_MULTIQUEUE under Drivers. | 
|  | 64 |  | 
|  | 65 | The PRIO qdisc naturally plugs into a multiqueue device.  If | 
|  | 66 | NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of | 
|  | 67 | bands requested is compared to the number of queues on the hardware.  If they | 
| Peter P Waskiewicz Jr | a093bf0 | 2007-06-28 20:45:47 -0700 | [diff] [blame] | 68 | are equal, it sets up a one-to-one mapping between the queues and bands.  If | 
|  | 69 | they're not equal, it will not load the qdisc.  RR behaves the same | 
|  | 70 | way.  Once the association is made, any skb that is classified will have | 
|  | 71 | skb->queue_mapping set, which will allow the driver to properly queue skb's | 
|  | 72 | to multiple queues. | 
|  | 73 |  | 
|  | 74 |  | 
|  | 75 | Section 3: Brief howto using PRIO and RR for multiqueue devices | 
|  | 76 | --------------------------------------------------------------- | 
|  | 77 |  | 
|  | 78 | The userspace command 'tc', part of the iproute2 package, is used to configure | 
|  | 79 | qdiscs.  To add the PRIO qdisc to your network device, assuming the device is | 
|  | 80 | called eth0, run the following command: | 
|  | 81 |  | 
|  | 82 | # tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue | 
|  | 83 |  | 
|  | 84 | This will create 4 bands, 0 being highest priority, and associate those bands | 
|  | 85 | with the queues on your NIC.  Assuming eth0 has 4 Tx queues, the band mapping | 
|  | 86 | would look like: | 
|  | 87 |  | 
|  | 88 | band 0 => queue 0 | 
|  | 89 | band 1 => queue 1 | 
|  | 90 | band 2 => queue 2 | 
|  | 91 | band 3 => queue 3 | 
|  | 92 |  | 
|  | 93 | Traffic will begin flowing through each queue if your TOS values are assigning | 
|  | 94 | traffic across the various bands.  For example, ssh traffic will always try to | 
|  | 95 | go out band 0 based on TOS -> Linux priority conversion (realtime traffic), | 
|  | 96 | so it will be sent out queue 0.  ICMP traffic (pings) falls into the "normal" | 
|  | 97 | traffic classification, which is band 1.  Therefore pings will be sent out | 
|  | 98 | queue 1 on the NIC. | 
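
The TOS -> priority -> band -> queue chain can be traced with a short
userspace sketch.  The two tables are assumptions written from memory of
the kernel's TOS-to-priority table and the default PRIO priomap
(1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1); the entries for odd TOS indices in
particular have differed between kernel versions, so check both against
your own tree and tc configuration.  All names are hypothetical.

```c
/* Userspace sketch; table contents are assumptions, see the text above. */
#include <assert.h>
#include <stdint.h>

/* Linux packet priority values used by the tables below. */
enum {
    PRIO_BESTEFFORT       = 0,
    PRIO_BULK             = 2,
    PRIO_INTERACTIVE_BULK = 4,
    PRIO_INTERACTIVE      = 6,
};

/* TOS -> Linux priority, indexed by the four TOS bits: (tos & 0x1E) >> 1. */
static const uint8_t tos2prio[16] = {
    PRIO_BESTEFFORT,       PRIO_BESTEFFORT,
    PRIO_BESTEFFORT,       PRIO_BESTEFFORT,
    PRIO_BULK,             PRIO_BULK,
    PRIO_BULK,             PRIO_BULK,
    PRIO_INTERACTIVE,      PRIO_INTERACTIVE,
    PRIO_INTERACTIVE,      PRIO_INTERACTIVE,
    PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK,
    PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK,
};

/* Assumed default PRIO priomap: Linux priority -> band. */
static const uint8_t default_priomap[16] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
};

/* Classify by TOS alone, i.e. with no tc filter overriding the result. */
static unsigned int tos_to_band(uint8_t tos)
{
    uint8_t prio = tos2prio[(tos & 0x1E) >> 1];

    return default_priomap[prio & 0x0F];
}

/* With the one-to-one mapping described above, band N goes to queue N. */
static unsigned int band_to_queue(unsigned int band, unsigned int num_queues)
{
    assert(band < num_queues);
    return band;
}
```

Under these assumptions, a low-delay TOS of 0x10 (the kind of marking the
ssh example relies on) lands in band 0 and therefore queue 0, while an
unmarked ping with TOS 0x00 lands in band 1 and queue 1.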
|  | 99 |  | 
|  | 100 | Note the use of the multiqueue keyword.  It is available only in versions of | 
|  | 101 | iproute2 that support multiqueue networking devices; if it is omitted when | 
|  | 102 | loading a qdisc onto a multiqueue device, the qdisc will load and operate the | 
|  | 103 | same as if it were loaded onto a single-queue device (i.e., it sends all | 
|  | 104 | traffic to queue 0). | 
|  | 105 |  | 
|  | 106 | An alternative way to allocate bands is to use the multiqueue option and | 
|  | 107 | specify 0 bands.  In that case, the qdisc will set the number of bands equal | 
|  | 108 | to the number of queues that the device reports, and then bring the qdisc | 
|  | 109 | online. | 
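
The band-count rules from Section 2 and from the paragraph above can be
summarised in one small function.  This is a userspace sketch of the
documented behaviour, not the qdisc's actual code: resolve_bands() and its
parameters are hypothetical, and returning -EINVAL for a refused load is an
assumption (the text only says the qdisc will not load).

```c
/* Userspace sketch of the documented rules; not taken from sch_prio. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Returns the number of bands the qdisc would end up with, or -EINVAL
 * (an assumed error code) when the qdisc would refuse to load.
 */
static int resolve_bands(int requested_bands, bool multiqueue, int dev_queues)
{
    assert(requested_bands >= 0 && dev_queues >= 1);

    if (!multiqueue)
        return requested_bands;   /* keyword omitted: no queue matching */

    if (requested_bands == 0)
        return dev_queues;        /* 0 bands: take the device's queue count */

    if (requested_bands != dev_queues)
        return -EINVAL;           /* mismatch: the qdisc does not load */

    return requested_bands;       /* equal: one band per queue */
}
```

For a 4-queue device, "bands 4 multiqueue" and "bands 0 multiqueue" both
resolve to 4 bands in this sketch, while "bands 3 multiqueue" is refused.
Any other validation the qdisc performs on the band count is not modelled
here.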
|  | 110 |  | 
|  | 111 | The behavior of tc filters remains the same: a filter still overrides TOS | 
|  | 112 | priority classification. | 
|  | 113 |  | 
|  | 114 |  | 
|  | 115 | Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> |