		HOWTO for multiqueue network device support
		===========================================

Section 1: Base driver requirements for implementing multiqueue support
Section 2: Qdisc support for multiqueue devices
Section 3: Brief howto using PRIO and RR for multiqueue devices


Intro: Kernel support for multiqueue devices
--------------------------------------------

Kernel support for multiqueue devices is only an API that is presented to the
netdevice layer for base drivers to implement.  This feature is part of the
core networking stack, and all network devices will be running on the
multiqueue-aware stack.  If a base driver only has one queue, then these
changes are transparent to that driver.


Section 1: Base driver requirements for implementing multiqueue support
-----------------------------------------------------------------------

Base drivers are required to use the new alloc_etherdev_mq() or
alloc_netdev_mq() functions to allocate the subqueues for the device.  The
underlying kernel API will take care of the allocation and deallocation of
the subqueue memory, as well as netdev configuration of where the queues
exist in memory.

The base driver will also need to manage the queues as it does the global
netdev->queue_lock today.  Therefore base drivers should use the
netif_{start|stop|wake}_subqueue() functions to manage each queue while the
device is still operational.  netdev->queue_lock is still used when the device
comes online or when it's completely shut down (unregister_netdev(), etc.).
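
As a rough illustration, per-queue flow control with these functions might
look like the fragment below.  This is a non-buildable sketch against the
2.6.23-era API; the foo_* names (foo_open, foo_tx_clean, foo_adapter,
num_tx_queues) are hypothetical and not from any real driver:

```c
/* Sketch: managing per-queue flow control with the subqueue API. */
static int foo_open(struct net_device *netdev)
{
	struct foo_adapter *adapter = netdev_priv(netdev);
	int i;

	/* ... bring the hardware up, allocate rings, request IRQs ... */

	/* Start every hardware Tx queue once the device is operational. */
	for (i = 0; i < adapter->num_tx_queues; i++)
		netif_start_subqueue(netdev, i);

	return 0;
}

static void foo_tx_clean(struct net_device *netdev, int queue)
{
	/* ... reclaim completed descriptors for this Tx ring ... */

	/* The ring has room again: wake just this subqueue, leaving
	 * the other queues untouched. */
	netif_wake_subqueue(netdev, queue);
}
```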

Finally, the base driver should indicate that it is a multiqueue device.  The
feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
bitmap on device initialization.  Below is an example from e1000:

#ifdef CONFIG_E1000_MQ
	if ((adapter->hw.mac.type == e1000_82571) ||
	    (adapter->hw.mac.type == e1000_82572) ||
	    (adapter->hw.mac.type == e1000_80003es2lan))
		netdev->features |= NETIF_F_MULTI_QUEUE;
#endif


Section 2: Qdisc support for multiqueue devices
-----------------------------------------------

Currently two qdiscs support multiqueue devices: a new round-robin qdisc,
sch_rr, and the existing sch_prio.  The qdisc is responsible for classifying
skbs into bands and queues, and stores the queue mapping in
skb->queue_mapping.  The base driver uses this field to determine which
queue to send the skb to.
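
In a driver's transmit routine, consuming skb->queue_mapping might look like
the sketch below.  This is a non-buildable fragment against the 2.6.23-era
API; foo_xmit, foo_adapter, foo_tx_ring, and foo_ring_is_full are
hypothetical names, not from any real driver:

```c
/* Sketch: select the hardware Tx ring based on the mapping the
 * qdisc stored in skb->queue_mapping. */
static int foo_xmit(struct sk_buff *skb, struct net_device *netdev)
{
	struct foo_adapter *adapter = netdev_priv(netdev);
	/* The qdisc (sch_prio/sch_rr) classified this skb already. */
	unsigned int queue = skb->queue_mapping;
	struct foo_tx_ring *ring = &adapter->tx_ring[queue];

	/* ... post the skb to this ring's descriptors ... */

	if (foo_ring_is_full(ring))
		/* Stop only this subqueue; the others keep transmitting. */
		netif_stop_subqueue(netdev, queue);

	return NETDEV_TX_OK;
}
```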

sch_rr has been added for hardware that doesn't want scheduling policies from
software, so it's a straight round-robin qdisc.  It uses the same syntax and
classification priomap that sch_prio uses, so it should be intuitive to
configure for people who've used sch_prio.

The PRIO qdisc naturally plugs into a multiqueue device.  If PRIO has been
built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
bands requested is equal to the number of queues on the hardware.  If they
are equal, it sets up a one-to-one mapping between the queues and bands.  If
they're not equal, it will not load the qdisc.  The RR qdisc behaves the same
way.  Once the association is made, any skb that is classified will have
skb->queue_mapping set, which allows the driver to properly queue skbs
to multiple queues.


Section 3: Brief howto using PRIO and RR for multiqueue devices
---------------------------------------------------------------

The userspace command 'tc', part of the iproute2 package, is used to configure
qdiscs.  To add the PRIO qdisc to your network device, assuming the device is
called eth0, run the following command:

# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue

This will create 4 bands, 0 being highest priority, and associate those bands
with the queues on your NIC.  Assuming eth0 has 4 Tx queues, the band mapping
would look like:
band 0 => queue 0
band 1 => queue 1
band 2 => queue 2
band 3 => queue 3

Traffic will begin flowing through each queue if your TOS values are assigning
traffic across the various bands.  For example, ssh traffic will always try to
go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
so it will be sent out queue 0.  ICMP traffic (pings) falls into the "normal"
traffic classification, which is band 1.  Therefore pings will be sent out
queue 1 on the NIC.

Note the use of the multiqueue keyword.  This is only in versions of iproute2
that support multiqueue networking devices; if it is omitted when loading
a qdisc onto a multiqueue device, the qdisc will load and operate the same
as if it were loaded onto a single-queue device (i.e. it sends all traffic to
queue 0).

Alternatively, band allocation can be left to the qdisc: load it with the
multiqueue option and specify 0 bands.  In this case, the qdisc will allocate
a number of bands equal to the number of queues that the device reports, and
bring the qdisc online.
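
For example (assuming a multiqueue-aware iproute2 and a device named eth0),
a configuration fragment that lets the qdisc size itself could look like:

```shell
# With "bands 0 multiqueue", PRIO (or RR) creates one band per
# hardware Tx queue that the device reports.
tc qdisc add dev eth0 root handle 1: prio bands 0 multiqueue

# Verify the resulting qdisc and band count.
tc qdisc show dev eth0
```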

The behavior of tc filters remains the same: they will still override TOS-based
priority classification.
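
For instance, a u32 filter can pin a class of traffic to a specific band
regardless of TOS.  This fragment assumes the eth0 device and handle 1: from
the example above, with four bands, where class 1:4 is band 3:

```shell
# Send all IP traffic destined to port 80 to band 3 (class 1:4),
# overriding the priomap; with the one-to-one mapping, this also
# pins it to hardware Tx queue 3.
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
	match ip dport 80 0xffff flowid 1:4
```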


Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>