Red-black Trees (rbtree) in Linux
January 18, 2007
Rob Landley <rob@landley.net>
=============================

What are red-black trees, and what are they for?
------------------------------------------------

Red-black trees are a type of self-balancing binary search tree, used for
storing sortable key/value data pairs.  This differs from radix trees (which
are used to efficiently store sparse arrays and thus use long integer indexes
to insert/access/delete nodes) and hash tables (which are not kept sorted to
be easily traversed in order, and must be tuned for a specific size and
hash function, whereas rbtrees scale gracefully when storing arbitrary keys).

Red-black trees are similar to AVL trees, but provide faster real-time bounded
worst case performance for insertion and deletion (at most two rotations and
three rotations, respectively, to balance the tree), with slightly slower
(but still O(log n)) lookup time.

To quote Linux Weekly News:

    There are a number of red-black trees in use in the kernel.
    The deadline and CFQ I/O schedulers employ rbtrees to
    track requests; the packet CD/DVD driver does the same.
    The high-resolution timer code uses an rbtree to organize outstanding
    timer requests.  The ext3 filesystem tracks directory entries in a
    red-black tree.  Virtual memory areas (VMAs) are tracked with red-black
    trees, as are epoll file descriptors, cryptographic keys, and network
    packets in the "hierarchical token bucket" scheduler.

This document covers use of the Linux rbtree implementation.  For more
information on the nature and implementation of red-black trees, see:

  Linux Weekly News article on red-black trees
    http://lwn.net/Articles/184495/

  Wikipedia entry on red-black trees
    http://en.wikipedia.org/wiki/Red-black_tree

Linux implementation of red-black trees
---------------------------------------

Linux's rbtree implementation lives in the file "lib/rbtree.c".  To use it,
"#include <linux/rbtree.h>".

The Linux rbtree implementation is optimized for speed, and thus has one
less layer of indirection (and better cache locality) than more traditional
tree implementations.  Instead of using pointers to separate rb_node and data
structures, each instance of struct rb_node is embedded in the data structure
it organizes.  And instead of using a comparison callback function pointer,
users are expected to write their own tree search and insert functions
which call the provided rbtree functions.  Locking is also left up to the
user of the rbtree code.

Creating a new rbtree
---------------------

Data nodes in an rbtree are structures containing a struct rb_node member:

  struct mytype {
  	struct rb_node node;
  	char *keystring;
  };

When dealing with a pointer to the embedded struct rb_node, the containing data
structure may be accessed with the standard container_of() macro.  The
rb_entry(node, type, member) macro is a wrapper around container_of() that
does the same thing.
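
As a rough userspace illustration of what container_of() does, the pointer to
the embedded member is moved back by the member's offset within the containing
structure.  This is only a sketch, not the kernel macro (which adds type
checking), and the names fake_rb_node, my_container_of and node_to_mytype are
made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct rb_node; only its address matters here. */
struct fake_rb_node {
	struct fake_rb_node *left, *right;
};

/* Simplified container_of(): subtract the member's offset from its address. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mytype {
	int id;				/* keeps "node" away from offset 0 */
	struct fake_rb_node node;
	const char *keystring;
};

/* Recover the enclosing struct mytype from a pointer to its embedded node. */
static struct mytype *node_to_mytype(struct fake_rb_node *n)
{
	assert(n != NULL);
	return my_container_of(n, struct mytype, node);
}
```

Given a struct mytype called item, node_to_mytype(&item.node) returns &item,
even though node is not the first member of the structure.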

At the root of each rbtree is an rb_root structure, which is initialized to be
empty via:

  struct rb_root mytree = RB_ROOT;

Searching for a value in an rbtree
----------------------------------

Writing a search function for your tree is fairly straightforward: start at the
root, compare each value, and follow the left or right branch as necessary.

Example:

  struct mytype *my_search(struct rb_root *root, char *string)
  {
  	struct rb_node *node = root->rb_node;

  	while (node) {
  		/* Get the data structure that embeds this rb_node. */
  		struct mytype *data = container_of(node, struct mytype, node);
  		int result;

  		result = strcmp(string, data->keystring);

  		if (result < 0)
  			node = node->rb_left;
  		else if (result > 0)
  			node = node->rb_right;
  		else
  			return data;
  	}
  	/* Ran off the bottom of the tree: the key is not present. */
  	return NULL;
  }

Inserting data into an rbtree
-----------------------------

Inserting data in the tree involves first searching for the place to insert the
new node, then inserting the node and rebalancing ("recoloring") the tree.

The search for insertion differs from the previous search by finding the
location of the pointer on which to graft the new node.  The new node also
needs a link to its parent node for rebalancing purposes.

Example:

  bool my_insert(struct rb_root *root, struct mytype *data)
  {
  	struct rb_node **new = &(root->rb_node), *parent = NULL;

  	/* Figure out where to put new node */
  	while (*new) {
  		struct mytype *this = container_of(*new, struct mytype, node);
  		int result = strcmp(data->keystring, this->keystring);

  		parent = *new;
  		if (result < 0)
  			new = &((*new)->rb_left);
  		else if (result > 0)
  			new = &((*new)->rb_right);
  		else
  			return false;	/* key is already in the tree */
  	}

  	/* Add new node and rebalance tree. */
  	rb_link_node(&data->node, parent, new);
  	rb_insert_color(&data->node, root);

  	return true;
  }
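
To see why the insertion search keeps a pointer to a link rather than a pointer
to a node, here is a minimal userspace sketch on a plain, unbalanced binary
search tree.  It is not the kernel API: struct bst_node and bst_insert() are
hypothetical names, and no rebalancing (the job of rb_insert_color()) is done.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain BST node, used only for this illustration. */
struct bst_node {
	struct bst_node *parent, *left, *right;
	const char *key;
};

/*
 * Insert "item" into the tree at "*root".  Returns 1 on success and 0 if
 * the key already exists.  "link" always points at the child slot being
 * examined, so the final store is the same whether the new node becomes
 * the root, a left child, or a right child.
 */
static int bst_insert(struct bst_node **root, struct bst_node *item)
{
	struct bst_node **link = root, *parent = NULL;

	assert(root != NULL && item != NULL);
	while (*link) {
		int result = strcmp(item->key, (*link)->key);

		parent = *link;
		if (result < 0)
			link = &(*link)->left;
		else if (result > 0)
			link = &(*link)->right;
		else
			return 0;
	}

	/* Graft the node onto the empty slot and record its parent. */
	item->parent = parent;
	item->left = item->right = NULL;
	*link = item;
	return 1;
}
```

Because the empty slot itself is remembered, no special case is needed for an
empty tree: the first insertion simply stores the new node into *root.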

Removing or replacing existing data in an rbtree
------------------------------------------------

To remove an existing node from a tree, call:

  void rb_erase(struct rb_node *victim, struct rb_root *tree);

Example:

  struct mytype *data = my_search(&mytree, "walrus");

  if (data) {
  	rb_erase(&data->node, &mytree);
  	myfree(data);
  }

To replace an existing node in a tree with a new one with the same key, call:

  void rb_replace_node(struct rb_node *old, struct rb_node *new,
  			struct rb_root *tree);

Replacing a node this way does not re-sort the tree: if the new node doesn't
have the same key as the old node, the rbtree will probably become corrupted.

Iterating through the elements stored in an rbtree (in sort order)
------------------------------------------------------------------

Four functions are provided for iterating through an rbtree's contents in
sorted order.  These work on arbitrary trees, and should not need to be
modified or wrapped (except for locking purposes):

  struct rb_node *rb_first(struct rb_root *tree);
  struct rb_node *rb_last(struct rb_root *tree);
  struct rb_node *rb_next(struct rb_node *node);
  struct rb_node *rb_prev(struct rb_node *node);

To start iterating, call rb_first() or rb_last() with a pointer to the root
of the tree, which will return a pointer to the node structure contained in
the first or last element in the tree.  To continue, fetch the next or previous
node by calling rb_next() or rb_prev() on the current node.  This will return
NULL when there are no more nodes left.

The iterator functions return a pointer to the embedded struct rb_node, from
which the containing data structure may be accessed with the container_of()
macro, and individual members may be accessed directly via
rb_entry(node, type, member).

Example:

  struct rb_node *node;
  for (node = rb_first(&mytree); node; node = rb_next(node))
  	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
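
For readers curious how in-order iteration can work without recursion or an
explicit stack, the following userspace sketch walks a plain binary search tree
using parent pointers.  It only illustrates the general technique; struct
bst_node, bst_first() and bst_next() are hypothetical names, not the kernel's
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain BST node with a parent pointer. */
struct bst_node {
	struct bst_node *parent, *left, *right;
	int key;
};

/* Leftmost node of the tree (the smallest key), or NULL if it is empty. */
static struct bst_node *bst_first(struct bst_node *root)
{
	if (!root)
		return NULL;
	while (root->left)
		root = root->left;
	return root;
}

/* In-order successor of "node", or NULL if it holds the largest key. */
static struct bst_node *bst_next(struct bst_node *node)
{
	struct bst_node *parent;

	assert(node != NULL);
	/* If there is a right subtree, the successor is its leftmost node. */
	if (node->right)
		return bst_first(node->right);

	/* Otherwise climb until we reach a parent from its left child. */
	while ((parent = node->parent) && node == parent->right)
		node = parent;
	return parent;
}
```

A loop of the form "for (p = bst_first(root); p; p = bst_next(p))" then visits
the keys in ascending order, mirroring the rb_first()/rb_next() pattern.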

Support for Augmented rbtrees
-----------------------------

An augmented rbtree is an rbtree with "some" additional data stored in each
node.  This data can be used to add new functionality to the rbtree.
Augmented rbtrees are an optional feature built on top of the basic rbtree
infrastructure.  An rbtree user who wants this feature must initialize the
augment callback function in rb_root.

This callback function will be called from the rbtree core routines whenever
a node has a change in one or both of its children.  It is the responsibility
of the callback function to recalculate the additional data that is in the
rb node using the new children information.  Note that if this new additional
data affects the parent node's additional data, then the callback function has
to handle it and do the recursive updates.
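
As a hedged sketch of what such a callback might do, the userspace code below
recomputes a cached per-node value (here the largest hi in the subtree) from
the node and its immediate children, then walks towards the root for as long as
an ancestor's cached value changes.  The structure layout and the names
ival_node, compute_max_hi() and ival_augment() are assumptions made for this
example and do not reflect the kernel's callback signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interval node; field names are illustrative only. */
struct ival_node {
	struct ival_node *parent, *left, *right;
	long lo, hi;		/* the interval stored in this node */
	long max_hi;		/* largest hi in the subtree rooted here */
};

/* Recompute one node's max_hi from itself and its immediate children. */
static long compute_max_hi(const struct ival_node *n)
{
	long max = n->hi;

	if (n->left && n->left->max_hi > max)
		max = n->left->max_hi;
	if (n->right && n->right->max_hi > max)
		max = n->right->max_hi;
	return max;
}

/*
 * Sketch of an augment callback: fix up "n", then keep moving up while
 * the cached value of the current node actually changes.
 */
static void ival_augment(struct ival_node *n)
{
	assert(n != NULL);
	while (n) {
		long max = compute_max_hi(n);

		if (n->max_hi == max)
			break;		/* ancestors are already up to date */
		n->max_hi = max;
		n = n->parent;
	}
}
```

Stopping as soon as a node's value is unchanged relies on the ancestors having
been consistent before the modification; a more defensive variant would simply
walk all the way to the root.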

An interval tree is an example of an augmented rbtree.  Reference:
"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
More details about interval trees:

A classical rbtree has a single key, so it cannot be directly used to store
interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
lo:hi, or to find whether there is an exact match for a new lo:hi.

However, an rbtree can be augmented to store such interval ranges in a
structured way, making it possible to do efficient lookup and exact match.

This "extra information" stored in each node is the maximum hi
(max_hi) value among the node itself and all the nodes that are its
descendants.  This information can be maintained at each node just by looking
at the node and its immediate children.  It is used in an O(log n) lookup
for the lowest match (lowest start address among all possible matches)
with something like:

find_lowest_match(lo, hi, node)
{
	lowest_match = NULL;
	while (node) {
		// max_hi() of a missing (NULL) child is assumed to compare
		// as lower than any lo
		if (max_hi(node->left) > lo) {
			// Lowest overlap if any must be on left side
			node = node->left;
		} else if (overlap(lo, hi, node)) {
			lowest_match = node;
			break;
		} else if (lo > node->lo) {
			// Lowest overlap if any must be on right side
			node = node->right;
		} else {
			break;
		}
	}
	return lowest_match;
}
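
The pseudocode above can be turned into compilable C along the following
lines.  This is a sketch under stated assumptions rather than kernel code: the
tree is assumed to be ordered by lo, intervals are assumed to be half-open
[lo, hi) (which is what the strict comparisons in the pseudocode suggest), and
the node layout and the overlap() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout; the tree is assumed to be ordered by lo. */
struct ival_node {
	struct ival_node *left, *right;
	long lo, hi;		/* half-open interval [lo, hi) */
	long max_hi;		/* largest hi in the subtree rooted here */
};

/* Two half-open intervals overlap if each starts before the other ends. */
static int overlap(long lo, long hi, const struct ival_node *n)
{
	return n->lo < hi && lo < n->hi;
}

/* Overlapping node with the lowest lo, or NULL if nothing overlaps. */
static struct ival_node *find_lowest_match(long lo, long hi,
					   struct ival_node *node)
{
	assert(lo < hi);
	while (node) {
		if (node->left && node->left->max_hi > lo) {
			/* Lowest overlap, if any, must be on the left side. */
			node = node->left;
		} else if (overlap(lo, hi, node)) {
			return node;
		} else if (lo > node->lo) {
			/* Lowest overlap, if any, must be on the right side. */
			node = node->right;
		} else {
			break;
		}
	}
	return NULL;
}
```

Each iteration descends one level, so on a balanced tree the lookup takes
O(log n) steps.  With closed intervals the comparisons against lo and hi would
need to become non-strict.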

Finding an exact match means first finding the lowest match and then following
successor nodes looking for an exact match, until the start of a node is beyond
the hi value we are looking for.