| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1 | The prio_tree.c code indexes vmas using 3 different indexes: | 
 | 2 | 	* heap_index  = vm_pgoff + vm_size_in_pages : end_vm_pgoff | 
 | 3 | 	* radix_index = vm_pgoff : start_vm_pgoff | 
 | 4 | 	* size_index = vm_size_in_pages | 
 | 5 |  | 
 | 6 | A regular radix-priority-search-tree indexes vmas using only heap_index and | 
 | 7 | radix_index. The conditions for indexing are: | 
 | 8 | 	* ->heap_index >= ->left->heap_index && | 
 | 9 | 		->heap_index >= ->right->heap_index | 
 | 10 | 	* if (->heap_index == ->left->heap_index) | 
 | 11 | 		then ->radix_index < ->left->radix_index; | 
 | 12 | 	* if (->heap_index == ->right->heap_index) | 
 | 13 | 		then ->radix_index < ->right->radix_index; | 
 | 14 | 	* nodes are hashed to left or right subtree using radix_index | 
 | 15 | 	  similar to a pure binary radix tree. | 
 | 16 |  | 
 | 17 | A regular radix-priority-search-tree helps to store and query | 
 | 18 | intervals (vmas). However, a regular radix-priority-search-tree is only | 
 | 19 | suitable for storing vmas with different radix indices (vm_pgoff). | 
 | 20 |  | 
 | 21 | Therefore, the prio_tree.c extends the regular radix-priority-search-tree | 
 | 22 | to handle many vmas with the same vm_pgoff. Such vmas are handled in | 
 | 23 | 2 different ways: 1) All vmas with the same radix _and_ heap indices are | 
 | 24 | linked using vm_set.list, 2) if there are many vmas with the same radix | 
 | 25 | index, but different heap indices and if the regular radix-priority-search | 
 | 26 | tree cannot index them all, we build an overflow-sub-tree that indexes such | 
 | 27 | vmas using heap and size indices instead of heap and radix indices. For | 
 | 28 | example, in the figure below some vmas with vm_pgoff = 0 (zero) are | 
 | 29 | indexed by regular radix-priority-search-tree whereas others are pushed | 
 | 30 | into an overflow-subtree. Note that all vmas in an overflow-sub-tree have | 
 | 31 | the same vm_pgoff (radix_index) and if necessary we build different | 
 | 32 | overflow-sub-trees to handle each possible radix_index. For example, | 
 | 33 | in figure we have 3 overflow-sub-trees corresponding to radix indices | 
 | 34 | 0, 2, and 4. | 
 | 35 |  | 
 | 36 | In the final tree the first few (prio_tree_root->index_bits) levels | 
 | 37 | are indexed using heap and radix indices whereas the overflow-sub-trees below | 
 | 38 | those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are | 
 | 39 | indexed using heap and size indices. In overflow-sub-trees the size_index | 
 | 40 | is used for hashing the nodes to appropriate places. | 
 | 41 |  | 
 | 42 | Now, an example prio_tree: | 
 | 43 |  | 
 | 44 |   vmas are represented [radix_index, size_index, heap_index] | 
 | 45 |                  i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff] | 
 | 46 |  | 
 | 47 | level  prio_tree_root->index_bits = 3 | 
 | 48 | ----- | 
 | 49 | 												_ | 
 | 50 |   0			 				[0,7,7]					 | | 
 | 51 |   							/     \					 | | 
 | 52 | 				      ------------------       ------------			 |     Regular | 
 | 53 |   				     /					   \			 |  radix priority | 
 | 54 |   1		 		[1,6,7]					  [4,3,7]		 |   search tree | 
 | 55 |   				/     \					  /     \		 | | 
 | 56 | 			 -------       -----			    ------       -----		 |  heap-and-radix | 
 | 57 | 			/		    \			   /		      \		 |      indexed | 
 | 58 |   2		    [0,6,6]	 	   [2,5,7]		[5,2,7]		    [6,1,7]	 | | 
 | 59 | 		    /     \		   /     \		/     \		    /     \	 | | 
 | 60 |   3		[0,5,5]	[1,5,6]		[2,4,6]	[3,4,7]	    [4,2,6] [5,1,6]	[6,0,6]	[7,0,7]	 | | 
 | 61 | 		   /			   /		       /		   		_ | 
 | 62 |                   /		          /		      /					_ | 
 | 63 |   4	      [0,4,4]		      [2,3,5]		   [4,1,5]				 | | 
 | 64 |   		 /			 /		      /					 | | 
 | 65 |   5	     [0,3,3]		     [2,2,4]		  [4,0,4]				 |  Overflow-sub-trees | 
 | 66 |   		/			/							 | | 
 | 67 |   6	    [0,2,2]		    [2,1,3]							 |    heap-and-size | 
 | 68 |   	       /		       /							 |       indexed | 
 | 69 |   7	   [0,1,1]		   [2,0,2]							 | | 
 | 70 |   	      /											 | | 
 | 71 |   8	  [0,0,0]										 | | 
 | 72 |   												_ | 
 | 73 |  | 
 | 74 | Note that we use prio_tree_root->index_bits to optimize the height | 
 | 75 | of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is | 
 | 76 | set according to the maximum end_vm_pgoff mapped, we are sure that all | 
 | 77 | bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore, | 
 | 78 | we only use the first prio_tree_root->index_bits as radix_index. | 
 | 79 | Whenever index_bits is increased in prio_tree_expand, we shuffle the tree | 
 | 80 | to make sure that the first prio_tree_root->index_bits levels of the tree | 
 | 81 | is indexed properly using heap and radix indices. | 
 | 82 |  | 
 | 83 | We do not optimize the height of overflow-sub-trees using index_bits. | 
 | 84 | The reason is: there can be many such overflow-sub-trees and all of | 
 | 85 | them have to be suffled whenever the index_bits increases. This may involve | 
 | 86 | walking the whole prio_tree in prio_tree_insert->prio_tree_expand code | 
 | 87 | path which is not desirable. Hence, we do not optimize the height of the | 
 | 88 | heap-and-size indexed overflow-sub-trees using prio_tree->index_bits. | 
 | 89 | Instead the overflow sub-trees are indexed using full BITS_PER_LONG bits | 
 | 90 | of size_index. This may lead to skewed sub-trees because most of the | 
 | 91 | higher significant bits of the size_index are likely to be be 0 (zero). In | 
 | 92 | the example above, all 3 overflow-sub-trees are skewed. This may marginally | 
 | 93 | affect the performance. However, processes rarely map many vmas with the | 
 | 94 | same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally | 
 | 95 | do not require overflow-sub-trees to index all vmas. | 
 | 96 |  | 
 | 97 | From the above discussion it is clear that the maximum height of | 
 | 98 | a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG. | 
 | 99 | However, in most of the common cases we do not need overflow-sub-trees, | 
 | 100 | so the tree height in the common cases will be prio_tree_root->index_bits. | 
 | 101 |  | 
 | 102 | It is fair to mention here that the prio_tree_root->index_bits | 
 | 103 | is increased on demand, however, the index_bits is not decreased when | 
 | 104 | vmas are removed from the prio_tree. That's tricky to do. Hence, it's | 
 | 105 | left as a home work problem. | 
 | 106 |  | 
 | 107 |  |