CPS222 Lecture: Threaded Binary Trees                 last revised 1/25/2013

Objectives:

1. To introduce inorder-threading of binary trees, and to show how inthreaded
   trees can be traversed easily.
2. To discuss threading schemes based on pre- or post-order traversal instead
   of inorder.

Materials:

1. Code for recursive and non-recursive versions of inorder traversal
   (excerpted from prior lecture) to project
2. Transparency + Handout of a completely in-threaded binary tree

I. Inorder Threading of Binary Trees
-  ------- --------- -- ------ -----

   A. When we talked about preorder, inorder, and postorder traversal of
      binary trees earlier, we saw that a stack is always needed to
      accomplish the traversal - either explicitly, or implicitly because of
      recursion.

      1. PROJECT code for inorder traversal (recursive and non-recursive)
         excerpted from earlier lecture.

         Observe that the non-recursive version uses an explicit stack whose
         maximum size is proportional to the height of the tree; the
         recursive version uses an implicit (run-time) stack of the same
         size.

      2. Since these traversals are used often, we would like to avoid the
         space and time overhead of the stack if we can.  It turns out that
         there is a simple tree representation that allows us to do this,
         while also allowing us to define an iterator for the tree that lets
         us easily move from one node to the next in the appropriate order.

   B. Consider inorder traversal.  We will define the inorder successor of a
      node n as the next node that will be visited in doing an inorder
      traversal of the tree - or some sentinel value (e.g. NULL, or a pointer
      back to the header node if there is one) if the node is the last one
      visited in inorder.

      1. Suppose we were able to define a function

            /* Return a pointer to the inorder successor of p */
            Node * insucc(Node * p);

         Suppose, further, that we arranged for there to be a header node for
         the tree, whose insucc is the first node in the inorder traversal
         order.  If so, we could implement inorder traversal of a tree as
         follows, without the use of a stack or recursion:

            p = insucc(header);
            while (p != header)
            {
                // Do whatever it means to visit the data at this node
                p = insucc(p);
            }

      2. If a node has a non-NULL right child, insucc is easy to define:

            Node * c = p -> _rchild;
            while (c -> _lchild != NULL)
                c = c -> _lchild;
            return c;

      3. However, if a node has a NULL right child, then its inorder
         successor is "above" it in the tree.  This is what the stack does
         for us - note that, in the non-recursive inorder traversal
         algorithm, when p -> _rchild is NULL we fall through the while loop
         and pop the stack again, getting the node that was the parent of p.
         If its right child is also NULL, we end up popping the stack again,
         going further up the tree.

      4. We have also noted that a binary tree with n nodes contains n+1 NULL
         pointers.  It would be nice to do something useful with these.  One
         thing we could do with a NULL rchild pointer is to use it to point
         to the inorder successor of the node.  We will call such a pointer a
         thread, and a tree containing such pointers a right-inthreaded
         binary tree.

      5. Of course, we must have some way of tagging the pointers to
         distinguish between a regular child pointer and a thread.  Since
         this requires just one bit, it can generally be done at no
         additional cost by using a bit somewhere that is otherwise unused.

         a. For example, on some machines a pointer must be even, since words
            in memory begin on even address boundaries.  Therefore, the
            low-order bit of a pointer must be zero.  We can differentiate
            threads from regular pointers by setting this bit to 1.

         b. On most machines, the number of bits used to store an address far
            exceeds the number needed to represent the range of addresses
            needed for the physical memory installed; hence, the high-order
            bit is normally 0.  We can differentiate threads from regular
            pointers by setting this bit to 1.
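      The following is a minimal sketch - not code from the lecture or the
      handout - of how the low-order-bit tagging in 5.a might look.  The Node
      structure and the names isthread and makepointer match the code
      fragments used later in these notes; the makethread helper and the
      _data field are assumptions added here for completeness.  It assumes
      nodes are allocated at even addresses, so the low-order bit of a
      pointer is free.

         struct Node
         {
             int    _data;       // whatever the tree stores (an int is assumed here)
             Node * _lchild;
             Node * _rchild;
         };

         // True if the stored pointer is a thread (low-order bit set).
         // For simplicity these sketches use C-style casts and unsigned long
         // long; std::uintptr_t would be the more portable choice.
         bool isthread(Node * p)
         {
             return ((unsigned long long) p & 1) != 0;
         }

         // Tag an ordinary pointer as a thread by setting the low-order bit
         Node * makethread(Node * p)
         {
             return (Node *) ((unsigned long long) p | 1);
         }

         // Clear the tag bit so a thread can be followed like an ordinary pointer
         Node * makepointer(Node * p)
         {
             return (Node *) ((unsigned long long) p & ~1ULL);
         }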
   C. We can now implement insucc - and hence inorder traversal - as follows:

         Node * insucc(Node * p)
         /* returns a pointer to the inorder successor of p */
         {
             if (isthread(p -> _rchild))
                 return makepointer(p -> _rchild);
             else
             {
                 Node * c = p -> _rchild;
                 while (c -> _lchild != NULL)
                     c = c -> _lchild;
                 return c;
             }
         }

      - where isthread tests the extra bit of a pointer to see if it is a
        regular pointer or a thread, and makepointer clears the extra bit so
        that the thread can be used like a pointer.

      1. What is the time efficiency of this algorithm?  Clearly, any one
         application of insucc can require time proportional to the height
         of the tree.  But what is of more interest is the average cost of
         applying insucc n times in order to visit all the nodes of the
         tree.  We call the average cost per use, averaged over all cases,
         the AMORTIZED COST.

      2. Note that a tree of n nodes contains n lchild pointers and n rchild
         pointers.  In the process of traversing the tree, insucc follows
         each non-NULL lchild pointer exactly once, and each rchild pointer
         (normal or thread) exactly once.  Therefore, the total time for
         traversing a right-inthreaded tree of n nodes in inorder is O(n)!
         This, of course, is optimal - since we must visit all n nodes.

      3. From this, it follows that the amortized cost of one use is
         O(n/n) = O(1).

   D. Note that this trick only made use of the NULL rchild pointers.  What
      about the NULL lchild pointers?  Suppose we define inpred as the
      inorder predecessor.  By symmetry, it turns out that inpred looks like
      insucc with the lchild and rchild pointers interchanged.  Thus, we can
      replace NULL lchild pointers by threads to the inorder predecessor.
      (Once both kinds of threads are present, the NULL tests in insucc and
      inpred become isthread tests.)  If we do so, then we can perform
      inorder traversal in either direction without the use of recursion or
      a stack.

      1. Such a tree is called completely inthreaded.

      2. A tree which contained only left threads would allow reverse
         inorder traversal only.  Such a tree is called left-inthreaded.

      3. Note that threading is possible with any kind of binary tree.  (We
         will deal with threading of a binary search tree for a project, but
         a threaded tree does not have to be a binary search tree.)

   E. A completely inthreaded binary tree might look like the following.

      TRANSPARENCY + HANDOUT

      Note that we make use of a header node to simplify some of the
      algorithms to follow.  The header convention is this:

      1. If the tree is empty, then the header's left child is a thread back
         to the header.  Otherwise, it points to the root of the tree.

      2. The header's right child is a pointer (not a thread) to itself.

      3. The first node (in inorder) in the tree has an lchild thread back
         to the header.  (Note that our insertion algorithm will ensure
         this.)  Likewise, the last node has an rchild thread back to the
         header.  (Insert will also do this.)

      4. Note how this choice causes our insucc algorithm, when applied to
         the header, to yield the first node of the tree.  Our inpred
         algorithm also works correctly.  Finally, insucc returns a pointer
         to the header when applied to the last node in the tree, and inpred
         does so when applied to the first.

   F. How can we build such a tree?  If we always insert new nodes in place
      of previously NULL pointers (i.e. in place of threads), then the
      following approach will work (a code sketch follows this section):

      1. If the new node is the lchild of its parent, then it lies between
         its parent's inpred and its parent in inorder traversal.
         Therefore, let the lchild of the new node be the original lchild
         (thread) of the parent, and let the rchild of the new node be a
         thread to its parent.

      2. If the new node is the rchild of its parent, then it lies between
         its parent and its parent's insucc in inorder traversal.
         Therefore, let the rchild of the new node be the original rchild
         (thread) of the parent, and let the lchild of the new node be a
         thread to its parent.
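      Below is a minimal sketch - an assumption, not code from the handout -
      of what the header setup in E and the insertion rules in F might look
      like, using the Node structure and the makethread helper sketched
      earlier.  The names makeEmptyTree, insertLeft, and insertRight are
      invented here; deciding where a new node belongs (e.g. by a binary
      search tree comparison) is left out.

         // Create an empty completely inthreaded tree, per the header
         // convention in E: lchild is a thread back to the header, rchild
         // is an ordinary pointer to the header itself.
         Node * makeEmptyTree()
         {
             Node * header = new Node;
             header -> _lchild = makethread(header);
             header -> _rchild = header;
             return header;
         }

         // F.1: attach newNode as the left child of parent, which must
         // currently have an lchild thread (i.e. no left child yet)
         void insertLeft(Node * parent, Node * newNode)
         {
             newNode -> _lchild = parent -> _lchild;   // thread to parent's inpred
             newNode -> _rchild = makethread(parent);  // new node's insucc is its parent
             parent -> _lchild = newNode;              // now a real pointer
         }

         // F.2: attach newNode as the right child of parent, which must
         // currently have an rchild thread (i.e. no right child yet)
         void insertRight(Node * parent, Node * newNode)
         {
             newNode -> _rchild = parent -> _rchild;   // thread to parent's insucc
             newNode -> _lchild = makethread(parent);  // new node's inpred is its parent
             parent -> _rchild = newNode;              // now a real pointer
         }

      Note that inserting the very first node is just insertLeft(header,
      newNode), which also establishes the conventions in E.3.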
   G. As a further consideration, note that while threads as we have
      implemented them are based on inorder traversal, they can assist the
      other traversals as well:

      1. Preorder - define the function presucc.  Note that:

         a. If a node has an lchild, then its lchild is its presucc.
            Ex: node 1 in diagram.

         b. Otherwise, if it has an rchild, then its rchild is its presucc.
            Ex: node 19.

         c. If it has no children, then it is the last node to be visited in
            preorder in the left subtree of some node Q.  Let Q be the
            nearest such node having a non-empty right child.  (If all else
            fails, the header qualifies.)  Then Q's rchild is the presucc.
            Ex: node 17 - Q is 2; presucc is 5.

         d. From a node P having no actual rchild, this node Q can be found
            as follows (a code sketch appears after subsection 2 below):

            i.   Follow P's rchild thread to a node above it.  Clearly, P is
                 in the left subtree of this node.  If this node has a
                 non-thread rchild, then it is node Q.

            ii.  If this node's rchild is a thread, then repeat the process
                 as many times as necessary until a node is found having a
                 non-thread rchild.  This is node Q.

            iii. Having found this node Q, P's presucc is Q's rchild.

         Time complexity for a complete traversal: note that each non-thread
         lchild is followed exactly once, and that each rchild is followed
         exactly once - therefore, the traversal is O(n), and the amortized
         cost of presucc is O(1).

      2. Reverse preorder - define the function prepred.  This is not quite
         as easy, since we must always go through the parent of the node.
         Note:

         a. If a node is the lchild of its parent, then its parent is its
            prepred.  Ex: node 2.

         b. If a node is the rchild of its parent and the parent has no
            lchild, then the parent is the prepred.  Ex: node 25.

         c. Otherwise, its prepred is the last node (in preorder) in the
            left subtree of its parent.  This can be found by going down the
            left subtree of the parent as far as possible - going right
            whenever possible, otherwise left.  Ex: node 13 - prepred is 38.

         d. Thus, we must first define a function parent (which is useful in
            its own right, and also for postsucc, as it turns out).  For any
            node P, there exists a nearest ancestor Q such that P is in its
            right subtree.  (If all else fails, the header is such.)  We can
            find this node by following lchild pointers until we have
            followed a thread.  Then, if P is the rchild of Q, then Q is its
            parent - otherwise, we follow lchild pointers in the right
            subtree of Q until we hit P.  (A code sketch of parent also
            appears below.)

            Ex: node 3  - Q = 1 and is its parent.
                node 12 - Q = 1.  Note that we can find the parent (node 6)
                          by going right from Q, then continuing left.
                node 13 - Q = 6 and is its parent.

         e. Given the parent function, prepred is easily defined as
            discussed above.

         Note that reverse preorder traversal using prepred will not be O(n)
         for the whole tree, but rather O(n*h), since parent potentially
         involves visiting one node on each level of the tree, and in
         subsequent applications of parent the same path can be retraced.
         Thus, the amortized cost of prepred is O(h) = O(log n) if the tree
         is well balanced (but O(n) in the worst case).
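      The following sketches are assumptions based on the descriptions in
      1.d and 2.d above - not code from the lecture - showing what presucc
      and parent might look like for a completely inthreaded tree with the
      header conventions of E, using the isthread and makepointer helpers
      sketched earlier.

         // Preorder successor (1. above)
         Node * presucc(Node * p)
         {
             if (! isthread(p -> _lchild))    // case a: the lchild comes next
                 return p -> _lchild;
             if (! isthread(p -> _rchild))    // case b: no lchild, so the rchild comes next
                 return p -> _rchild;
             // cases c/d: a leaf - follow rchild threads upward until we
             // reach a node Q having a non-thread rchild; that rchild is
             // the presucc.  (From the last node in preorder, this walk
             // ends at the header, whose rchild is itself.)
             Node * q = makepointer(p -> _rchild);
             while (isthread(q -> _rchild))
                 q = makepointer(q -> _rchild);
             return q -> _rchild;
         }

         // Parent (2.d above).  Not meant to be applied to the header itself.
         Node * parent(Node * p)
         {
             // Find the nearest ancestor q such that p is in its right
             // subtree, by running down real lchild pointers to the
             // leftmost node of p's subtree and then following its lchild
             // thread.  (If no ancestor has p in its right subtree, the
             // thread leads to the header.)
             Node * q = p;
             while (! isthread(q -> _lchild))
                 q = q -> _lchild;
             q = makepointer(q -> _lchild);

             if (q -> _rchild == q)    // q is the header (its own rchild):
             {                         // p hangs off the header's left link
                 Node * c = q;
                 while (c -> _lchild != p)   // real pointers all the way down
                     c = c -> _lchild;
                 return c;             // the header itself if p is the root
             }

             if (q -> _rchild == p)    // p is q's rchild, so q is the parent
                 return q;

             // otherwise p lies on the left spine of q's right subtree
             Node * c = q -> _rchild;
             while (c -> _lchild != p)
                 c = c -> _lchild;
             return c;
         }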
      3. Postorder traversal - define a function postsucc.

         a. By symmetry, this turns out to be similar to prepred, but with
            the roles of lchild and rchild interchanged.  To find the
            postsucc, we first find the parent of the node in question.

         b. If the node is the rchild of its parent, or if it is the only
            child of its parent, then the parent is the postsucc.
            Ex: nodes 3, 24.

         c. Otherwise, we find the first node in postorder in the right
            subtree of the parent.  This can be found by going down the
            subtree as far as possible, preferring to go left whenever
            possible, otherwise right.  Ex: node 2 - postsucc = 32.

         d. As with prepred, postorder traversal using postsucc is O(n*h);
            the amortized cost of postsucc is O(h) = O(log n) if the tree is
            well balanced.

         e. A caution on implementation: with the previous algorithms, our
            header convention has worked to our advantage to produce desired
            results - e.g. we could apply inpred, insucc, prepred, or
            presucc to the header and get a correct node, and in each case
            applying the function to the last node would lead us back to the
            header.  With postsucc, some special cases are needed when
            leaving or coming back to the header, due to our trick of making
            the header its own right child.  (However, the fact that the
            header is its own right child makes it easy to recognize the
            header.)  A code sketch of postsucc appears at the end of this
            section.

      4. Reverse postorder traversal - define a function postpred.

         a. By symmetry, this is analogous to presucc, but with the lchild
            and rchild roles reversed.  Reverse postorder traversal using
            postpred is therefore O(n), so the amortized cost of postpred is
            O(1).

         b. As with postsucc, some special cases are needed around the
            header.
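      A possible postsucc - again an assumption rather than handout code -
      using the parent sketch above.  The first test handles one of the
      special cases mentioned in 3.e: if p is the root, its parent is the
      header, and the root's postorder successor is taken to be the header
      itself.  Applying postsucc to the header is not handled here.

         // Postorder successor (3. above)
         Node * postsucc(Node * p)
         {
             Node * par = parent(p);

             if (par -> _rchild == par)  // par is the header, so p is the
                 return par;             // root, which is last in postorder

             // case b: p is the parent's rchild, or the parent has no rchild
             if (par -> _rchild == p || isthread(par -> _rchild))
                 return par;

             // case c: the first node (in postorder) of the parent's right
             // subtree - go down as far as possible, left whenever
             // possible, otherwise right
             Node * c = par -> _rchild;
             while (! isthread(c -> _lchild) || ! isthread(c -> _rchild))
             {
                 if (! isthread(c -> _lchild))
                     c = c -> _lchild;
                 else
                     c = c -> _rchild;
             }
             return c;
         }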
II. Preorder and postorder threading
--  -------- --- --------- ---------

   A. The threading scheme we have discussed has been based on the inorder
      traversal of the tree.  However, as we have seen, the inorder threads
      can also be used to accomplish the other traversals (though not
      necessarily in O(1) amortized time).

   B. If some other traversal is going to be used regularly instead of
      inorder, then an alternate threading scheme might be considered.  We
      could, for example, base a scheme on pre-order:

      1. We might build a threading scheme on the fact that if a node has a
         left child, then that child is its preorder successor.  If it has
         no left child, then its lchild pointer could be made into a thread
         to its pre-order successor.  (In this case, the rchild pointer is
         used as in an unthreaded tree.)  Presucc now becomes simply:

            if (! isthread(p -> _lchild))
                return p -> _lchild;
            else
                return makepointer(p -> _lchild);

      2. Alternately, we could adopt the following scheme for pre-order:

         a. If the node has no lchild, then make its lchild pointer a thread
            to its pre-order predecessor.

         b. If the node has no rchild, then make its rchild pointer a thread
            to its pre-order successor.

         c. This scheme, like the previous one, makes forward pre-order
            traversal fairly easy (a packaged version of this fragment
            appears at the end of these notes):

               if (! isthread(p -> _lchild))
                   return p -> _lchild;
               else if (! isthread(p -> _rchild))
                   return p -> _rchild;
               else
                   return makepointer(p -> _rchild);

         d. Reverse pre-order is also possible with this scheme, THOUGH WE
            WOULD OCCASIONALLY HAVE TO GO TO THE HEADER AND APPLY PRESUCC
            REPEATEDLY.  (This is because a node's prepred is never below it
            in the tree.)

   C. We could also base a scheme on post-order.  Unfortunately, forward
      post-order will always be hard, because a node's post-order successor
      is never below it in the tree.  However, a scheme to support reverse
      post-order would be somewhat easier!
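   As a usage note, the fragment in II.B.2.c could be packaged as a
   self-contained function like the sketch below.  The name presuccAlt is
   invented here, and the sketch assumes the same Node structure and the
   isthread/makepointer helpers sketched in section I.  What it should
   return for the last node in preorder depends on how the header is
   threaded under this scheme, which these notes leave open.

      // Preorder successor under the alternative threading scheme of II.B.2
      // (lchild threads point to the pre-order predecessor, rchild threads
      // to the pre-order successor)
      Node * presuccAlt(Node * p)
      {
          if (! isthread(p -> _lchild))
              return p -> _lchild;
          else if (! isthread(p -> _rchild))
              return p -> _rchild;
          else
              return makepointer(p -> _rchild);
      }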