CPS222 Lecture: Maps; Binary Search Trees Last revised 1/25/2013
Objectives
1. To review the general concept of a map
2. To define binary search trees
3. To show how to perform operations on BST's
Materials
1. Code for BST algorithms to project
I. Introduction - Maps
- ------------ - ----
A. One kind of data structure that shows up in many places is some form of
search structure, or map. Conceptually, such a structure is a
collection of key, value pairs that can be accessed by key.
B. Such a structure typically supports operations for insertion, lookup, and
removal of entries - though in some problems the contents of the map
may be fixed so that only lookup is needed (in which case a different
implementation may be used). These operations take the following form:
1. Insertion:
                 __________________
   Key, value   |       Map        |
   ---------->  | (key,value pairs)|
                |__________________|
2. Lookup:
                 ___________
   Key          |    Map    |  Value
   ---------->  |           |  ------->
                |___________|
3. Deletion:
                 ____________________________
   Key          |            Map             |
   ---------->  | (key and its value removed)|
                |____________________________|
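The three operations pictured above can be sketched in C++ on top of the standard library's std::map. This is only an illustration of the interface, not the implementation developed in this lecture; the names PhoneBook, insertEntry, lookupEntry, and removeEntry are hypothetical, and string keys with int values are an arbitrary choice.

```cpp
#include <map>
#include <string>

// Hypothetical alias: a map from string keys to int values.
using PhoneBook = std::map<std::string, int>;

// Insertion: store the (key, value) pair, replacing any existing value.
void insertEntry(PhoneBook& m, const std::string& key, int value) {
    m[key] = value;
}

// Lookup: return a pointer to the value for key, or nullptr if absent.
const int* lookupEntry(const PhoneBook& m, const std::string& key) {
    PhoneBook::const_iterator it = m.find(key);
    return (it == m.end()) ? nullptr : &it->second;
}

// Deletion: remove the key and its value; true if something was removed.
bool removeEntry(PhoneBook& m, const std::string& key) {
    return m.erase(key) > 0;
}
```

Returning a pointer from lookupEntry is one way to signal "not found" without a special sentinel value; other designs (an exception, a bool plus an output parameter) would work equally well.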
C. There are actually quite a number of ways that such a structure can be
implemented.
1. Pile (unordered array).
Lookup and removal are O(n) (insertion is O(1) if new entries are simply
added at the end), so suitable only for small sizes - where its simplicity
may actually make it desirable.
2. Ordered array
Insertion and removal are O(n), but lookup is O(log n) since binary
search can be used. Suitable only when contents are unchanging (e.g.
table of keywords) - where its simplicity may make it desirable.
3. Linked list - unordered
Insertion is O(1), but lookup is O(n). Deletion is O(n) to find the
"victim" but then only O(1) to actually remove it. Suitable only
in situations where insertion is the dominant operation (e.g.
archival storage of information that is seldom actually referenced)
4. Binary search tree - we will discuss today
5. Hash table - we will discuss later in the week
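As an illustration of the ordered-array option (C.2 above), here is a minimal sketch of O(log n) lookup using binary search via std::lower_bound. The names Entry and lookupSorted are hypothetical, and the table is assumed to be already sorted by key.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type: (key, value).
using Entry = std::pair<std::string, int>;

// Precondition (assumed): table is sorted by key in ascending order.
// Returns a pointer to the value for key, or nullptr if key is absent.
const int* lookupSorted(const std::vector<Entry>& table,
                        const std::string& key) {
    // lower_bound finds the first entry whose key is not less than key,
    // halving the search range at each step.
    std::vector<Entry>::const_iterator it = std::lower_bound(
        table.begin(), table.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    if (it != table.end() && it->first == key) return &it->second;
    return nullptr;
}
```

Insertion into such a table would still be O(n), since entries after the insertion point must be shifted to keep the array ordered.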
II. Binary Search Trees
-- ------ ------ -----
A. One of the two main ways to implement a map of significant size where
modification is needed is to use a binary search tree.
B. Definition: a binary search tree is a binary tree in which each node
contains a value (called a key) that is a member of a totally ordered
set. Further, if p is a pointer to a node, then p -> _key >= every key
in the node's left subtree, and p -> _key <= every key in the
node's right subtree.
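The definition can be made concrete with a small sketch. The node layout below is an assumption: only the field name _key appears in these notes, and _lchild / _rchild are hypothetical names for the child pointers. The function isBST (also hypothetical) checks the ordering property by passing down the bounds each subtree must respect.

```cpp
#include <string>

// Assumed node layout; only _key is named in the notes.
struct Node {
    std::string _key;
    Node* _lchild;
    Node* _rchild;
};

// True if every key in the subtree rooted at p lies between *lo and *hi
// (inclusive) and the subtree itself satisfies the BST property.
// A null bound means "unbounded on that side".
bool isBST(const Node* p,
           const std::string* lo = nullptr,
           const std::string* hi = nullptr) {
    if (p == nullptr) return true;                    // empty tree is a BST
    if (lo != nullptr && p->_key < *lo) return false;
    if (hi != nullptr && p->_key > *hi) return false;
    // Keys on the left must be <= p's key; keys on the right must be >= it.
    return isBST(p->_lchild, lo, &p->_key) &&
           isBST(p->_rchild, &p->_key, hi);
}
```

Checking only each node against its immediate children would not be enough: a key deep in the left subtree must still be no greater than every ancestor it lies to the left of, which is why the bounds are carried down.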
C. Observe: if one traverses a binary search tree in inorder, the nodes
are visited in ascending order of the keys.
Ex:          DOG
            /   \
        BISON   FOX
        /   \
 AARDVARK   CAT
is a binary search tree. Its inorder traversal is:
AARDVARK BISON CAT DOG FOX
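A minimal sketch of the inorder traversal used in this observation follows. The node layout (_lchild, _rchild) and the function name inorder are assumptions made for illustration; keys are appended to a vector so the visiting order can be inspected.

```cpp
#include <string>
#include <vector>

// Assumed node layout; only _key is named in the notes.
struct Node {
    std::string _key;
    Node* _lchild;
    Node* _rchild;
};

// Inorder: visit the left subtree, then the node itself, then the right
// subtree. For a binary search tree this yields keys in ascending order.
void inorder(const Node* p, std::vector<std::string>& out) {
    if (p == nullptr) return;
    inorder(p->_lchild, out);
    out.push_back(p->_key);
    inorder(p->_rchild, out);
}
```

Applied to the example tree rooted at DOG, the left subtree (AARDVARK, BISON, CAT) is emitted first, then DOG, then FOX.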
III. Operations on Binary Search Trees
--- ---------- -- ------ ------ -----
A. The utility of binary search trees comes from the fact that the
operations of inserting, looking up, and deleting the node containing a
certain key all take time proportional to the height of the tree.
1. If the tree is well balanced, then its height will be proportional to
the logarithm of the number of nodes.
a. Observe that, in a perfect binary tree, there are twice as many
nodes at each level as there are at the preceding level (since each
node has two children). Thus, the number of nodes in the
tree grows as 2^height - which makes the height proportional
to the log of the number of nodes. (You will develop a more formal
proof of this for a homework.)
b. If keys are inserted into a binary search tree in random order, the
resultant tree will not be perfect, of course; but on average the height
will still be proportional to log n. (This can be shown
experimentally.)
2. To see the utility of this, we can compare the average number of steps
needed for various operations on various search structures, assuming
that, in each case, the structure contains 1000 elements:
structure                   insert    delete    lookup
pile                             1       500       500
ordered array                  500       500        10   [binary search]
(unordered) linked list          1       500       500
binary search tree              10        10        10
 - if balanced
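The remark in 1.b that random insertion order still gives a height proportional to log n can be tried experimentally along the following lines. This is only a sketch: integer keys, the node layout, and the names insert, height, destroy, and randomTreeHeight are all assumptions, and the simple insertion used here is the unbalanced one. Height is counted in levels, so an empty tree has height 0.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Assumed node layout with integer keys, for the experiment only.
struct Node {
    int _key;
    Node* _lchild;
    Node* _rchild;
};

// Simple (unbalanced) insertion; returns the root of the updated subtree.
Node* insert(Node* p, int key) {
    if (p == nullptr) return new Node{key, nullptr, nullptr};
    if (key < p->_key) p->_lchild = insert(p->_lchild, key);
    else               p->_rchild = insert(p->_rchild, key);
    return p;
}

// Number of levels in the tree (0 for an empty tree).
int height(const Node* p) {
    if (p == nullptr) return 0;
    return 1 + std::max(height(p->_lchild), height(p->_rchild));
}

// Free every node in the tree.
void destroy(Node* p) {
    if (p == nullptr) return;
    destroy(p->_lchild);
    destroy(p->_rchild);
    delete p;
}

// Insert the keys 0..n-1 in a shuffled order and report the height.
int randomTreeHeight(int n, unsigned seed) {
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(keys.begin(), keys.end(), gen);
    Node* root = nullptr;
    for (int k : keys) root = insert(root, k);
    int h = height(root);
    destroy(root);
    return h;
}
```

For n = 1000 the result can never be below 10 levels (a tree with 9 levels holds at most 511 nodes); running randomTreeHeight with several seeds and comparing against that lower bound, and against the worst case of 1000, is one way to see how close random insertion comes to the balanced case.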
B. Algorithms for binary search trees:
1. Finding a node containing a given key:
PROJECT Code for lookup (recursive and non-recursive versions)
Observe: This algorithm (in either form) requires a number of steps
proportional to the height of the tree.
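For reference, lookup in both forms might look like the sketch below. This is not the projected code; the node layout (_lchild, _rchild) and the names lookupRecursive and lookupIterative are assumptions. Each step descends one level, which is why the cost is bounded by the height of the tree.

```cpp
#include <string>

// Assumed node layout; only _key is named in the notes.
struct Node {
    std::string _key;
    Node* _lchild;
    Node* _rchild;
};

// Recursive form: returns the node containing key, or nullptr if absent.
const Node* lookupRecursive(const Node* p, const std::string& key) {
    if (p == nullptr || p->_key == key) return p;
    if (key < p->_key) return lookupRecursive(p->_lchild, key);
    return lookupRecursive(p->_rchild, key);
}

// Non-recursive form: the same descent written as a loop.
const Node* lookupIterative(const Node* p, const std::string& key) {
    while (p != nullptr && p->_key != key) {
        p = (key < p->_key) ? p->_lchild : p->_rchild;
    }
    return p;
}
```

The recursion here is tail recursion, so the two forms do the same comparisons in the same order; the loop simply avoids the stack frames.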
2. Inserting a new key - simplest form:
PROJECT Code for insert (recursive and non-recursive versions)
a. Observe: This algorithm (in either form) requires a number of steps
proportional to the height of the tree.
b. Observe that this insertion algorithm, while very simple, could
lead to a highly non-optimal tree.
Ex: Consider what happens if keys are presented in reverse order:
FOX DOG CAT BISON AARDVARK
But note that the same thing happens when they are presented in
forward order!
AARDVARK BISON CAT DOG FOX
c. When we come to balanced binary search trees in a week or so, we
will see that there are several relatively simple ways to avoid such
problems - leading to the ability to guarantee that the height of
the tree will never be more than a fixed (small) multiple of log n.
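The simple insertion algorithm, and the degenerate trees it can produce, can be sketched as follows. This is not the projected code; the node layout and the names insertRecursive, insertIterative, and height are assumptions, as is the decision to ignore a key that is already present. Height is counted in levels.

```cpp
#include <algorithm>
#include <string>

// Assumed node layout; only _key is named in the notes.
struct Node {
    std::string _key;
    Node* _lchild;
    Node* _rchild;
};

// Recursive form: returns the root of the subtree after insertion.
// Assumption: a key already in the tree is left alone.
Node* insertRecursive(Node* p, const std::string& key) {
    if (p == nullptr) return new Node{key, nullptr, nullptr};
    if (key < p->_key)      p->_lchild = insertRecursive(p->_lchild, key);
    else if (key > p->_key) p->_rchild = insertRecursive(p->_rchild, key);
    return p;
}

// Non-recursive form: follow the links down to the empty slot where the
// new node belongs, then attach it there.
void insertIterative(Node*& root, const std::string& key) {
    Node** link = &root;
    while (*link != nullptr) {
        if (key == (*link)->_key) return;   // already present
        link = (key < (*link)->_key) ? &(*link)->_lchild
                                     : &(*link)->_rchild;
    }
    *link = new Node{key, nullptr, nullptr};
}

// Number of levels in the tree (0 for an empty tree).
int height(const Node* p) {
    if (p == nullptr) return 0;
    return 1 + std::max(height(p->_lchild), height(p->_rchild));
}
```

Inserting FOX DOG CAT BISON AARDVARK in that order makes each new key the left child of the previous one, so the tree is a chain with 5 levels; forward order gives the mirror-image chain of right children. Inserting DOG BISON FOX AARDVARK CAT instead gives the 3-level tree shown earlier.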
3. Deletion: This is a bit more complex than the other two operations.
a. If the node we are removing has no children, it can be deleted
and the pointer to it in its parent can be set to NULL.
b. If the node has one child, that one child can become the child
of the parent of the removed node (the grandparent adopts the
grandchild)
c. But if the node being removed has two children, life is more
complex. Our basic goal is to guarantee that the resultant
tree has the same inorder traversal as the original tree - minus
the removed node.
Observe: let the inorder traversal of the original tree be as
follows, where D is the node being removed, P is its
inorder predecessor, and S is its inorder successor:
... P D S ...
what we want is a tree that traverses as follows:
... P S ...
Observe: P is in D's left subtree, and S is in D's right
subtree.
Observe: P cannot have a right child, and S cannot have a
left child. (D is the inorder successor of P, and is
above P in the tree. If P had a right child, P's inorder
successor would lie in P's right subtree. A similar
argument holds for S.)
Therefore: what we can do is arbitrarily choose either P or S,
copy its data up to node D, and then remove P or S
as the case may be. (Since P and S each have at most
one child, removing either reduces to case a or b above.)
PROJECT Code for remove (recursive implementation only)
Observe: This algorithm requires a number of steps proportional to the
height of the tree.
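The three deletion cases can be sketched recursively as below. This is not the projected code: the node layout and the names removeKey and inorder are assumptions, nodes are assumed to have been allocated with new, and the sketch arbitrarily chooses the inorder successor S (rather than the predecessor P) in the two-child case.

```cpp
#include <string>
#include <vector>

// Assumed node layout; only _key is named in the notes.
struct Node {
    std::string _key;
    Node* _lchild;
    Node* _rchild;
};

// Removes key from the subtree rooted at p (if present) and returns the
// root of the resulting subtree. Nodes are assumed to be heap-allocated.
Node* removeKey(Node* p, const std::string& key) {
    if (p == nullptr) return nullptr;                 // key not in tree
    if (key < p->_key) {
        p->_lchild = removeKey(p->_lchild, key);
        return p;
    }
    if (key > p->_key) {
        p->_rchild = removeKey(p->_rchild, key);
        return p;
    }
    // p is the node D to remove.
    if (p->_lchild == nullptr || p->_rchild == nullptr) {
        // Cases a and b: at most one child, which replaces D
        // (nullptr if D had no children).
        Node* child = (p->_lchild != nullptr) ? p->_lchild : p->_rchild;
        delete p;
        return child;
    }
    // Case c: two children. S is the leftmost node of D's right subtree.
    Node* s = p->_rchild;
    while (s->_lchild != nullptr) s = s->_lchild;
    p->_key = s->_key;                                // copy S up into D
    // Remove S, which has no left child, from the right subtree.
    p->_rchild = removeKey(p->_rchild, p->_key);
    return p;
}

// Helper for checking the result: collects keys in inorder.
void inorder(const Node* p, std::vector<std::string>& out) {
    if (p == nullptr) return;
    inorder(p->_lchild, out);
    out.push_back(p->_key);
    inorder(p->_rchild, out);
}
```

Removing BISON from the example tree copies CAT up into BISON's node and deletes the old CAT leaf, so the inorder traversal becomes AARDVARK CAT DOG FOX, which is the original traversal minus the removed key.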