CPS222 Lecture: Introduction to Trees and Forests
Last revised 1/18/2013
Objectives:
1. To define "tree" and "forest"
2. To introduce basic operations on trees (e.g. traversals)
3. To show how trees and forests can be represented as binary trees
Materials:
1. Excerpts from an "array of pointers to children" implementation (to project)
2. Excerpts from an "oldest child/next sibling" representation (to project)
I. Introduction
 
A. Our discussion of data structures has focussed on sequential structures
(arrays, stacks, queues, lists etc.). Now we want to move to a
consideration of branching structures, in which each element of the
structure can have more than one "successor".
B. The most general sort of branching structure is the graph, which we shall
consider later. First, though, we want to give considerable attention to
a particularly useful class of branching structures: trees.
C. Definition: A tree is a set of nodes, consisting of a special node called
the root, and 0 or more disjoint subsets, each of which is a tree.
1. ex:        A
            / | \
           B  C  E
              | / \
              D F  G
                   |
                   H
- the set of nodes A .. H is a tree. A is the root, and the
  subtrees are B, C .. D, and E .. H.
- in the subtree B, B is the root and there are no subtrees.
- in the subtree C .. D, C is the root, D is the subtree. D in turn
  is the root of a tree with no subtrees.
- in the subtree E .. H, E is the root, F and G are the roots of two
  subtrees, one of which (F) has no subtrees of its own, and the
  other of which (G) has the subtree H.
2. Note well the insistence that the subtrees be disjoint. For
example:
    A
   / \
  B   C
   \ / \
    D   E
is not a tree.
3. This definition differs slightly from the one in the book, though
it is basically saying the same thing:
a. A tree cannot be empty; it must at least have a root node.
b. The first of the two definitions in the book was in terms of
the parent relationship, rather than subtree. (But the book also
gave a second definition like the one above.)
D. Some terminology:
1. Tree terminology is borrowed from two portions of the natural world:
a. Wood type trees: we speak of the "root" of a tree and of its
"leaves". We have already defined the notion of "root" (but
notice that we draw it on the top, not on the bottom!) A leaf of
a tree is the root of a (sub)tree that has no subtrees of its own.
b. Genealogical trees (family trees):
(1) If A is the root of a tree and B is the root of one of its
subtrees, then we say that A is the "father" or "parent" of B,
and B is the "son" or "child" of A. In the above:
- A is the parent of B, C, and E. B, C, and E are children of A.
- C is the parent of D; D is the child of C.
- E is the parent of F and G; F and G are children of E.
(2) We can carry this further, speaking of A as the grandparent of
D etc. In general, we say that A is the "ancestor" of H and
H is the "descendant" of A if H is in one of the subtrees of A.
In the example above, B, C, D, E, F, G, and H are all
descendants of A.
(3) If two nodes are the children of the same parent, we say that
they are "brothers" or "siblings" or (sometimes) "twins". In
the above, B, C, and E are siblings, as are F and G.
(4) We could go farther and use terms like "uncle", but we
seldom do.
2. Additional terminology:
a. The leaves of a tree are sometimes also called "external" or
"terminal" nodes, and the nonleaf nodes can be called "internal"
or "nonterminal" nodes.
b. The "degree" of a node is the number of children it has. (Note that
we can then define a leaf as a node with degree 0.) The degree of a
tree is the maximum degree of any of its nodes. In the above
example, the degree of A is three, and this also happens to be the
degree of the whole tree, since the next highest degree is two. It
need not always be the case that the root has the highest degree.
c. A "path" from the root of a tree to a node is a sequence of nodes
N_1 .. N_h such that N_1 is the root, N_h is the node in question, and
N_i is the parent of N_{i+1} for all i, 1 <= i < h. The length of a path
is the number of EDGES traversed, i.e. one less than the number of
nodes on the path.
d. The "depth" or "level" of a node can be defined as follows:
- The depth (level) of a node is its distance from the root, i.e. the
  length of the path from the root to it.
or, equivalently:
- The depth (level) of the root of a tree is zero.
- The depth (level) of any other node is 1 + the depth (level) of
  its parent.
- In the above: A is at depth 0; B, C, and E at depth 1; D, F, and
  G at depth 2; and H at depth 3.
But note: Some authors define the depth of the root of a tree to be
1, not 0. The effect, in the above example, would be to make each
value one greater.
e. The "height" of a node is the length of the longest path from
that node to a leaf. This can be measured by counting nodes or edges,
which leads to two answers that differ by 1.
- If we count edges, then leaf nodes have height 0.
- If we count nodes, leaf nodes have a height of 1.
i. In either case, the height of any other node is 1 + the maximum
of the heights of its children. The height of a tree is defined
to be the height of the root.
ii. The book uses the "edges" form of definition, which leads to a
single node tree (just a root) having a height of 0. The "nodes"
form of definition is more intuitive, I think: for example, a
single node tree has a height of 1.
iii. I'll use the latter ("nodes") definition in subsequent lectures.
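The degree, depth, and height definitions above can be sketched directly as recursive functions. This is only an illustration under assumptions: the Node type (a label plus a vector of child subtrees), the function names, and the exampleTree() helper are hypothetical and are not the code projected in class. A vector of the enclosing type is assumed to be supported (C++17). Height is counted in nodes, so a leaf has height 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical node type, for illustration only: a label plus an
// ordered list of child subtrees (assumes C++17).
struct Node {
    char label;
    std::vector<Node> children;
};

// Degree of a tree: the maximum number of children of any of its nodes.
std::size_t degree(const Node& n) {
    std::size_t best = n.children.size();
    for (const Node& c : n.children)
        best = std::max(best, degree(c));
    return best;
}

// Height, counting nodes: a leaf has height 1; any other node has
// height 1 + the maximum of the heights of its children.
int height(const Node& n) {
    int tallest = 0;
    for (const Node& c : n.children)
        tallest = std::max(tallest, height(c));
    return 1 + tallest;
}

// Depth of the node labelled target, with the root at depth 0.
// Returns -1 if no node carries that label.
int depthOf(const Node& n, char target, int depth = 0) {
    if (n.label == target) return depth;
    for (const Node& c : n.children) {
        int d = depthOf(c, target, depth + 1);
        if (d >= 0) return d;
    }
    return -1;
}

// The example tree used in these notes: A(B, C(D), E(F, G(H))).
Node exampleTree() {
    Node g{'G', {Node{'H', {}}}};
    Node e{'E', {Node{'F', {}}, g}};
    Node c{'C', {Node{'D', {}}}};
    return Node{'A', {Node{'B', {}}, c, e}};
}
```

On the example tree this gives a degree of 3, a height of 4 (the path A, E, G, H has four nodes), and a depth of 3 for H.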
3. In drawing our tree examples, there has been an implicit left-to-right
ordering of the children of a given parent. In an actual tree, this
ordering may or may not be important. An "ordered" tree is one
in which there is such an ordering imposed on the children of the
same parent; in an "unordered" tree, no such relationship exists.
a. Note that any practical scheme for representing a tree imposes an
order.
b. In our further discussion, we will work with ordered trees unless
we explicitly say otherwise, though most of what we say about
ordered trees applies equally to unordered trees.
c. Sometimes, when we are thinking of a tree as an ordered tree,
we will say of two siblings that the first is "older" than
the second if the first is to the left of the second in our
drawing. We can then use the term "oldest child" to refer to
the leftmost child of a node.
Example: In the tree we have been using for examples, B is
the oldest child of A, C the next oldest, and E the youngest.
E. To further generalize, we can define the concept of a "forest" as a
set of 0 or more disjoint trees.
1. Example:
           B  C  E
              | / \
              D F  G
                   |
                   H
2. Observe: we can convert a forest to a tree by adding a single node
to serve as the root of a tree in which each of the original trees
is a subtree:
ex:           A
            / | \
           B  C  E
              | / \
              D F  G
                   |
                   H
3. Conversely, deleting the root from a tree leaves behind a forest
consisting of its subtrees. (Obviously, this is how we got our
forest from our original tree.)
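The two conversions just described (forest to tree by adding a root, tree to forest by deleting the root) are small enough to sketch. The Node and Forest types and the names addRoot and removeRoot are assumptions made for this illustration; a forest is modelled as an ordered, possibly empty, list of trees.

```cpp
#include <utility>
#include <vector>

// Hypothetical node type, for illustration only (assumes C++17).
struct Node {
    char label;
    std::vector<Node> children;
};

// A forest: an ordered list of 0 or more disjoint trees.
using Forest = std::vector<Node>;

// Forest -> tree: add a single new root whose subtrees are the
// original trees of the forest.
Node addRoot(char rootLabel, Forest forest) {
    return Node{rootLabel, std::move(forest)};
}

// Tree -> forest: delete the root, leaving its subtrees behind.
Forest removeRoot(Node tree) {
    return std::move(tree.children);
}
```

Note that an empty Forest is legal here, while a Node always has at least its own label, mirroring the rule that a forest may be empty but a tree may not.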
F. In writing about trees, we can adopt one of several systems of notation:
1. The graph-like drawings we have been using thus far.
2. Indentation:
ex: Our original tree:
A
  B
  C
    D
  E
    F
    G
      H
ex: Our forest:
B
C
  D
E
  F
  G
    H
3. Parentheses. ex: our tree
A(B, C(D), E(F, G(H)))
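Both textual notations fall out of a simple recursive walk over the tree. The sketch below is illustrative only: the Node type and the names toParens, toIndented, and exampleTree are assumptions, and two spaces per level is an arbitrary choice of indentation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type, for illustration only (assumes C++17).
struct Node {
    char label;
    std::vector<Node> children;
};

// Parenthesized notation: the label, then the children (if any) in
// parentheses, separated by commas.
std::string toParens(const Node& n) {
    std::string out(1, n.label);
    if (!n.children.empty()) {
        out += '(';
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i > 0) out += ", ";
            out += toParens(n.children[i]);
        }
        out += ')';
    }
    return out;
}

// Indentation notation: one node per line, indented two spaces per level.
std::string toIndented(const Node& n, int depth = 0) {
    std::string out(2 * depth, ' ');
    out += n.label;
    out += '\n';
    for (const Node& c : n.children)
        out += toIndented(c, depth + 1);
    return out;
}

// The example tree used in these notes.
Node exampleTree() {
    Node g{'G', {Node{'H', {}}}};
    Node e{'E', {Node{'F', {}}, g}};
    Node c{'C', {Node{'D', {}}}};
    return Node{'A', {Node{'B', {}}, c, e}};
}
```

For the example tree, toParens produces the string A(B, C(D), E(F, G(H))), and toIndented produces the eight-line indented listing of the original tree.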
G. Some uses of trees: Observe that a tree is a fundamentally hierarchical
structure. Thus, a tree is appropriate to model any reality that
exhibits hierarchy:
1. File system directories are often tree-structured.
2. Genealogical trees of all sorts: family relationships among
individuals, tribes, languages etc.
3. Classification systems:
a. Taxonomic classification of plants and animals.
b. Dewey decimal (or Library of Congress) classification of books.
4. Breakdown of a manufactured product into subassemblies, each of
which in turn consists of sub-subassemblies etc. down to the smallest
components.
5. Structure of a program: the main routine is the root, procedures it
contains are subtrees, each of which contains nested procedure
definitions etc.
H. Trees are also very useful for information storage and retrieval
situations such as symbol tables, even though hierarchy may not be
involved.
II. Operations on trees
   
A. As with any flexible data structure, there are many possible operations
we could define on trees. Certainly, we want a create operation, but
note that there is no such thing as an empty tree! So when we create
a tree, we create a tree having at least one node: the root.
B. The operation of insertion into a tree is certainly important, but
depends heavily on the principle by which the nodes are organized.
We defer discussion of insertion and deletion to discussion of various
special kinds of tree organized on various principles.
C. One class of operations that can be defined for all kinds of tree is
traversal. By "traversal", we mean the act of systematically
"visiting" all of the nodes to perform some operation on them:
1. Printing out the contents of all of the nodes, or performing some other
operation on all the nodes, involves a traversal.
2. Unless the tree is ordered somehow on the basis of some key,
searching for a node containing a given value would involve a
traversal (though in practice trees that are to be searched are
usually structured in such a way as to avoid this.)
D. One issue that arises in connection with traversal is the order of
traversal. Two orders are of particular importance:
1. Preorder traversal:
- Visit the root of the tree
- Traverse each subtree in turn in preorder
Example on the above: A B C D E F G H
2. Postorder traversal:
- Traverse each subtree in turn in postorder
- Visit the root
Example on the above: B D C F H G E A
E. Of lesser importance is level order traversal: visit all the nodes
on level zero, then all on level one etc.
Example on the above: A B C E D F G H
F. The above operations can be defined on a forest by mentally adding a
root which is ignored when it comes time to visit it.
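The three traversal orders can be sketched as follows, independent of how the tree will eventually be represented. The Node type and function names are assumptions for illustration, and "visiting" a node here just appends its label to a string. Level order uses a queue rather than recursion.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type, for illustration only (assumes C++17).
struct Node {
    char label;
    std::vector<Node> children;
};

// Preorder: visit the root, then traverse each subtree in turn.
std::string preorder(const Node& n) {
    std::string out(1, n.label);
    for (const Node& c : n.children) out += preorder(c);
    return out;
}

// Postorder: traverse each subtree in turn, then visit the root.
std::string postorder(const Node& n) {
    std::string out;
    for (const Node& c : n.children) out += postorder(c);
    out += n.label;
    return out;
}

// Level order: visit level zero, then level one, etc. A queue holds the
// nodes whose turn has not yet come.
std::string levelOrder(const Node& root) {
    std::string out;
    std::queue<const Node*> pending;
    pending.push(&root);
    while (!pending.empty()) {
        const Node* n = pending.front();
        pending.pop();
        out += n->label;
        for (const Node& c : n->children) pending.push(&c);
    }
    return out;
}

// The example tree used in these notes.
Node exampleTree() {
    Node g{'G', {Node{'H', {}}}};
    Node e{'E', {Node{'F', {}}, g}};
    Node c{'C', {Node{'D', {}}}};
    return Node{'A', {Node{'B', {}}, c, e}};
}
```

On the example tree these return ABCDEFGH, BDCFHGEA, and ABCEDFGH respectively, matching the orders listed above.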
III. Representing Trees and Forests
    
A. We have noted that a forest can be converted to a tree by adding a
root. Thus we focus on representing trees; to represent a forest,
simply include a "root" as a header.
B. One method is to use a linked representation in which each node contains
pointers to its children. This means that when we define the data type
for a node, the degree of the tree determines the number of pointer
fields needed. Pointer fields in a given node that are not needed can
be set to null.
PROJECT: Array of pointers to children example - class Node
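For readers without the projected excerpt, here is a minimal sketch of what an "array of pointers to children" node might look like. It is not the class shown in lecture: the constant MAX_DEGREE, the member names, and the addChild helper are all assumptions made for illustration.

```cpp
#include <cstddef>

// Assumed upper bound on the degree of the tree. Every node reserves
// this many pointer fields, whether it needs them or not.
const std::size_t MAX_DEGREE = 3;

class Node {
public:
    explicit Node(char value) : data(value) {
        for (std::size_t i = 0; i < MAX_DEGREE; ++i) child[i] = nullptr;
    }

    // A node owns its subtrees, so destroying it destroys them too.
    ~Node() {
        for (std::size_t i = 0; i < MAX_DEGREE; ++i) delete child[i];
    }

    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    char data;
    Node* child[MAX_DEGREE];  // unused fields are null
};

// Attach c as the next (youngest) child of parent. Returns false if
// the parent already has MAX_DEGREE children.
bool addChild(Node& parent, Node* c) {
    for (std::size_t i = 0; i < MAX_DEGREE; ++i) {
        if (parent.child[i] == nullptr) {
            parent.child[i] = c;
            return true;
        }
    }
    return false;
}
```

The fixed-size array is what ties the node type to the degree of the tree: a tree of larger degree needs a larger MAX_DEGREE, and every node pays for it.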
1. Now, for example, we could implement operations on this tree as follows:
a. preorder traversal:
PROJECT: preorder
b. postorder traversal could be written similarly. What changes would
be needed to turn the given preorder code into postorder?
ASK
- Change the name of the function!
- Do the visit AFTER the recursive calls
c. Reading a tree in from a text file. Assume that the nodes of a
tree have been written out, one node to a line, in preorder.
Assume each line contains the contents of the node and the number
of its children.
ex: The tree          A
                    / | \
                   B  C  D
                        / \
                       E   F
would be stored as:   A 3
                      B 0
                      C 0
                      D 2
                      E 0
                      F 0
PROJECT readTree code
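The operations in this list (preorder, postorder, and reading a tree from a file) might be sketched as follows for the array of pointers representation. This is an assumption-laden illustration rather than the projected code: the Node here is a stripped-down stand-in, MAX_DEGREE is an assumed bound, "visiting" appends to a string, and error handling is minimal (malformed input simply yields a null or truncated tree).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

const std::size_t MAX_DEGREE = 3;  // assumed bound on the degree

// Simplified stand-in for the node class: data plus child pointers.
struct Node {
    char data;
    Node* child[MAX_DEGREE];
};

// Read a tree written in preorder, one node per line, each line holding
// the node's contents and its number of children.
Node* readTree(std::istream& in) {
    char data;
    std::size_t count;
    if (!(in >> data >> count) || count > MAX_DEGREE) return nullptr;
    Node* node = new Node{data, {}};  // all child pointers start null
    for (std::size_t i = 0; i < count; ++i)
        node->child[i] = readTree(in);
    return node;
}

// Preorder: visit first, then recurse on each child.
void preorder(const Node* node, std::string& out) {
    if (node == nullptr) return;
    out += node->data;
    for (std::size_t i = 0; i < MAX_DEGREE; ++i)
        preorder(node->child[i], out);
}

// Postorder: same code, with the visit AFTER the recursive calls.
void postorder(const Node* node, std::string& out) {
    if (node == nullptr) return;
    for (std::size_t i = 0; i < MAX_DEGREE; ++i)
        postorder(node->child[i], out);
    out += node->data;
}

// Free a tree built by readTree.
void destroy(Node* node) {
    if (node == nullptr) return;
    for (std::size_t i = 0; i < MAX_DEGREE; ++i) destroy(node->child[i]);
    delete node;
}
```

Because the file is in preorder, readTree has the same shape as preorder itself: handle the current node, then recurse once per child. Reading the six-line example file above from a std::istringstream and traversing the result gives ABCDEF in preorder and BCEFDA in postorder.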
2. However, this representation runs into a severe efficiency problem if
the degree of the tree is large.
a. Thm: For a tree of degree d with n nodes, represented using the
array of pointers to children representation, we will always have
n*(d-1) + 1 NULL pointers stored in the nodes.
Pf: Each of the n nodes has room for d pointers, or n*d pointers
in all. Each node (except the root) is pointed to by exactly
one of these. So n-1 pointers are used to point to other
nodes, leaving n*d - (n-1) = n*(d-1) + 1 NULL.
b. For example, for a tree of degree 10 with 100 nodes, we waste 901
pointers.
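The theorem can be checked empirically by counting the null fields in an actual tree and comparing with the formula. In this sketch the constant D, the Node stand-in, the Counts struct, and the function names are all hypothetical.

```cpp
#include <cstddef>

const std::size_t D = 3;  // assumed degree of the tree

// Simplified stand-in for the node class: data plus D child pointers.
struct Node {
    char data;
    Node* child[D];
};

struct Counts {
    std::size_t nodes = 0;
    std::size_t nulls = 0;
};

// Walk the tree, counting nodes and null pointer fields.
void count(const Node* node, Counts& totals) {
    ++totals.nodes;
    for (std::size_t i = 0; i < D; ++i) {
        if (node->child[i] == nullptr)
            ++totals.nulls;
        else
            count(node->child[i], totals);
    }
}

// The theorem's prediction: n*d - (n-1) = n*(d-1) + 1. Assumes d >= 1.
std::size_t predictedNulls(std::size_t n, std::size_t d) {
    return n * (d - 1) + 1;
}
```

For the six-node tree A(B, C, D(E, F)) with d = 3, the walk finds 13 null fields (none in A, one in D, three in each of the four leaves), which agrees with 6*(3-1) + 1 = 13. The formula likewise gives 901 for n = 100 and d = 10.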
C. An alternate representation can be arrived at by using a linked list
representation for the children of a node.
1. Each node holds two pointers. One points to its oldest child.
The other points to its next sibling (next younger node with the
same parent.)
2. Such a tree is actually a binary tree. A binary tree is either
empty, or it consists of a root and exactly two disjoint sets of
nodes, designated left child and right child, each of which is a
binary tree. We will say more about binary trees in the next lecture;
for now note that a binary tree is a different thing from a tree!
3. The transformation from a general tree into an equivalent binary
tree (oldest child/next sibling representation) can be done
recursively, as follows:
a. To transform a general tree rooted at a node A to its equivalent
binary tree:
- create a binary tree whose root is A.
- transform the leftmost subtree of A in the general tree, and make
  this the left subtree of A in the binary tree.
- transform the next sibling of A in the general tree, and make this
  the right subtree of A in the binary tree.
b. ex: our original tree:
        A
       /
      B
       \
        C
       / \
      D   E
         /
        F
         \
          G
         /
        H
c. Note that you can visualize the shape of the original tree by
mentally rotating the binary equivalent 45 degrees counterclockwise.
d. The same method can be applied to a forest: the right subtree of
the binary equivalent of the root of one of the trees is the
transformed version of the next tree in the forest. We can see
what this would look like for our example forest by just deleting
the A node from the above tree.
PROJECT: Code for Oldest child/next sibling representation - NODE class
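The recursive transformation can be sketched in code. The types and names below (Node for the general tree, BinNode for the oldest child/next sibling node, transform, treeToBinary, forestToBinary) are assumptions for illustration and need not match the projected NODE class. The helper works on a list of siblings so that the same code serves both trees and forests.

```cpp
#include <cstddef>
#include <vector>

// General tree, hypothetical type for illustration (assumes C++17).
struct Node {
    char label;
    std::vector<Node> children;
};

// Oldest child/next sibling node: exactly two pointers per node.
struct BinNode {
    char data;
    BinNode* oldestChild;  // the "left" pointer
    BinNode* nextSibling;  // the "right" pointer
};

// Transform siblings[i] together with its younger siblings: the left
// subtree comes from its own children, the right from the next sibling.
BinNode* transform(const std::vector<Node>& siblings, std::size_t i) {
    if (i >= siblings.size()) return nullptr;
    return new BinNode{siblings[i].label,
                       transform(siblings[i].children, 0),
                       transform(siblings, i + 1)};
}

// A tree's root has no siblings, so its right pointer is null.
BinNode* treeToBinary(const Node& root) {
    return new BinNode{root.label, transform(root.children, 0), nullptr};
}

// For a forest, the trees themselves are treated as siblings.
BinNode* forestToBinary(const std::vector<Node>& forest) {
    return transform(forest, 0);
}

// Count null pointer fields in the binary representation.
std::size_t countNulls(const BinNode* n) {
    if (n == nullptr) return 1;
    return countNulls(n->oldestChild) + countNulls(n->nextSibling);
}

void destroy(BinNode* n) {
    if (n == nullptr) return;
    destroy(n->oldestChild);
    destroy(n->nextSibling);
    delete n;
}

// The example tree used in these notes.
Node exampleTree() {
    Node g{'G', {Node{'H', {}}}};
    Node e{'E', {Node{'F', {}}, g}};
    Node c{'C', {Node{'D', {}}}};
    return Node{'A', {Node{'B', {}}, c, e}};
}
```

Applied to the example tree, treeToBinary yields the binary tree drawn above; for its eight nodes, countNulls reports 9 null pointers.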
4. Note that this representation dramatically decreases the number
of NULL pointers. By the same reasoning we used previously, an n-node
tree has 2n pointer fields, of which n-1 point to other nodes, so it
needs just 2n - (n-1) = n + 1 NULL pointers.
5. Performing traversals on a general tree represented by an equivalent
binary tree.
a. Preorder traversal of the general tree is accomplished by preorder
traversal of the transformed tree.
ex: preorder traversal of the above binary tree: A B C D E F G H
PROJECT: Code for preorder
b. What about postorder traversal? How would this be done?
ASK
i. Postorder traversal of the general tree is accomplished by
INORDER traversal of the transformed tree.
Inorder traversal:
- Traverse the left subtree in inorder
- Visit the root
- Traverse the right subtree in inorder
ii. ex: the above: B D C F H G E A
iii. This works because:
- The left subtree of any node in the transformed tree contains all
  the nodes that were descendants of that node in the original
  tree. These should be visited first.
- The right subtree of any node in the transformed tree contains
  all the nodes that were right siblings (or descendants thereof)
  of the node in the original tree. These should be visited after
  the node.
iv. What would need to be done to change the example code for preorder
just projected to do this?
ASK
- Change the name
- Do the visit between subtrees
c. Postorder traversal of the transformed tree has no relationship to
any meaningful operation on the original tree.
d. An equivalent to our ReadTree procedure defined above can also be done
PROJECT: Code for readTree
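The traversals and the readTree equivalent for the oldest child/next sibling representation might look like the sketch below. Again this is illustrative only: BinNode and the function names are assumptions, "visiting" appends to a string, and the input format is the same "contents, number of children" preorder format used earlier.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Oldest child/next sibling node (hypothetical names).
struct BinNode {
    char data;
    BinNode* oldestChild;
    BinNode* nextSibling;
};

// Read a tree in preorder, one "contents count" pair per node. Each
// child read becomes the next sibling of the child read before it.
BinNode* readTree(std::istream& in) {
    char data;
    std::size_t count;
    if (!(in >> data >> count)) return nullptr;
    BinNode* node = new BinNode{data, nullptr, nullptr};
    BinNode** link = &node->oldestChild;  // where the next child attaches
    for (std::size_t i = 0; i < count; ++i) {
        *link = readTree(in);
        if (*link == nullptr) break;      // input ended early
        link = &(*link)->nextSibling;
    }
    return node;
}

// Preorder of the binary tree = preorder of the general tree.
void preorder(const BinNode* n, std::string& out) {
    if (n == nullptr) return;
    out += n->data;
    preorder(n->oldestChild, out);
    preorder(n->nextSibling, out);
}

// Inorder of the binary tree = postorder of the general tree:
// the visit moves between the two recursive calls.
void inorder(const BinNode* n, std::string& out) {
    if (n == nullptr) return;
    inorder(n->oldestChild, out);
    out += n->data;
    inorder(n->nextSibling, out);
}

void destroy(BinNode* n) {
    if (n == nullptr) return;
    destroy(n->oldestChild);
    destroy(n->nextSibling);
    delete n;
}
```

Unlike the array version, this readTree places no limit on the number of children per node. Reading the original example tree (A 3, B 0, C 1, D 0, E 2, F 0, G 1, H 0) and traversing it gives ABCDEFGH in preorder and BDCFHGEA in inorder, the latter being the postorder of the general tree.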