CPS222 Lecture: Algorithm Design Strategies Last modified 4/29/2015
Materials
1. Projectable of brute force solution to max sum problem
2. Huffman algorithm Powerpoint
3. Dynamic Programming Fibonacci program example
4. Projectable of partially filled in LCS table
5. Projectable of figure 12.2 p. 563 in Goodrich, Tamassia, Mount
6. Projectable of same figure, but showing derivation of GTTTAA
7. Projectable of example use of optimal BST algorithm on a tree of 4 keys
8. obst program to demo and project
I. Introduction
 
A. At this point in the course, we are going to shift our focus somewhat.
1. Up until now, our focus has been on learning "standard" algorithms
and their associated data structures.
2. When confronted with a problem to solve, you should always ask "can this problem
be viewed as an instance of a problem for which there exists a known algorithm?"
If the answer is "yes", then you don't have to "reinvent the wheel".
Example: We saw earlier that problems as diverse as scheduling tasks with
prerequisites, analyzing electrical circuits, and designing robust communication
or transportation networks can be solved by known graph algorithms.
3. Sometimes, though, one is confronted with a problem which does not correspond
to any previously-solved problem. In this case, it may be necessary to develop
an algorithm to solve the problem from scratch.
4. Or, the problem may be a familiar problem for which no good algorithm
is known - e.g. it may be an instance of an NP-complete problem. In
this case, we may need to develop an algorithm that produces an
acceptable, though perhaps not optimal, solution.
Example: If we have a problem to solve that is equivalent to the
traveling salesman problem, we will not be able to find a practical
algorithm that gives us a guaranteed optimal solution; but we may be
able to develop an algorithm that gives us a solution that is close
enough to optimal for the cases we are interested in.
B. We now consider a number of strategies that can be used to tackle a
problem which does not already have a known algorithmic solution.
1. These are not solutions to a problem, but strategies to explore
when trying to find a solution.
2. Many of the "standard" algorithms that we have learned were first
discovered by someone who applied one of the strategies to the
problem in the first place!
C. For each strategy, we will consider one or more examples of algorithms
that utilize that strategy. We will see that algorithms we have already learned
exemplify the strategy we are considering, and we will also consider some new
algorithms. In all cases, though, the goal here is to understand the design
strategy behind the algorithm, not just the algorithm itself.
II. Brute Force
  
A. Given the sheer speed of a computer, it is tempting to try to solve a
problem by brute force - i.e. trying all the possibilities.
1. We saw an example of this when we first looked at algorithm analysis.
Remember the maximum sum problem? Our first attempt at a solution was
the brute force solution.
int naiveMaxSum(int a[], int n)
/* Naive solution to the maximum subvector sum problem */
{
    int maxSum = 0;
    for (int i = 0; i < n; i ++)
        for (int j = 0; j < n; j ++)
        {
            int thisSum = 0;
            for (int k = i; k <= j; k ++)
                thisSum += a[k];
            if (thisSum > maxSum)
                maxSum = thisSum;
        }
    return maxSum;
}
PROJECT
2. Complexity of this solution?
ASK
theta(N^3)
3. As you recall, we developed a series of better solutions, culminating
in a theta(N) solution - which we argued is inherently the lower limit,
because any solution must look at each element of the vector at least
once.
B. Of course, often we will be able to find a better solution to the problem
than sheer brute force - as was the case with the max sum problem.
C. However, this won't always be the case. There will be some problems for
which brute force is the only option.
Examples?
ASK
1. Searching an unordered list, or one that is ordered on some basis
other than the order of the search key.
The only option is the brute force one of looking at every item.
2. Problems like the traveling salesman - if we must have the absolute
best solution.
III. Greedy Algorithms
  
A. Many problems have the general form "for a given set of data, what is the
best way to ____?".
1. Examples we have considered thus far:
ASK
a. Shortest path problem in a graph
b. Minimal cost spanning tree of a graph
2. For such a problem, what we are seeking is a GLOBAL OPTIMUM -
i.e. the best overall way to solve the problem. E.g. for a minimum
cost spanning tree, we want to find the spanning tree having the lowest
overall cost, though it may include some "expensive" individual edges.
3. One way to solve such a problem would be exhaustive search - create
all the possible solutions, and then choose the cheapest one.
Unfortunately, such an approach generally has exponential cost.
B. The greedy strategy goes like this: build up an overall solution one step
at a time, by making a series of LOCALLY OPTIMAL choices.
1. Example: Dijkstra's shortest path algorithm builds up the list of
shortest paths one node at a time by, at each step, choosing the
not-yet-known node that has the shortest known path to the starting
vertex.
2. Example: Kruskal's minimum cost spanning tree algorithm builds up
the tree one edge at a time by, at each step, adding to the tree
the lowest cost edge which does not introduce a cycle.
C. A good - and historically important - example of a greedy algorithm is
the Huffman algorithm. We will now look at it, both as an algorithm that
is interesting in its own right, and as an example of the greedy strategy.
1. One area of considerable interest in many applications is DATA
COMPRESSION - reducing the number of bits required to store a given
body of data. We consider one approach here, based on weight-balanced
binary trees, and utilizing a greedy algorithm that produces an
optimal solution.
2. Suppose you were given the task of storing messages comprised of the
seven letters A-G plus space (just to keep things simple). In the absence
of any information about their relative frequency of use, the best you
could do would be to use a three bit code - e.g.
000 = space
001 .. 111 = A .. G
3. However, suppose you were given the following frequency of usage data.
Out of every 100 characters, it is expected that:
10 are A's        (Note: these data are contrived!)
10 are B's
5 are C's
5 are D's
30 are E's
5 are F's
5 are G's
30 are spaces
a. Using the three bit code we just considered, a typical message of
length 100 would use 300 bits.
b. Suppose, however, we used the following variable-length code instead:
A = 000 NOTE: No shorter code can be a prefix of
B = 001 any longer code. Thus, we cannot
C = 0100 use codes like 00 or 01  if we saw
D = 0101 these bits, we wouldn't know if they
E = 10 were a character in their own right or
F = 0110 part of the code for A/B or C/D.
G = 0111
space = 11
A message of length 100 with typical distribution would now need:
(10 * 3) + (10 * 3) + (5 * 4) + (5 * 4) + (30 * 2) + (5 * 4) +
(5 * 4) + (30 * 2) = 260 bits - a savings of about 13%
4. A variable-length code can be represented by a decode tree, with
external nodes representing characters and internal nodes representing
a decision point at a single bit of the message - e.g.
( first bit)
/ 0 \ 1
(2nd bit) (2nd bit)
/ 0 \ 1 / 0 \ 1
(3rd bit) (3rd bit) [E] [space]
/ 0 \ 1 / 0 \ 1
[A] [B] (4th bit) (4th bit)
/ 0 \ 1 / 0 \ 1
[C] [D] [F] [G]
The optimum such tree is the one having the smallest weighted external path
length - i.e. the sum, over all leaves, of each leaf's level times its weight.
5. An algorithm for computing such a weight-balanced code tree is the
Huffman algorithm, discussed in the book.
a. Basic method: we work with a list of partial trees.
i. Initially, the list contains one partial tree for each character.
ii. At each iteration, we choose the two partial trees of least weight and
construct a new tree consisting of an internal node plus these two as
its children. We put this new tree back on the list, with weight equal
to the sum of its children's weights.
iii. Since each step reduces the length of the list by 1 (two
partial trees removed and one put back on), after n-1
iterations we have a list consisting of a single partial tree,
which is our decode tree.
b. Example: For the above data. (Below, (x y) denotes an internal node
whose children are the partial trees x and y; each partial tree is shown
with its weight.)

Initial list: A:.10  B:.10  C:.05  D:.05  E:.30  F:.05  G:.05  space:.30

Step 1 - remove C, D and add a new node:
(C D):.10  A:.10  B:.10  E:.30  F:.05  G:.05  space:.30

Step 2 - remove F, G and add a new node:
(F G):.10  (C D):.10  A:.10  B:.10  E:.30  space:.30

Step 3 - remove A, B and add a new node:
(A B):.20  (F G):.10  (C D):.10  E:.30  space:.30

Step 4 - remove (C D), (F G) and add a new node:
((C D) (F G)):.20  (A B):.20  E:.30  space:.30

Step 5 - remove (A B), ((C D) (F G)) and add a new node:
((A B) ((C D) (F G))):.40  E:.30  space:.30

Step 6 - remove E, space and add a new node:
(E space):.60  ((A B) ((C D) (F G))):.40

Step 7 - construct the final tree:
(((A B) ((C D) (F G))) (E space)):1.00

Note that this final tree puts E and space at depth 2, A and B at
depth 3, and C, D, F, and G at depth 4 - exactly the code lengths
of the variable-length code given earlier.
c. Analysis:
i. Constructing the initial list is theta(n).
ii. Transforming the list to a tree involves n-1 (= theta(n)) iterations. On each
iteration, we scan the entire list to find the two partial trees of
least weight, which is theta(n) - so this process, using the simplest mechanism
for storing the list of partial trees, is theta(n^2).
iii. Printing the tree is theta(n).
iv. Overall is therefore theta(n^2). However, we could reduce time to
theta(n log n) by using a more sophisticated data structure for
the "list" of partial trees  e.g. a heap based on weight.
(But given the small size of a typical alphabet, the theta(n^2)
algorithm may actually be faster.)
6. We have applied this technique to individual characters in an alphabet.
It could also be profitably applied to larger units  e.g. we might
choose to have a single code for frequently occurring words (such as
"the") or sequences of letters within words (such as "th" or "ing").
7. The Huffman algorithm exemplifies the greedy algorithm strategy,
because at each step we choose the two lowest weight subtrees to
combine into a new subtree, thus increasing the code length for
each of the characters in the subtrees by one. We keep our cost
down by increasing the code length of the lowest frequency subtrees.
D. A significant limitation of the greedy strategy is that, for some problems, a
greedy algorithm fails to deliver a globally optimal solution.
1. For the examples we have looked at thus far (shortest path,
minimum cost spanning tree, shortest job first scheduling, and
the Huffman algorithm), the greedy algorithm actually produces a
result that can be shown to be globally optimal - i.e. it finds the
best possible solution.
2. For other problems, however, finding the globally optimal solution
may require a step that is not locally optimal. A simple example of
this is finding one's way through a maze.
a. A greedy algorithm for finding one's way through a maze is
as follows: never go back to a square you've already visited
unless you have no other choice; where two or more non-backtracking
moves are possible, choose the one that moves you closer to
the goal.
b. An example where this greedy algorithm finds the best path:
(S = start, G = goal)
[maze diagram: S and G separated by a small block of walls;
every step of the best path is also a move closer to G, so
the greedy rule finds it]
c. An example where this greedy algorithm fails to find the best
path, because a move away from the goal (not locally optimal)
is needed to find the best (globally optimal) path.
[maze diagram: a wall partially encloses S, so the best path
to G must begin with moves away from G]
E. As it turns out, it is frequently the case that a problem for which a
greedy algorithm fails to find the best solution may be one for which
finding the best solution inherently requires exponential effort. In
such cases, a greedy algorithm may still be a useful approach to
finding a solution that is generally close enough - given that an
algorithm for finding the optimal solution may not be practical (e.g.
the problem may be NP-complete) or an algorithmic solution may not exist at all.
1. A good example of such a problem is the bin packing problem.
a. The problem originates in the way the post office handles
packages:
i. The post office uses large cloth bins which are filled with
packages and then loaded on a plane or a truck. (Perhaps
you've seen one at a post office.)
ii. The problem is this: given a supply of bins of some fixed
capacity, and packages of varying sizes, find a way to put the
packages in the bins in such a way as to use the fewest
possible bins.
iii. To simplify our discussion, we will simplify the problem in two
ways:
- We will assume that the size of each package can be
represented by a single number (i.e. we will not consider
issues of shape - only overall volume).
- We will normalize the sizes to the capacity of the bin, so
that a bin will be considered to have a capacity of 1, and
the size of each package will be represented as some fraction
of the bin capacity (e.g. 0.3). We will assume that the bin
can hold any number of packages for which (sum of sizes) <= 1.
iv. Although we couch the problem in terms of packing bins with
packages, similar problems arise in other areas - e.g. allocating
memory using operator new (which satisfies requests by carving
off smaller pieces from large blocks allocated by the operating
system), or allocating space for files on disk, where holes are
created by the deletion of other files.
b. The problem actually comes in two versions: the online version and
the offline version.
i. In the online version, a decision about where to place each
package must be made before the next package is seen. This
would correspond to a situation like the following:
[diagram: a clerk with a supply of bins stands behind a wall
with a small window in it; customers hand packages to the
clerk one at a time through the window]
The clerk must place each package in a bin as it is handed
through the window, before getting to see the next package.
ii. In the offline version, it is possible to look at the entire list of
packages before making a decision about where to place each one.
c. It is easy to show that there cannot be an algorithm that always
finds the optimal packing for the online version of the problem.
Suppose such an algorithm exists, and is asked to pack a total of
four packages, using the minimum possible number of bins.
Suppose the first two packages have sizes 0.45 and 0.45. Into
which bin should the algorithm place the second package?
It turns out that the answer depends on the size of the next two
packages, which the online version is not allowed to know until a
decision has been made about the second package.
i. If the next two packages are size 0.55 and 0.55, then the
optimal choice would be to place the second package in an
empty bin. This would yield a final packing using just two bins
Bin 1: First package (0.45) + Third package (0.55)
Bin 2: Second package (0.45) + Fourth package (0.55)
However if the second package is placed in the same bin as
the first, the final packing would require three bins:
Bin 1: First package (0.45) + Second package (0.45)
Bin 2: Third package (0.55)
Bin 3: Fourth package (0.55)
ii. If the next two packages are size 0.60 and 0.60, then the
optimal choice would be to place the second package in the
same bin as the first. This would yield a final packing using
three bins:
Bin 1: First package (0.45) + Second package (0.45)
Bin 2: Third package (0.60)
Bin 3: Fourth package (0.60)
However, if the second package is placed in an empty bin, the
final packing would require four bins:
Bin 1: First package (0.45)
Bin 2: Second package (0.45)
Bin 3: Third package (0.60)
Bin 4: Fourth package (0.60)
Since either choice made by the algorithm for the second package
could turn out to be wrong in some case, there cannot be an
algorithm that always makes the right choice.
d. For the offline version of the bin packing problem, it is possible
to find an optimal packing. (Consider all possibilities and pick
the best, which takes time exponential in the number of packages.)
It turns out that offline bin-packing has been proved to be
NP-complete. Thus, if the commonly held view of the relationship
between P and NP is true, then ANY offline algorithm that always
discovers an optimal solution to the bin-packing problem must
take exponential time.
e. Since there is not a practical algorithmic solution to either form
of the bin-packing problem, it is worth considering whether a greedy
algorithm might yield a solution that is close enough to optimal.
2. We consider first the online version of the problem
a. There are three greedy strategies we might consider.
i. One greedy strategy, called NEXT FIT, goes like this: if the
package we are dealing with would fit in the same bin as the
previously packed package, then put it there - else start a new
bin.
(Note that, once we start packing a new bin, we never go back
and put any packages in previous bins. This might be
advantageous in some applications, because once a bin is
declared packed, it can be moved out the door and loaded on the
truck or whatever.)
ii. A second greedy strategy, called FIRST FIT, goes like this: as
we pack each package, look at each of the bins in turn, and place
it in the first bin we find where it fits. Start a new bin only
if we cannot fit the package in any of the others.
iii. A third greedy strategy, called BEST FIT, goes like this: as we
pack each package, look at each of the bins in turn, and place it
in the bin where it fits best - i.e. leaves the least unused
space. Start a new bin only if we cannot fit the package in any
of the others.
b. To see the difference between these strategies, suppose we are
trying to pack a package of size 0.2 under the following scenario
(where the last package packed was placed in bin 3)
Bin 1: Currently contains 0.7
Bin 2: Currently contains 0.8
Bin 3: Currently contains 0.3
Next fit would put the package in bin 3
First fit would put the package in bin 1
Best fit would put the package in bin 2
c. Which strategy is best?
i. Next fit will never yield an overall result that is better than
first fit or best fit. However, it is the simplest to implement,
and is the fastest running. (Each choice is theta(1), since
only the most recently used bin has to be examined, as opposed to
theta(n) for the other two.) Also, once next fit declares a bin
full, it can never be considered again, whereas with the other
two algorithms no bins can be "shipped" until all the packages
have been placed.
ii. It turns out that there are sets of data for which first fit
gives the optimal result, and the others don't; and there are
other sets of data where best fit gives the optimal result, and
the others don't.
Example: sequence of sizes 0.3 0.8 0.1 0.6 0.2
NF: Bin 1: 0.3
Bin 2: 0.8 0.1
Bin 3: 0.6 0.2
FF: Bin 1: 0.3 0.1 0.6
Bin 2: 0.8 0.2
BF: Bin 1: 0.3 0.6
Bin 2: 0.8 0.1
Bin 3: 0.2
Example: sequence of sizes 0.3 0.8 0.2 0.7
NF: Bin 1: 0.3
Bin 2: 0.8 0.2
Bin 3: 0.7
FF: Bin 1: 0.3 0.2
Bin 2: 0.8
Bin 3: 0.7
BF: Bin 1: 0.3 0.7
Bin 2: 0.8 0.2
d. It is possible to analyze the behavior of each of these strategies, and
to show that:
i. Next fit is guaranteed to find a result that requires no
more than twice the optimal number of bins (and there is some
data that will force it to use very close to this number.)
ii. First fit is guaranteed to find a result that requires no
more than 17/10 times the optimal number of bins (and again there
is some data that will force it to use very close to this
number.)
iii. Best fit is also guaranteed to find a result that requires no
more than 17/10 times the optimal number of bins (and again there
is some data that will force it to use very close to this
number.)
3. For the offline version of the problem, a greedy algorithm is still
of interest, even though it cannot guarantee optimal results, since
the problem is NP-complete.
a. The offline versions of the greedy algorithms are derived from the
online versions based on the observation that we will generally
get better results by packing the bigger items first, and then
fitting the smaller items into the remaining spaces.
b. An offline version of the first fit algorithm is called FIRST FIT
DECREASING. It considers packages in decreasing order of size,
beginning with the largest. Each is placed using first fit.
Example: earlier we showed that the sequence 0.3 0.8 0.2 0.7
requires three bins if packed using an online first fit
algorithm. If we use first fit decreasing offline, we
consider the packages in the order 0.8, 0.7, 0.3, 0.2, and
pack them as follows:
Bin 1: 0.8 0.2
Bin 2: 0.7 0.3
It is possible to prove that if M is the optimal number of bins needed to
pack some list of items, then first fit decreasing never uses more than
11/9 M + 4 bins to pack the same items.
c. It is also possible to derive offline versions of next fit and
best fit, which we won't discuss.
IV. Divide-and-Conquer Algorithms
  
A. An algorithmdesign strategy behind several of the algorithms we have seen
is divide and conquer.
1. The basic strategy is this:
- partition the initial problem into two or more smaller subproblems
- solve each subproblem (recursively)
- stitch the solutions to the subproblems together to yield a solution to
the original problem
2. Examples we have seen?
ASK
a. One of the solutions to the maximal vector subsequence sum problem
we discussed when we introduced algorithm analysis
b. Fibonacci Numbers
c. Towers of Hanoi
d. Traversal of a binary tree
e. Quick Sort
f. Merge Sort
B. Divide and conquer is often a useful strategy for finding good algorithms.
Let's look at another example:
1. As you know, standard integer representations are limited by the
number of bits used to represent an integer (64 on modern machines).
What happens if we need to represent integers larger than this?
a. The typical solution is to use an array of int (32-bit integers),
treated as digits base 2^32.
Example: a 100 decimal digit integer a might be represented by an
array of 10 32-bit binary integers as
a_9*2^288 + a_8*2^256 + a_7*2^224 + a_6*2^192 + a_5*2^160 +
a_4*2^128 + a_3*2^96 + a_2*2^64 + a_1*2^32 + a_0*2^0
In general, we can measure the size of such a representation by
the size of the array - e.g. we would consider the size of the
above example to be 10.
b. Now suppose we had two large integers (a and b), each represented
using an array of n 32-bit integers. Let's consider the complexity of
various arithmetic operations.
i. Addition: We will require n additions - i.e.
sum_0 = a_0 + b_0
sum_1 = a_1 + b_1 + carry from sum_0, etc.
- so the operation is theta(n)
ii. Subtraction is similar, and is also theta(n).
iii. However, for multiplication, it looks like we will require theta(n^2)
multiplications, since
(a_9*2^288 + a_8*2^256 + a_7*2^224 + ...) * (b_9*2^288 + b_8*2^256 + b_7*2^224 + ...) =
a_9*b_9*2^576 + (a_9*b_8 + a_8*b_9)*2^544 + (a_9*b_7 + a_8*b_8 + a_7*b_9)*2^512 + ...
- so each of the n coefficients in a is multiplied by each of the n
coefficients in b.
2. We could consider a divide and conquer approach
a. divide the arrays representing each number in half (which we call
A , A and B , B below). Then the product becomes
1 0 1 0
16n 16n
(A 2 + A )(B 2 + B ) =
1 0 1 0
32n 16n 0
A B * 2 + (A B + A B) * 2 + (A B ) * 2
1 1 1 0 0 1 0 0
b. Then we can continue by calculating each of the A's and B's by
dividing the arrays in two until we get to arrays having a
single element, at which point ordinary multiplication works.
c. However, this hasn't reduced the total effort: each of the products
after the first division only requires n^2/4 multiplications, but
since there are four of them the overall computation is still theta(n^2).
d. At this point, though, we could take advantage of an observation
first made by Gauss in a different context. Observe that
A_1*B_0 + A_0*B_1 = (A_1 + A_0)(B_1 + B_0) - A_1*B_1 - A_0*B_0
Since we need to calculate A_1*B_1 and A_0*B_0 anyway, we can use this to
replace the original four products by three products plus some
additions and subtractions.
e. That means, at each stage in the divide and conquer, we only need
to create 3 subproblems (each requiring 1/4 the effort), rather than 4.
And that benefit compounds itself at each stage. (We will look
at the effect quantitatively in a bit.)
C. An algorithmic pattern that is very similar to divide and conquer is
decrease and conquer.
1. In this pattern, we partition a problem into some number of
subproblems, but then discard all but one of these subproblems and
solve the original problem by solving this one.
(Note that the term "divide and conquer" is usually reserved for
algorithms that must solve all of the subproblems; when all but one
are discarded, "decrease and conquer" is the more accurate name.)
2. It turns out that many search strategies are actually examples of this
pattern.
a. Example: binary search of an ordered array - we compare the search
target to the middle key of the array. Based on the outcome of
this comparison, we continue our search in either the first or
last half of the array, ignoring the other half.
b. Example: search in any sort of m-way search tree (binary, 2-3-4,
or B-Tree) - we compare the search target to the keys stored in
a node, and then continue our search in one of its children,
ignoring the others.
3. Moreover, maintenance of an m-way search tree is a form of decrease
and conquer.
a. For example, when we insert into a binary search tree, at each level
we use comparison of the key we are inserting with the key at the
current node to decide whether to insert into its left or right
subtree.
b. Deletion is similar.
4. Let's look at another example. Suppose we have an unordered list of n
numbers, and want to find the kth smallest member.
a. If we wanted the smallest (or the nth smallest - which would be
the largest), there is a straightforward theta(n) algorithm.
b. For arbitrary k, it would be possible to sort the list and then take
the element in position k of the result. However, this would require
theta(n log n) time because of the sort.
c. Can we do this for arbitrary k in just theta(n) time? It turns out
the answer is yes.
i. Choose a partitioning element (perhaps at random, or using
some arbitrary scheme such as the first element). Partition
the remaining elements into two sublists, one containing all the
elements less than or equal to the partitioning element and one
containing all the elements greater. While doing this, keep
track of the count of elements (c) in the list containing the
smaller elements.
ii. Now, if c >= k, it means that the element we want is also
the kth smallest element in the first sublist. If k > c + 1,
the element we want is the (k - c - 1)th smallest element in
the second sublist. (Of course, if k = c + 1, the partitioning
element itself is the one we want.)
iii. What is the complexity of this process? Since, on the average,
partitioning with a random pivot like this produces sublists
of roughly equal length, the first partitioning would require
looking at all n elements, but the second would look at only n/2,
the third only n/4 ...
iv. Therefore, the total number of elements examined is about
n + n/2 + n/4 + ... + 1 <= 2n. So we now have a theta(n)
algorithm!
D. Analysis of Divide and Conquer Algorithms
1. Recursive algorithms of the sort that arise in connection with divide
(or decrease) and conquer can be hard to analyze. In the case of these
algorithms, there is a general approach that works for many (but
not all) divide/decrease and conquer algorithms.
a. Let T(n) = the time it takes the algorithm to solve a problem of
size n. (Assume T(n) = O(1) for sufficiently small n.)
b. Assume that, for the recursive case, the algorithm solves a problem
of size n by partitioning it into a subproblems of size n/b, where a
and b are integer constants.
E.g. for the average case of Quick Sort, a is 2 and b is 2 - we
partition a problem of size n into two subproblems of size n/2.
The same is true for Merge Sort.
c. Suppose, further, that the time for partitioning a problem of size n
into subproblems is f(n), and the time for stitching the solutions
together after the subproblems have been solved is g(n).
E.g. for Quick Sort, f(n) is O(n) and g(n) is O(1). For
Merge Sort, f(n) is O(1) and g(n) is O(n).
d. Then the time to solve a problem of size n is given by the recurrence
T(n) = time to partition + time to solve subproblems + time to stitch
= f(n) + aT(n/b) + g(n).
= aT(n/b) + (f(n) + g(n))
e. There is a general rule for solving recurrences of this form (which we
state here without proof).
If a recurrence is of the form
T(n) = aT(ceiling(n/b)) + theta(n^k)
where a, b, and k are constants, with a > 0, b > 1, and k >= 0, then
T(N) = theta(N^(log_b a))   if a > b^k
T(N) = theta(N^k log N)     if a = b^k
T(N) = theta(N^k)           if a < b^k
f. This formula is known as the "master theorem"
2. Examples of applying this:
a. Traversal of a binary tree:
 We visit the root (which we'll assume is O(1)), and traverse
each of the subtrees in some order
- On the average, each subtree has about N/2 nodes
The recurrence is T(N) = O(1) + 2 T(N/2), so a = 2, b = 2, k = 0
First case applies (a = 2 > b^k = 1): T(N) = theta(N^(log_2 2)) = theta(N) -
which is, of course, what we would expect, since we visit each node exactly once
b. Multiplication of big integers as discussed above - here's a case where the
formula really helps.
- At each step, we split each number into two halves of length N/2 and perform
three multiplications. Splitting takes O(1) time, but stitching the
results together requires O(N) additions to handle the carries, so
the recurrence is T(N) = 3 T(N/2) + O(N), giving a = 3, b = 2, k = 1
First case applies (a = 3 > b^k = 2): T(N) = theta(N^(log_2 3)) = theta(N^1.58)
- a significant improvement over the theta(n^2) algorithm we considered at first.
c. Merge Sort:
 We split into two sublists of length N/2, which takes O(1) time,
sort them, then merge them together (which takes O(N) time)
Recurrence is T(N) = 2 T(N/2) + O(N), so a = 2, b = 2, k = 1
Second case applies: T(N) = theta(N log N)
(Quick sort is similar, except the split is O(N) and the stitch is
O(1), but the recurrence equation and hence the solution is the same.)
d. Binary search
At each step, we create two subproblems, but only need to solve one.
Since both splitting and stitching are O(1), we get the recurrence:
T(N) = T(N/2)+O(1), so a = 1, b = 2, and k = 0
Second case applies: T(n) = theta(log N)
e. k-selection.
At each step, we create two subproblems in O(N) time, but only need
to solve one, so the recurrence is T(N) = T(N/2) + O(N), giving a = 1,
b = 2, and k = 1.
Third case applies (a = 1 < b^k = 2): T(n) = theta(N)
3. Note that the master theorem does not apply to all divide and conquer
algorithms, because it requires a, b, and k to be constants.
For example, it does not apply to the recursive computation of the Fibonacci
numbers using the definition Fib(n) = Fib(n-1) + Fib(n-2) [with base cases
n = 1 and n = 2].
a. The recurrence is
T(n) = T(n-1) + T(n-2)
(Note that, by inspection, T(n) is O(Fib(n)))
b. Here, if we wished to attempt to apply the master theorem, we could
argue that a = 2 and k = 0 (the partition/stitch time is constant).
However, b is n / (n - 1), which, while always greater than 1, becomes
increasingly close to 1 as n increases, so the master theorem does not
apply.
c. In fact, the recursive divide and conquer algorithm to calculate Fibonacci
numbers is impractical for n of any significant size, so the analysis is
not useful in any case. Fortunately, there is a linear time algorithm,
as we shall see when we talk about dynamic programming!
V. Dynamic Programming
  
A. In the last section, we were reminded that sometimes a recursive divide
and conquer algorithm can have very poor performance.
1. A good example of this is Fibonacci numbers. To see why, consider the
tree generated by the computation of Fib(6):
Fib(6)
/ \
Fib(5) Fib(4)
/ \ / \
Fib(4) Fib(3) Fib(3) Fib(2)
/ \ / \ / \
Fib(3) Fib(2) Fib(2) Fib(1) Fib(2) Fib(1)
/ \
Fib(2) Fib(1)
Observe that we do certain computations many times  e.g. we compute
Fib(5) once
Fib(4) twice
Fib(3) thrice
Fib(2) 5 times
Fib(1) 3 times
2. A much more efficient approach is to save previously computed results
and reuse them when needed, instead of repeating the computation.
This would yield the following tree for Fib(6), which would require
linear time. (Cases marked with an asterisk reuse previously computed
results instead of redoing them - note that each of Fib(1) .. Fib(6)
is computed just once.)
Fib(6)
/ \
Fib(5) Fib(4) *
/ \
Fib(4) Fib(3) *
/ \
Fib(3) Fib(2) *
/ \
Fib(2) Fib(1)
3. The following linear time algorithm incorporates this insight:
int fibAux(int n, int saved [])
{
    if (saved[n-1] == -1)
        saved[n-1] = fibAux(n-1, saved) + fibAux(n-2, saved);
    return saved[n-1];
}
int fib(int n)
{
// Use an array to save previously computed values. An
// initial value of -1 indicates we have not yet computed
// the value and so need to do so.
int saved[n];
for (int i = 0; i < n; i ++)
    saved[i] = -1;
saved[0] = saved[1] = 1; // By definition
return fibAux(n, saved);
}
4. A simpler algorithm that builds up the solution from small values
is the following.
int fib(int n)
{
if (n <= 2)
return 1;
int last = 1;
int nextToLast = 1;
int current = 1;
for (int i = 3; i <= n; i ++)
{
current = nextToLast + last;
nextToLast = last;
last = current;
}
return current;
}
B. The strategy we just used to improve the calculation of the Fibonacci
numbers is an illustration of a general algorithm design technique
called Dynamic Programming.
In Dynamic Programming, we use a table of previously calculated results
to assist us in deriving new results, rather than calculating them
from scratch.
C. An example developed in the book: Longest Common Subsequence (LCS).
1. Recall the following from the book discussion:
a. A subsequence of a sequence is a sequence of elements that occur in the same
order somewhere in the sequence - not necessarily without gaps between
elements.
Example: For the string ABC, the subsequences are
the empty sequence, A, B, C, AB, AC, BC, and ABC
b. A common subsequence of two sequences is a subsequence of both sequences
Example: For the strings ABC and DADCD, the common subsequences are
the empty sequence, A, C, and AC - since these are also subsequences of
DADCD, while the subsequences of ABC that contain B are not subsequences of DADCD
c. The longest common subsequence (LCS) is the common subsequence of greatest length
i. In the example we have been using, the LCS is AC
ii. It may be for some pairs of strings that the LCS is of length 0 - e.g.
the LCS of ABC and DEF is the empty sequence
iii. It may be that the LCS of two strings is not unique - i.e. there may be
two or more different subsequences that both have the same maximal length.
Example: both AB and AC are LCSs of ABC and ACB
d. As the text notes, LCS is useful in genetics for comparing DNA strings
(sequences of the bases A, C, G, and T) and in other areas as well.
2. A brute force algorithm would compute all the subsequences of the shorter
string and then test each to see if it is a subsequence of the longer - an
approach that is more than exponential in the length of the shorter string,
and hence usually not practical.
3. The book discusses how dynamic programming might be used to develop an algorithm
whose complexity is proportional to the product of the lengths of the two
strings - i.e. theta(n^2) if the two strings have the same length. The basic
idea is to make use of a table with rows corresponding to the characters of one
string, and columns corresponding to characters of the second string (and with
an extra row and column at the start). The entries in the table represent the
length of the LCS of the prefixes ending at that position in each of the two
strings, with the bottom rightmost entry representing the length of the overall LCS.
For the example in the book: LCS of the DNA sequences GTTCCTAATA and
CGATAATTGAGA, the initial table would look like this (dummy row and column
filled in with 0)
PROJECT
        C  G  A  T  A  A  T  T  G  A  G  A
    -1  0  1  2  3  4  5  6  7  8  9 10 11
-1   0  0  0  0  0  0  0  0  0  0  0  0  0
G 0  0
T 1  0
T 2  0
C 3  0
C 4  0
T 5  0
A 6  0
A 7  0
T 8  0
A 9  0
4. The table is filled in row by row from top to bottom.
a. If an entry corresponds to a place where the two strings agree, the value
is 1 more than the entry diagonally above and to the left of it.
b. If an entry corresponds to a place where the two strings disagree, the
value is the maximum of the values just to its left and just above it
c. Example - the entry in row 0, column 0 (G, C) is filled in with 0.
d. Example - the entry in row 0, column 1 (G, G) is filled in with 1.
e. The last entry to be filled in - the bottom right one - represents the
length of the LCS.
PROJECT: Figure 12.2  page 563
5. The table gives the _length_ of the LCS. To get the LCS itself, one works
backwards from the bottom right corner, finding entries in the LCS from last
to first.
a. If an entry corresponds to a place where the two strings agree, the
character in question is part of the LCS, and one moves diagonally up and
to the left.
b. If an entry corresponds to a place where the two strings don't agree, one
moves either left or up, choosing the bigger of the two or choosing one
direction arbitrarily if the two are the same. (Of course, in this case, a
character is not included in the LCS).
PROJECT same figure  note trace of finding CTAATA
c. Sometimes, a pair of sequences will have two or more LCSs of the same
length. This will be reflected in a situation in the table where the choice
of moving up or left is arbitrary because of a tie.
Example: the example in the book actually has two LCSs of length 6. The
second can be found by making the choice to go up rather than left in row
8, column 10.
PROJECT  same figure, but showing trace of finding GTTTAA
ASK - are there others? (Yes - as in the first derivation, but go up rather
than left at row 4 column 2, yielding GTAATA)
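The table-filling and traceback rules above can be sketched in code. This is a minimal illustration (the function name lcs and the tie-breaking choice are my own, not the book's implementation):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Build the DP table: L[i][j] = length of the LCS of the first i
// characters of x and the first j characters of y, then walk backwards
// from the bottom-right corner to recover one LCS.
std::string lcs(const std::string& x, const std::string& y)
{
    int m = (int)x.size(), n = (int)y.size();
    std::vector<std::vector<int>> L(m + 1, std::vector<int>(n + 1, 0));
    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++)
            if (x[i-1] == y[j-1])
                L[i][j] = L[i-1][j-1] + 1;                 // characters agree
            else
                L[i][j] = std::max(L[i-1][j], L[i][j-1]);  // take the larger neighbor
    std::string result;
    int i = m, j = n;
    while (i > 0 && j > 0) {
        if (x[i-1] == y[j-1]) { result += x[i-1]; i--; j--; }
        else if (L[i-1][j] >= L[i][j-1]) i--;              // ties broken arbitrarily
        else j--;
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

For the book's example, lcs("GTTCCTAATA", "CGATAATTGAGA") returns one of the length-6 LCSs; which one you get depends on how ties are broken during the traceback.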
D. Another Example: Weight-Balanced Binary Search Trees
1. Earlier, we talked about strategies for maintaining height-balanced
binary search trees. Where the set of keys to be stored in a tree
is fixed, and we know the relative probabilities of accessing the
different keys, it is possible to build a WEIGHT-BALANCED tree in
which the average cost of tree accesses is minimized.
a. For example, suppose we had to build a binary search tree consisting
of the following C/C++ reserved words. Suppose further that we had
data available to us as to the relative frequency of usage of each
(expressed as a percentage of all uses of words in the group), as
shown:
break 55% Note: The numbers are contrived to make a point!
case 25% In no way do they represent actual frequencies
for 11% for typical C/C++ code!
if 5%
int 2%
switch 1%
while 1%
b. Suppose we constructed a height-balanced tree, as shown:
if
/ \
case switch
/ \ / \
break for int while
- 5% of the lookups would access just 1 node (if)
- 25% + 1% = 26% would access 2 nodes (case, switch)
- 55% + 11% + 2% + 1% = 69% would access 3 nodes (the rest)
Therefore, the average number of accesses would be
(.05 * 1) + (.26 * 2) + (.69 * 3) = 2.64 nodes accessed per lookup
c. Now suppose, instead, we constructed the following search tree
break
\
case
\
for
\
if
\
int
\
switch
\
while
The average number of nodes visited by lookup is now
- 55% access 1 node (break)
- 25% access 2 nodes (case)
- 11% access 3 nodes (for)
- 5% access 4 nodes (if)
- 2% access 5 nodes (int)
- 1% access 6 nodes (switch)
- 1% access 7 nodes (while)
(.55 * 1) + (.25 * 2) + (.11 * 3) + (.05 * 4) + (.02 * 5) +
(.01 * 6) + (.01 * 7) = average 1.81 nodes accessed
This represents over a 30% savings in average lookup time
d. Interestingly, for the particular distribution of probability
values we have used, this tree is actually optimal. To see
that, consider what would happen if we rotated the tree about
one of the nodes - e.g. around the root:
case
/ \
break for
\
if
\
int
\
switch
\
while
We have now reduced the number of nodes accessed for lookups in
every case, save 1. But since break is accessed 55% of the
time, the net change in average number of accesses is
(.55 * +1) + ((1 - .55) * -1) = .55 - .45 = +.10. Thus, this
change makes the performance worse. The same phenomenon would
arise with other potential improvements.
e. In general, weight balancing is an appropriate optimization only
for static trees - i.e. trees in which the only operations performed
after initial construction are lookups (no inserts or deletes.) Such
search trees are common, though, since programming languages,
command interpreters and the like have lists of reserved or
predefined words that need to be searched regularly. Of course,
weight balancing also requires advance knowledge of probability
distributions for the accesses. (For a compiler for a given
programming language, this might be discovered by analyzing
frequency of reserved word usage in a sample of "typical" programs.)
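The average-lookup-cost arithmetic in (b) and (c) above can be checked mechanically. A small sketch (the function name is my own) that sums probability x depth over the keys:

```cpp
#include <utility>
#include <vector>

// Expected number of nodes accessed per lookup: sum over all keys of
// (probability of the key) * (number of nodes on its search path,
// counting the root as 1).
double averageAccesses(const std::vector<std::pair<double, int>>& probDepth)
{
    double total = 0;
    for (const auto& pd : probDepth)
        total += pd.first * pd.second;   // probability * depth
    return total;
}
```

For the height-balanced tree this yields 2.64, and for the degenerate tree 1.81, matching the hand calculations above.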
2. We could consider a greedy approach to discovering the optimal
binary search tree.
a. The basic idea would be to make the key of highest probability
the root of the tree. The keys of next highest probability would
be its children, etc. - subject to the constraints of the tree
being a binary search tree (e.g. only a key smaller than the
root of the overall tree could be the root of the left subtree.)
b. Applying this approach to the example we just considered would
yield an optimal tree.
c. However, the greedy strategy will not always find the optimal tree.
d. But unlike previous cases where the greedy strategy fails to find the
optimal tree, here finding the optimal tree does not require exponential time.
3. We now consider a method for finding the optimal binary search tree for a
given static set of keys, given advance knowledge of the probabilities of the
various values being sought, which finds the optimal tree in theta(n^2) time.
a. The basic idea
i. For an optimal tree containing n keys, if key k is the root, then
the two subtrees are optimal trees made up of the first k-1 keys and
the last n-k keys.
ii. We build up a table with rows describing optimal trees with 1 key,
2 keys ... n keys, and columns corresponding to the possible starting
positions of the subtree (e.g. the first column corresponds to subtrees
that start with the first key).
(a) If there are n keys, there will be n rows, with the last describing
the optimal tree that contains all n keys - which is what we want.
(b) While the first row has n columns, the second row (describing
subtrees containing two keys) has only n-1 columns, since a subtree
that contains two keys cannot start with the last key. This pattern
continues until the last row has only one column, since the subtree
it describes must start with the first key.
b. Filling in the first row of the table is trivial, since there is only
one possibility in each case for a tree containing only one key.
c. We then fill in the rest of the table row-by-row, using information
from previous rows.
Example: when filling in the entry for an optimal tree containing the
first four keys, we consider four possibilities:
- key 1 as root: empty left subtree; optimal subtree containing keys 2-4 on the right
- key 2 as root: optimal subtree containing key 1; optimal subtree containing keys 3-4
- key 3 as root: optimal subtree containing keys 1-2; optimal subtree containing key 4
- key 4 as root: optimal subtree containing keys 1-3; empty right subtree
Since the costs of the different subtrees have already been calculated,
we choose the least expensive root, and continue working across the row,
then on to the next row.
d. Of course, we must also allow for the possibility of unsuccessful search.
To handle this, we convert our search tree into an EXTENDED TREE
by adding FAILURE NODES (by convention, drawn as square boxes.)
Example: a balanced tree for the seven C++ keywords:
if
/ \
case switch
/ \ / \
break for int while
/ \ / \ / \ / \
[] [] [] [] [] [] [] []
Each failure node represents a group of keys for which the search would fail
- e.g. the leftmost one represents all keys less than break [e.g. a, apple,
boolean]; the second, all keys between break and case [c, class]; etc.
To discover the optimal tree, we need to consider both the probabilities of
the keys and the probabilities of the various failure nodes - i.e. the
probability that we will be searching for something that is not in the tree
and will end up at that node.
4. To find an optimal tree, we need to define some terms and measures:
a. We will number the keys 1 .. n
b. Probabilities connected with the various keys
i. Let p_i be the probability of searching for key i (1 <= i <= n)
ii. Let q_i be the probability of searching for a nonexistent key
lying between key i and key i+1. (Of course q_0 represents
all values less than key 1, and q_n all values greater than key n.)
iii. Clearly, since we are working with probabilities, the sum of
all the p's and q's must be 1.
c. T_ij is the optimal binary search tree containing key i+1 through key j.
d. T_ii, then, is an empty tree, consisting only of the failure node
lying between key i and key i+1.
e. We will denote the weight of T_ij by w_ij. Clearly,
the weight of T_ij is p_(i+1) + p_(i+2) + ... + p_j + q_i + q_(i+1) + ... + q_j,
which is the probability that a search will end up in T_ij. The
weight of the empty tree T_ii, then, is q_i - the probability of
the failure node lying between key i and key i+1. Note that, for a
nonempty tree, the weight is simply the probability of the root
plus the sum of the weights of the subtrees.
f. We will denote the cost of T_ij - i.e. the average number of comparisons
needed by a search that ends in T_ij - by c_ij.
c_ij is calculated as follows:
- If T_ij is empty (consists only of a failure node), then its
cost is zero - i.e. once we get to it, we need do no further comparisons.
- Otherwise, its cost is the weight of its root, plus the sum
of the weights of its subtrees, plus the sum of the costs of
its subtrees.
- The first term represents the fact that search for the key
at the root costs one comparison.
- The rationale for including the costs of the subtrees in the
overall cost should be clear. To this, we add the WEIGHTS
of the subtrees to reflect the fact that we must do one
comparison at the root BEFORE deciding which subtree to go into,
and the probability that that comparison will lead
into the subtree is equal to the weight of the subtree.
g. Clearly, an optimal binary search tree is one whose cost is minimal.
h. We will denote the root of T_ij by r_ij.
i. Example - the balanced tree we considered earlier would be optimal if the
probabilities of all keys and failures were equal (i.e. each p and q = 1/15)
if
/ \
case switch
/ \ / \
break for int while
/ \ / \ / \ / \
[] [] [] [] [] [] [] []
i. Cost of external nodes = 0 in each case, and weights of
external nodes = 1/15 in each case. So
c_00 = c_11 = c_22 = c_33 = c_44 = c_55 = c_66 = c_77 = 0.
w_00 = w_11 = w_22 = w_33 = w_44 = w_55 = w_66 = w_77 = 1/15.
ii. Cost of each tree rooted at a level 3 node (break, for, int, while) =
weight of root (1/15) + sum of costs of subtrees (0) + sum of weights of
subtrees (2/15) = 3/15. The weight of each such tree is also 3/15. So
c_01 = c_23 = c_45 = c_67 = 3/15
w_01 = w_23 = w_45 = w_67 = 3/15
iii. Cost of each tree rooted at a level 2 node (case, switch) is 1/15 (weight
of root) plus 2 x 3/15 (costs of two subtrees) + 2 x 3/15 (weights of
two subtrees) = 13/15, and weight is 1/15 + 2 x 3/15 = 7/15. So
c_03 = c_47 = 13/15
w_03 = w_47 = 7/15
iv. Cost of overall tree (c_07) =
Probability of root (key 4, if) = p_4 = 1/15 +
Weight of left subtree (T_03) = w_03 = 7/15 +
Weight of right subtree (T_47) = w_47 = 7/15 +
Cost of left subtree (T_03) = c_03 = 13/15 +
Cost of right subtree (T_47) = c_47 = 13/15
So total cost is 41/15
v. Weight of overall tree (w_07) =
Probability of root (1/15) +
Weight of left subtree (7/15) +
Weight of right subtree (7/15) = 15/15 = 1
(as expected)
5. Dynamic programming is used in an algorithm for finding an optimal tree, given
a set of values for the p's and q's.
a. T_ij is the OPTIMAL tree including keys i+1 .. j.
Therefore, T_0n is the optimal tree for the whole set of keys,
and is what we want to find.
b. w_ij is the WEIGHT of T_ij.
- For i = j, w_ij = q_i.
- For i < j, w_ij = p_r + w_(i,r-1) + w_(r,j), where r = r_ij
c. c_ij is the COST of T_ij.
- For i = j, c_ij = 0.
- For i < j (again writing r for r_ij),
c_ij = p_r + w_(i,r-1) + w_(r,j) + c_(i,r-1) + c_(r,j)
= w_ij + c_(i,r-1) + c_(r,j)
d. r_ij is the ROOT of T_ij.
- Obviously, r_ij is undefined if i = j.
(We will record the value as 0 in this case.)
- If i < j, then the subtrees of T_ij are T_(i,r-1) and T_(r,j),
where r = r_ij.
(Clearly, if T_ij is optimal then its subtrees must be also.)
- We consider each possible value for r_ij and then pick the one that
yields the lowest value for c_ij. Because we build the tree up by
first considering trees containing 0 keys, then 1, then 2 ... we have
already calculated the w and c values we need to perform this comparison.
- It turns out that, in exploring possible values for r_ij, we don't need
to consider values less than r_(i,j-1) or greater than r_(i+1,j), which
greatly reduces the effort.
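Taken together, these recurrences translate almost directly into code. The sketch below is a minimal illustration (the names OBST and optimalBST are my own; this is not the obst demo program); it fills the w, c, and r tables bottom-up, using the restricted root range just described:

```cpp
#include <limits>
#include <vector>

// p[1..n] are key probabilities and q[0..n] failure-node probabilities,
// here taken as scaled integers. After the call, c[0][n] is the cost of
// the optimal tree and r[0][n] its root.
struct OBST {
    std::vector<std::vector<int>> w, c, r;
};

OBST optimalBST(const std::vector<int>& p, const std::vector<int>& q)
{
    int n = (int)p.size() - 1;    // keys are numbered 1 .. n; p[0] is unused
    std::vector<std::vector<int>> zero(n + 1, std::vector<int>(n + 1, 0));
    OBST t{zero, zero, zero};
    for (int i = 0; i <= n; i++) t.w[i][i] = q[i];   // empty trees T_ii
    for (int len = 1; len <= n; len++) {             // trees of 1, 2, ... n keys
        for (int i = 0; i + len <= n; i++) {
            int j = i + len;
            t.w[i][j] = t.w[i][j-1] + p[j] + q[j];   // w_ij from w_i,j-1
            // Knuth's observation: only roots between r_i,j-1 and r_i+1,j
            // need to be tried (for a one-key tree the root is forced).
            int lo = (len == 1) ? j : t.r[i][j-1];
            int hi = (len == 1) ? j : t.r[i+1][j];
            int best = std::numeric_limits<int>::max();
            for (int root = lo; root <= hi; root++) {
                int cost = t.c[i][root-1] + t.c[root][j];
                if (cost < best) { best = cost; t.r[i][j] = root; }
            }
            t.c[i][j] = t.w[i][j] + best;  // c_ij = w_ij + best subtree costs
        }
    }
    return t;
}
```

Run on the four-key example that follows (p = 3, 3, 1, 1 and q = 2, 3, 1, 1, 1, scaled by 16), this sketch records r_02 = 1 with cost 19, and yields an overall optimal cost c_04 = 32, i.e. an average of 2 comparisons per search.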
6. As an example, the operation of the algorithm for four keys looks like this,
if the probabilities are: p = (3/16, 3/16, 1/16, 1/16)
and q = (2/16, 3/16, 1/16, 1/16, 1/16).
PROJECT - For convenience the probabilities are multiplied by 16, which doesn't
affect the correct operation of the algorithm but eliminates a lot of "/16"s
a. The first row represents empty trees, whose weights are simply
the appropriate "q" value, whose costs are 0, and whose roots
are undefined.
b. The second row represents trees containing just one key.
In each case, the weight is the sum of the weight of the one key
plus the weights of the two adjacent failure nodes, and the
cost is the weight of the one key (since the costs of failure
nodes are zero.) The root, of course, is the one key.
c. The third row represents the optimal choice for constructing
trees of two nodes.
i. For example, the first entry represents a tree including keys 1
and 2 - i.e. T_02. The two options would have been to let key 1
be the root or key 2 be the root. Calculating the costs:
- if key 1 is the root, then the cost is
p_1 + w_00 + w_12 + c_00 + c_12 = 3 + 2 + 7 + 0 + 7 = 19
- if key 2 is the root, then the cost is
p_2 + w_01 + w_22 + c_01 + c_22 = 3 + 8 + 1 + 8 + 0 = 20
Thus, 1 is chosen as r_02 and the cost of 19 is recorded.
ii. The remaining entries in the row are calculated in the
same way. Note that the weights and costs needed to compare
root choices are always available from previous rows.
d. Subsequent rows represent optimal trees with 3 and then 4
keys. The latter is, of course, the final answer.
Note that, in each case, we consider all viable possibilities for the root
using information already recorded in the table, and then make the
choice with the lowest cost.
7. This algorithm is implemented by the following program:
PROJECT CODE
8. Time complexity? (ASK CLASS)
a. At first it may appear to be theta(n^3) [ three nested loops ]
b. The code incorporates an improvement suggested by Donald Knuth that makes
this theta(n^2) by limiting the range of possible roots considered when
searching for the optimal root by again taking advantage of previously
computed values. We won't pursue this.
VI. Randomized Algorithms
--------------------------
A. A final category of algorithm design approaches we want to consider
is randomized algorithms.
1. One variant on this approach is to use randomization to deal with
the possibility of worst case data sets.
2. A second variant arises when exhaustively testing all the data we
need to test to get a guaranteed answer is computationally infeasible.
In such a case, it may be possible to test a random sample
and get an answer that is sufficiently reliable.
B. As an example of the first category of uses of randomization, consider
quick sort.
1. We know that if we choose the first element in the unsorted data as
the pivot element, the algorithm degenerates to O(n^2) performance
in the case where the data is already sorted in either forward or
reverse order.
2. Now consider what would happen if we chose a RANDOM element as the
pivot element.
a. Obviously, it could still be the case that we happen to make
a bad choice - indeed, we could end up with a bad choice even
if the data itself is random, if we happened to choose the
smallest (or largest) element.
b. However, the probability of making a bad choice is small, and
the probability of making bad choices over and over again on
successive iterations becomes increasingly small.
c. Further, the pathological case of already sorted data now poses
no more problem than any other data set. If there is a
significant probability that we will have significant pre-existing
order in our data, randomly choosing the pivot element may
greatly reduce the likelihood of pathological behavior (though it
cannot eliminate it, of course.)
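The random-pivot idea can be sketched as follows (a minimal illustration; the function name and the Lomuto-style partition are my own choices, not tuned production code):

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Quick sort with a randomly chosen pivot. Swapping a random element
// into the first position before partitioning means already-sorted
// input is no longer a pathological case.
void quickSort(std::vector<int>& a, int lo, int hi)
{
    if (lo >= hi) return;
    int pivotIndex = lo + rand() % (hi - lo + 1);  // random pivot choice
    std::swap(a[lo], a[pivotIndex]);
    int pivot = a[lo];
    int i = lo;
    for (int j = lo + 1; j <= hi; j++)             // partition around pivot
        if (a[j] < pivot) std::swap(a[++i], a[j]);
    std::swap(a[lo], a[i]);                        // pivot into final place
    quickSort(a, lo, i - 1);
    quickSort(a, i + 1, hi);
}
```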
C. As an example of the second category of uses of randomization, consider
testing an integer to see if it is prime.
1. This is an important problem in connection with cryptography, since
the most widely used encryption scheme generates its key from two
large prime numbers (potentially hundreds of bits long.)
2. To exhaustively test an integer n to see if it is prime, we would
have to try dividing it by all possible factors less than or equal to
sqrt(n). This would seem to be an O(n^(1/2)) operation, which is
certainly not bad. However, when dealing with cryptographic
algorithms, we tend to use the NUMBER OF BITS as the measure of
problem size. For a b-bit number, the maximum value is 2^b - 1, and
we need to test possible factors in the range 2 .. 2^(b/2). This
means exhaustively testing an integer to see if it is prime takes
time exponential in the number of bits.
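The exhaustive test just described can be sketched as follows (assuming n fits in 64 bits; the function name is my own). Each call does O(sqrt(n)) trial divisions, which is exponential in the number of bits of n:

```cpp
#include <cstdint>

// Trial division: try every candidate factor d with d*d <= n.
// O(sqrt(n)) divisions = O(2^(b/2)) for a b-bit number n.
bool isPrimeExhaustive(std::uint64_t n)
{
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```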
3. There are various results from number theory that allow us to test a
small, randomlychosen subset of the possible factors. If any of these
declares the number to be nonprime, it is definitely nonprime. If
the number passes all the tests, we can say with a very high
probability that it is prime. (Since I don't claim any expertise in
the relevant number theory, I leave the details to someone like Dr.
Crisman)
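One of the simplest such number-theoretic tests is the Fermat test: if a^(n-1) mod n != 1 for some randomly chosen a, then n is definitely composite. The sketch below is my own illustration and is weaker than what real cryptographic code uses (the Miller-Rabin test); in particular, rare Carmichael numbers can fool it:

```cpp
#include <cstdint>
#include <cstdlib>

// Modular exponentiation by repeated squaring: (base^exp) mod m.
// Safe for m < 2^32, since intermediate products then fit in 64 bits.
std::uint64_t powMod(std::uint64_t base, std::uint64_t exp, std::uint64_t m)
{
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// Fermat test: n passes trial base a if a^(n-1) mod n == 1. Any failure
// proves n composite; passing many random trials makes primality
// highly probable (Carmichael numbers excepted).
bool probablyPrime(std::uint64_t n, int trials)
{
    if (n < 4) return n == 2 || n == 3;
    if (n % 2 == 0) return false;
    for (int t = 0; t < trials; t++) {
        std::uint64_t a = 2 + rand() % (n - 3);   // random base in [2, n-2]
        if (powMod(a, n - 1, n) != 1) return false;
    }
    return true;   // probably prime
}
```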
D. One further issue with using a randomized algorithm, of course, is how
do we get random numbers on a deterministic machine?
1. Absent very specialized hardware, the answer is that we settle for
PSEUDORANDOM SEQUENCES that behave, statistically, like random
numbers.
2. One good way to generate such a sequence is by using a linear
congruential generator, which generates each new element of the
sequence x(i+1) from the previous member of the sequence x(i) by
using the congruence:
x(i+1) = A * x(i) mod M
for appropriately chosen values of A and M.
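A sketch of such a generator, using the well-known Park-Miller constants A = 16807 and M = 2^31 - 1 (the class name is my own; 64-bit arithmetic is used so the multiplication cannot overflow):

```cpp
#include <cstdint>

// Lehmer / linear congruential generator with Park-Miller constants.
// Doing the multiply in 64 bits avoids the overflow problem that
// 32-bit implementations must work around.
class Lehmer {
    std::uint64_t state;
public:
    explicit Lehmer(std::uint32_t seed) : state(seed == 0 ? 1 : seed) {}
    std::uint32_t next()
    {
        const std::uint64_t A = 16807, M = 2147483647;  // M = 2^31 - 1, prime
        state = (A * state) % M;                        // x(i+1) = A * x(i) mod M
        return static_cast<std::uint32_t>(state);
    }
};
```

From seed 1 the sequence begins 16807, 282475249, 1622650073, ...; Park and Miller's classic check is that the 10,000th value produced from seed 1 is 1043618065.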
3. It is important to choose appropriate values of A and M, and also to
deal appropriately with the possibility of overflow in the computation.
(Multiplying two 32-bit integers can yield a product as big as 64 bits).
Some widely-used "random number" functions actually have some very bad
characteristics.
4. As a practical matter, when writing randomized algorithms on a
Unix system, use the newer random number function random() instead
of the older rand(), whose lower bits cycle through the same
pattern over and over. (On Linux systems, rand() actually uses the same
generator as random() - the old rand() is not used.)