CPS222 Lecture: Sorting Last Revised 3/13/2015
Objectives:
1. To introduce basic concepts (what do we mean by sorting? internal and
external sorts. Stability)
2. To introduce common internal sorting algorithms.
3. To introduce the basic external merge sort algorithm.
4. To prove that sorting by comparison is omega(n log n)
Materials:
1. Copy of Knuth volume 3 to show
2. Projectable of various internal sorting algorithms
3. Handout with above code
4. Projectable of bubble sort tree for three items that are permutations of ABC
I. Introduction
 
A. The topic of sorting is a very important one in the area of data
structures and algorithms, because many computer applications make use
of sorting in some form or fashion. As a result, this area has been
studied extensively, and numerous algorithms with varying performance
strengths have been developed.
B. The basic goal of sorting is fairly intuitive - arrange a group of
items in ascending (or descending) order. But there are a few
nuances we need to consider.
1. Sometimes, we are sorting items such as numbers or names, where
the entire entity we are sorting also serves as the basis for the
sort. At other times, we are sorting complex structures based on
one piece of information - traditionally called the SORT KEY - or
just the key for short.
a. Sometimes, the same list of items may even be sorted using different
sort keys at different times.
Example: Suppose we create a class Student with instance variables
like the following:
id (an integer)
last name (a string)
first name (a string)
major (a string)
class year (an integer)
gpa (a real)
If we have a list of Student objects, it is easy to imagine
different circumstances under which we would want to use any of
these instance variables as a sort key - or perhaps even use last
name and first name together as a COMPOSITE KEY. (Sort based on
last name, use first name to break ties.)
b. For simplicity, we will discuss algorithms for sorting a list that
is "all key" - but the same principles could be used for sorting a
list of objects where one field is the sort key.
2. It turns out that sorting algorithms are generic in the sense that
(in almost every case) a given algorithm will work with any type of
sort key that is comparable.
a. Basically, a C++ type or class is comparable if it defines an
operator <. (This includes numeric types, strings, and any
class for which the class author defines operator <).
b. Java has a similar notion with an interface called Comparable.
To implement Comparable, a class must have a method called compareTo
which, when applied to another object of the same class,
returns a negative value if it is less than the other object, 0 if
it is equal, and a positive value if it is greater than the other
object.
Note that in Java sorting algorithms are implemented slightly
differently for sorting objects of primitive type (like ints) -
where the built-in operator < is used - and for sorting objects of
class type (including String) - where the class must implement
Comparable and the compareTo() method is used in the algorithm.
c. As we develop our algorithms, we won't worry about how the sort
key is actually defined for the objects we are sorting - we only
require that the objects to which our sorting algorithms are
applied somehow define a comparison operation (<) that is
meaningful for objects of that type. We'll use < as the comparison
operator without regard to language-specific nuances. (A brief
illustrative sketch follows point d below.)
d. Note, too, that we will discuss algorithms in terms of sorting
into ascending order. The same algorithms can be used for
sorting into descending order, except that we reverse the order of
comparison.
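For concreteness, here is a brief illustrative C++ sketch (my own, not part
of the projected materials) of a class that defines operator < on a
composite key of last name then first name; any comparison-based algorithm
below could then sort a collection of such objects just as it sorts strings:

    #include <string>
    #include <vector>
    #include <algorithm>

    // Hypothetical record type - field names echo the Student example above.
    struct Student {
        int         id;
        std::string lastName;
        std::string firstName;
        double      gpa;

        // Composite key: order by last name, break ties with first name.
        bool operator<(const Student& other) const {
            if (lastName != other.lastName) return lastName < other.lastName;
            return firstName < other.firstName;
        }
    };

    // Any algorithm that relies only on < can now sort Students, e.g.:
    void sortStudents(std::vector<Student>& roster) {
        std::sort(roster.begin(), roster.end());   // uses Student::operator<
    }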
3. In general, we can consider implementing a sorting operation for
almost any sort of collection: an array, a vector, a linked list,
etc. Traditionally, sort algorithms have been formulated on
the assumption that the items being sorted are in an array whose
elements are accessible by operator []. We'll use array terminology,
recognizing that the collection we are sorting is not
necessarily an array.
4. We are now prepared to define what we mean when we say that an
array is sorted: we say that an array of entities x[0 .. n-1] is
sorted if x[0] <= x[1] <= x[2] ... <= x[n-1] - or, more simply,
x[i] <= x[j] whenever 0 <= i <= j <= n-1.
5. We can also define what we mean by sorting an array: we sort an
array by permuting in such a way as to produce a sorted array.
Note that we explicitly require that the sorted array be a permutation
of the original array. This precludes the use of a simplistic "sort"
algorithm like the following:
    for (int i = 0; i < n; i++)
        x[i] = x[0];
This results in a sorted array by our definition, but we don't
consider this a proper sorting algorithm because the result is not -
in general - a permutation of what we started with!
C. We will study a variety of sorting algorithms, because there is no one
best algorithm for all cases.
1. Different algorithms work best for different SIZE arrays.
a. For all but very large arrays, we will typically use an INTERNAL
SORT, in which the items to be sorted are kept in main memory.
For very large arrays, we will have to use an EXTERNAL SORT in
which the items to be sorted reside in secondary storage (disk
or tape) and are brought into main memory to be sorted a few at
a time.
i. Internal sorts are much faster than external sorts, because the
access time of external storage devices is orders of magnitude
greater than that for main memory.
ii. However, internal sorts are limited as to the amount of data
they can handle by available main memory, while external sorts
are limited by available external storage (which is generally
orders of magnitude bigger.) (Note, too, that, with virtual
memory, main memory appears almost boundless; but if the amount
of memory in use becomes too great, then paging begins to occur,
and the performance of the internal sort begins to deteriorate.)
As memory sizes have grown, external sorting has become
unnecessary for many applications, but it is still important for
algorithms that deal with big data.
iii. Often, an external sorting algorithm will make use of an internal
sort, done on a portion of the data at one time, to give it a
"head start".
(We will focus on internal sorts first, and then will talk about
external sorts.)
b. Among internal sorts, there are several algorithms with theta(n^2)
behavior, and several with theta(n log n) behavior. Interestingly,
for sufficiently small arrays, a theta(n^2) algorithm may be
faster than a theta(n log n) algorithm, due to a smaller constant
of proportionality. Moreover, when implementing a recursive "divide
and conquer" theta(n log n) algorithm, it is common to switch to
using a theta(n^2) algorithm when the pieces become sufficiently
small.
2. Some algorithms are quite sensitive to the presence of some
initial order in the items being sorted. Some algorithms do
better when this is the case; others actually do worse (they
work best on totally random data.)
3. Some algorithms require significant additional space beyond that
needed to store the actual data to be sorted; others require very
little additional space. Extra space required can range from
theta(1) to theta(n) extra space.
4. In some cases, STABILITY of the algorithm is an important
consideration.
a. The issue of stability arises if we are sorting an array where
duplicate keys are allowed - i.e. two (or more) entries may
legally have the same key. (Example: sorting a list of people
by last name.)
b. A sort is said to be STABLE if two records having the same key
value are guaranteed to be in the same relative order in the
output as they were in the input.
c. Example: Suppose we were sorting bank transactions, each
consisting of an account id, transaction code, and amount - e.g.
5437 D 100.00
1234 D 50.00
5437 W 50.00
1234 W 20.00
(where the sort key is just the account number.)
i. A stable sorting algorithm would be guaranteed to produce:
1234 D 50.00
1234 W 20.00
5437 D 100.00
5437 W 50.00
ii. While an unstable one might produce the above, or any of
the following instead:
1234 W 20.00
1234 D 50.00
5437 D 100.00
5437 W 50.00

or 1234 D 50.00
1234 W 20.00
5437 W 50.00
5437 D 100.00

or 1234 W 20.00
1234 D 50.00
5437 W 50.00
5437 D 100.00
Here, the stable sort might be necessary to ensure correctness
if one of the withdrawal transactions, in fact, represents a
withdrawal against the funds deposited earlier - i.e. there was
not enough money in the account to cover the withdrawal before
the deposit was made.
d. Stability is never an issue if the sort keys are guaranteed to
be unique - i.e. no two items can have the same value of the key.
D. A classic work on sorting is Donald Knuth: The Art of Computer
Programming volume 3: Sorting and Searching.
II. Approaches to Internal Sorting
    
We will begin by considering sorting algorithms that are primarily used
for internal sorts. There are a number of basic approaches to sorting,
including the following (classification from Knuth volume 3). (For
consistency, I will illustrate each with sample code that sorts an array
of strings - but the algorithm is the same regardless of what one is
sorting.)
DISTRIBUTE HANDOUT
A. Sorting by insertion:
for (int i = 1; i < n; i++)
    insert the ith entry from the original array into a
    sorted subtable composed of entries 0..i-1
1. Demonstrate with class
2. Many texts have an algorithm for a straight insertion sort.
a. Example Code: PROJECT/HANDOUT
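Since the handout itself isn't reproduced in these notes, the following is
a minimal illustrative sketch of straight insertion sort on an array of
strings (assume n is the number of entries):

    #include <string>

    // Straight insertion sort: entries 0..i-1 are already sorted;
    // insert entry i into its proper place among them.
    void insertionSort(std::string a[], int n) {
        for (int i = 1; i < n; i++) {
            std::string item = a[i];          // the entry to insert
            int j = i - 1;
            while (j >= 0 && item < a[j]) {   // shift larger entries right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = item;                  // drop the entry into its slot
        }
    }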
b. Analysis:
ASK
theta(n^2)
c. What will happen if used on already sorted data?
ASK
Time becomes theta(n), because the inner loop terminates immediately
on each pass through the outer loop. This is a peculiar characteristic
of this algorithm which makes it advantageous to use in cases
where there is a significant probability that the data will already
be in order.
3. The Shell sort is an insertion sort with behavior approximately
O(n^1.26) - closed-form analysis being very difficult. (We won't
discuss it.)
4. Another variant of insertion sort is address calculation sort.
a. This builds on the idea that if we are manually sorting a pile
of papers, and we see a paper with a last name beginning with
'B', we automatically start looking for its place near the beginning
of the pile; if it begins with 'T', we look near the end;
and if it begins with 'M' we look near the middle.
b. One approach is to conduct insertion sort with several lists,
instead of one, each corresponding to a certain range of
key values (e.g. A-C, D-F ...). An item is inserted using
the methods of insertion sort into the appropriate list, and
then the lists are all combined at the end.
c. We won't develop further
5. Simple insertion sort and address calculation sort are stable,
but Shell sort is not.
B. Sorting by exchanging:
scan the table repeatedly (by some scheme), looking for
items that are not in the correct sequence vis-a-vis each
other, and exchange them.
1. Almost every intro computer science text discusses the bubble sort,
which is an exchange sort.
a. Example Code: PROJECT / HANDOUT
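Again, the projected handout isn't included here; this is a minimal
illustrative sketch of bubble sort on an array of strings:

    #include <string>
    #include <utility>   // std::swap

    // Bubble sort: repeatedly exchange adjacent items that are out of
    // order.  After pass k, the k largest items occupy the last k slots.
    void bubbleSort(std::string a[], int n) {
        for (int i = n - 1; i > 0; i--)        // last position fixed this pass
            for (int j = 0; j < i; j++)
                if (a[j + 1] < a[j])
                    std::swap(a[j], a[j + 1]);
    }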
b. Analysis?
ASK
theta(n^2) - but with a larger constant of proportionality than
insertion sort, because multiple exchanges can be done on each
pass through the outer loop, while insertion sort does simple
data movements rather than exchanges.
c. What will happen if used on already sorted data?
ASK
In this case, there is no asymptotic gain (though no exchanges
are done, so the overall time is better by a constant factor.)
There are improvements to the algorithm that terminate early if
no exchanges are done on some pass, yielding potentially theta(n)
behavior on sorted data.
d. The chief reason for this sort being so widely known is that
the code is so simple.
2. Quicksort
a. The basic idea is this:
i. Choose an arbitrary element of the list as the pivot element.
ii. Rearrange the list as follows:

        [ keys <= pivot ]  [ pivot ]  [ keys >= pivot ]

(Note that a key that is equal to the pivot can end up in
either half.)
iii. Sort the two partitions recursively
b. CODE - PROJECT Goodrich/Tamassia Code Fragment 11.5
i. This version makes the arbitrary choice of using the last
element as the pivot.
ii. Note that we consider this sort to be an exchange sort because
of the method used to do the partitioning.
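The Goodrich/Tamassia fragment itself isn't reproduced here; the sketch
below is my own illustration of the same idea - partition in place around
the last element, then sort the two partitions recursively (call it as
quickSort(a, 0, n - 1)):

    #include <string>
    #include <utility>   // std::swap

    // Sort a[lo..hi] (inclusive) by quicksort, using a[hi] as the pivot.
    void quickSort(std::string a[], int lo, int hi) {
        if (lo >= hi) return;                  // 0 or 1 items - nothing to do

        const std::string pivot = a[hi];
        int left = lo, right = hi - 1;
        while (left <= right) {                // partition around the pivot
            while (left <= right && a[left] < pivot)  left++;
            while (left <= right && pivot < a[right]) right--;
            if (left <= right)
                std::swap(a[left++], a[right--]);
        }
        std::swap(a[left], a[hi]);             // pivot goes between the halves

        quickSort(a, lo, left - 1);            // keys <= pivot
        quickSort(a, left + 1, hi);            // keys >= pivot
    }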
c. Analysis: We consider average case and worst case separately:
i. Average case  we expect each partition to divide the list
roughly in half. We can thus picture the partitioning process
by a tree like the following:
n items
n/2 n/2
n/4 n/4 n/4 n/4
....................................
1 1 1 ......................................... 1 1 1
- At each "level", we must examine all n items to create
the next level of partitions. There are log n levels -
therefore QuickSort is O(n log n), average case.
ii. In the worst case, QuickSort is not so good, however.
Consider the behavior for a list that is exactly backward.
- The first partition produces sublists of 0 and n-1 items.
- The second produces sublists of 0 and n-2
...
- Therefore, there are n levels of partitioning, each
examining theta(n) items - therefore the worst case for
QuickSort is theta(n^2).
What about the case where the list is already sorted to
begin with? Paradoxically, this too turns out to be theta(n^2).
d. We can reduce the likelihood of worst case behavior by improving the
way we select the pivot element.
i. Ideally, the key we use as the pivot should be the median
of the items in the list. In practice, this involves
either sorting the list, or using a rather complex theta(n)
algorithm which we won't discuss.
ii. One simple improvement is as follows: instead of always using a
fixed position (such as the first or last item) as the pivot, choose
the median of the (physically) first, (physically) middle, and
(physically) last. (Worst case behavior can still occur, but not
with the case of a backward or an already ordered list.)
iii. If our major concern is with avoiding the worst-case behavior that
comes when the data is already sorted or reverse-sorted, we can
also select a pivot randomly from among all the items being
worked on - which may be somewhat simpler to implement.
e. In practice, QuickSort is often improved by switching to another
method (e.g. insertion sort) when the size of the sublist to
be sorted falls below some threshold. That is, the recursive
calls might be coded as follows:
Present code:

    if (size <= 1)
        ;   // Do nothing
    else
    {
        ... Quick sort code
    }

Modified code:

    if (size <= 1)
        ;   // Do nothing
    else if (size < threshold)
    {
        ... Insertion sort code
    }
    else
    {
        ... Quick sort code
    }
f. One other point to note about quicksort is that, due to the
recursion, it does require additional memory for the stack.
i. The amount of additional memory needed will vary from O(log n)
[if each partitioning roughly divides the list in two] to
O(n) [in the pathological cases where each partitioning
produces one sublist that is smaller by just 1 item than
the list that was partitioned.]
ii. The stack growth can be kept to O(log n) in all cases as
follows: Always sort the smaller of the two sublists first,
and use tail recursion optimization on the second call in each
case
3. The bubble sort is stable, but quicksort is not.
C. Sorting by selection:
for (i = 0; i < n; i ++)
select the smallest (largest) item from those still under
consideration, put it in the right place, and remove it from
consideration on further passes
1. Demonstrate with class
2. Many texts give an algorithm for a straight selection sort.
a. Example Code: PROJECT / HANDOUT
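As before, the handout isn't reproduced here; a minimal illustrative
sketch of straight selection sort on an array of strings:

    #include <string>
    #include <utility>   // std::swap

    // Straight selection sort: on each pass, select the smallest remaining
    // item and exchange it into the next slot of the sorted portion.
    void selectionSort(std::string a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            int smallest = i;
            for (int j = i + 1; j < n; j++)
                if (a[j] < a[smallest])
                    smallest = j;
            std::swap(a[i], a[smallest]);   // one exchange per outer pass
        }
    }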
b. Analysis:
ASK
theta(n^2). Constant of proportionality tends to be better than
insertion sort, because there is only one data movement done
per pass through the outer loop.
c. What will happen if used on already sorted data?
ASK  Nothing is gained or lost
3. Heapsort is a selection sort method
a. The text discussed heapsort in conjunction with its discussion of
heaps, though I postponed the reading of this material until now
1. We have already seen that it is possible to convert an array
to a heap en masse in theta(n) time. Suppose we were to build a
max-heap (largest item is on top of the heap.) Clearly that item
belongs at the _end_ of a sorted version of the original array.
2. We have also seen that it is possible to remove the top item from
a heap and replace it by its appropriate successor in
theta(log n) time.
b. This leads to the following approach to sorting:
Convert the array into a max-heap
for (i = 0; i < n; i++)
    remove the top item from the heap and put it i slots from
    the end of the sorted array; then readjust the heap
c. Example code: PROJECT / HANDOUT
Demonstrate phase 2 (after heap built) using student names
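The handout code isn't included in these notes; here is an illustrative
sketch of both phases - build a max-heap en masse, then repeatedly remove
the top item and readjust:

    #include <string>
    #include <utility>   // std::swap

    // Sift a[i] down to its place within the max-heap a[0..heapSize-1].
    void siftDown(std::string a[], int heapSize, int i) {
        while (true) {
            int largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heapSize && a[largest] < a[l]) largest = l;
            if (r < heapSize && a[largest] < a[r]) largest = r;
            if (largest == i) return;
            std::swap(a[i], a[largest]);
            i = largest;
        }
    }

    void heapSort(std::string a[], int n) {
        // Phase 1: convert the array into a max-heap - theta(n).
        for (int i = n / 2 - 1; i >= 0; i--)
            siftDown(a, n, i);
        // Phase 2: move the top (largest) item to the end of the unsorted
        // region and readjust the heap - n removals at theta(log n) each.
        for (int heapSize = n; heapSize > 1; heapSize--) {
            std::swap(a[0], a[heapSize - 1]);
            siftDown(a, heapSize - 1, 0);
        }
    }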
d. Analysis
ASK
Since the first step takes theta(n) time and the loop does a
theta(log n) operation n times, the total time will be
theta(n) + n * theta(log n) = theta(n log n)
4. Neither simple selection sort nor heapsort is stable, though simple
selection can be made stable at the cost of both extra time and space.
D. Sorting by merging
1. Suppose we have two sorted lists. It is easy to merge them
into a single sorted list in theta(n) time
for (i = 0; i < n; i ++)
choose the smaller item from the fronts of the two lists,
and add it to the sorted list. (If one list is empty,
always take from the other list)
a. Demonstration: merge two sorted lists of student names
b. This leads to a recursive sorting strategy:
 Split the data in half
 Sort each half recursively
 Merge the two sorted halves
c. Example code: PROJECT / HANDOUT
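The handout isn't reproduced here; the following illustrative sketch sorts
a vector of strings, using a scratch vector of the same size for the merge
(ties go to the left half, which is what makes it stable):

    #include <string>
    #include <vector>

    // Sort a[lo..hi) by merge sort, using temp as theta(n) scratch space.
    void mergeSortRange(std::vector<std::string>& a,
                        std::vector<std::string>& temp, int lo, int hi) {
        if (hi - lo <= 1) return;                  // 0 or 1 items
        int mid = lo + (hi - lo) / 2;
        mergeSortRange(a, temp, lo, mid);          // sort each half recursively
        mergeSortRange(a, temp, mid, hi);

        int i = lo, j = mid, k = lo;               // merge the sorted halves
        while (i < mid && j < hi)
            temp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        while (i < mid) temp[k++] = a[i++];
        while (j < hi)  temp[k++] = a[j++];
        for (int m = lo; m < hi; m++) a[m] = temp[m];
    }

    void mergeSort(std::vector<std::string>& a) {
        std::vector<std::string> temp(a.size());
        mergeSortRange(a, temp, 0, static_cast<int>(a.size()));
    }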
d. Analysis:
ASK
Guaranteed theta(n log n) - by reasoning similar to that used to show
that quicksort is theta(n log n) on average - but this time, we can
guarantee perfect partitioning, so this asymptotic bound holds
for all cases.
e. Moreover, if we break ties by always choosing from the list that
came from nearer the start of the original list, we can guarantee
that merge sort is stable.
f. Unfortunately, we require theta(n) extra space to store the merged
list - or we can use linked lists, which require theta(n) extra
space for the links!
2. We will see shortly that merge sorting is the basis for all external
sorting strategies  though sometimes we sacrifice stability for
extra speed.
E. Sorting by distribution:
1. This works with a key of m "digits", using d "pockets" where
d is the number of possible values a key digit may assume
(e.g. 10 for a decimal key; 26 for an alphabetic key etc.)
for (i = 0; i < m; i ++)
distribute the file into d pockets based on the ith
key digit from the right
reconstruct the file by appending the pockets to one another.
Example: Assume we are sorting strings of three letters, drawn
from the alphabet ABCDE [so we need 5 pockets]
Initial data: CBD
ADE
CAD
ADA
BAD
ACE
BEE
BED
First distribution - on rightmost character:

      A:     B:       C:       D:     E:
      ADA    (empty)  (empty)  CBD    ADE
                               CAD    ACE
                               BAD    BEE
                               BED
Pick up left-to-right: ADA
CBD
CAD
BAD
BED
ADE
ACE
BEE
Second distribution - on middle character:

      A:     B:     C:     D:     E:
      CAD    CBD    ACE    ADA    BED
      BAD                  ADE    BEE
Pick up: CAD
BAD
CBD
ACE
ADA
ADE
BED
BEE
Third distribution - on leftmost character:

      A:     B:     C:     D:       E:
      ACE    BAD    CAD    (empty)  (empty)
      ADA    BED    CBD
      ADE    BEE
Final pickup: ACE
ADA
ADE
BAD
BED
BEE
CAD
CBD
2. Time complexity appears to be order (n*m) - but note that for
n keys we have a minimum value for m of log_d(n) - therefore, it
is in fact theta(n log n), since log_2(n) and log_d(n) differ only by
a constant factor.
3. Unfortunately, distribution sorting requires extra space; though the
extra space requirements can be kept down by careful coding.
a. If the "pockets" were represented by arrays, then we would
need one array for each possible value of a digit  e.g.
26 pockets if sorting based on letters of the alphabet. Further,
each pocket would need to be big enough to possibly hold
all the data if, in fact, all the keys had the same value in
one position. Thus, we would need O(n) extra space, where the
constant of proportionality would be huge.
b. The extra space can be greatly reduced - though it is still
O(n) - by representing the "pockets" by linked lists, using
a table of links as in the previous example. This is really
the only practical way to go.
4. Distribution sorting is always stable; in fact, it relies on
the stability of later passes to preserve the work done on
earlier ones.
5. (No demo code for this one  but the book discusses briefly under
the name "bucket sort")
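Although no demo code was distributed, the idea can be sketched as follows
for fixed-length keys over the alphabet 'A'..'A'+d-1, using vectors as the
"pockets" (an illustration only - as noted above, linked-list pockets are
the practical choice):

    #include <string>
    #include <vector>

    // Distribution (radix) sort for strings of exactly m characters.
    // Pass i distributes on the ith character from the right; each pass is
    // stable, preserving the ordering produced by earlier passes.
    void distributionSort(std::vector<std::string>& a, int m, int d) {
        for (int pos = m - 1; pos >= 0; pos--) {
            std::vector<std::vector<std::string>> pockets(d);
            for (const std::string& key : a)        // distribute
                pockets[key[pos] - 'A'].push_back(key);
            a.clear();
            for (const auto& p : pockets)           // pick up left-to-right
                a.insert(a.end(), p.begin(), p.end());
        }
    }

A call such as distributionSort(data, 3, 5) reproduces the three passes
traced above.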
F. Sorting by enumeration:
For each record, determine how many records have keys less
than its key. We will call this value for the ith
record count[i].
Clearly, the record currently in position i actually belongs
in position count[i] + 1, so as a final step we put it there.
1. Observe that this strategy is theta(n^2), and is stable [because
if two records have equal keys, we increase the count of the one
occurring physically later.]
2. An interesting variant is possible if the set of possible keys is
small (i.e. many items have the same key.)
a. Example: Sort the students by academic class - using two arrays:
count[i] and position[i] (1 <= i <= 4)
i. We make one pass through all the students to calculate count[].
ii. position[1] is set to 0
iii. position[i] (2 <= i <= 4) is set to position[i-1] + count[i-1]
iv. Make a second pass through all the students and place according
to current value of position[] for his/her class, then increment
position.
b. Analysis:
ASK
O(n)
- but special case!
3. (No demo code for this one)
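Though there is no demo code, the special-case variant can be sketched as
follows - here sorting students by class year 1..4, with array names chosen
to match the description above (the Student type is hypothetical):

    #include <string>
    #include <vector>

    struct Student {
        std::string name;
        int         classYear;    // the key, known to be in 1..4
    };

    // Theta(n) enumeration sort for keys limited to 1..4.  It is stable,
    // since records with equal keys are placed in the order encountered.
    std::vector<Student> sortByClassYear(const std::vector<Student>& in) {
        int count[5] = {0}, position[5] = {0};
        for (const Student& s : in)            // first pass: count each year
            count[s.classYear]++;
        position[1] = 0;                       // starting slot for each year
        for (int i = 2; i <= 4; i++)
            position[i] = position[i - 1] + count[i - 1];
        std::vector<Student> out(in.size());
        for (const Student& s : in)            // second pass: place, advance
            out[position[s.classYear]++] = s;
        return out;
    }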
G. Summary
We can compare the internal sorting strategies we have looked at
thus far by considering several attributes:
1. Asymptotic complexity
2. Behavior with already sorted data.
3. Need for additional storage.
4. Stability
Algorithm      Asymptotic           Impact of            Extra           Stable?
               Complexity           Sorted Data          Storage
--------------------------------------------------------------------------------
Simple         theta(n^2)           becomes theta(n)     minimal         yes
Insertion

Bubble         theta(n^2)           can become theta(n)  minimal         yes
                                    w/suitable coding

Quicksort      theta(n log n)       can degenerate to    theta(log n)    no
               average - can        theta(n^2) unless    stack for
               degenerate to        avoided by coding    recursion
               theta(n^2)

Simple         theta(n^2)           little change        minimal         no
Selection

Heapsort       theta(n log n)       little change        minimal         no
               always

Merge Sort     theta(n log n)       little change        theta(n) for    yes
                                                         extra array or
                                                         at least links

Distribution   theta(n log n)       little change        theta(n)        yes
Sort           if keys are unique

Enumeration    theta(n^2) - can be  little change        theta(n) for    yes
Sort           theta(n) for the                          counts
               special case where
               potential key values
               form a small set
The result of this analysis shows that there is no one algorithm
that's best on all counts. In particular, there is no known sorting
algorithm that has all of the following characteristics: theta(n log n)
asymptotic complexity, theta(1) extra space, and stability. We can get
any two of the three, but not all three!
III. How Fast Can We Sort?
     
A. One observation one can make from the table we just considered is that
the best generalpurpose sorting algorithms have asymptotic complexity
theta(n log n). Is this as good as we can do, or is it possible to find
an algorithm whose average case asymptotic complexity is less than
n log n?
B. In the case of sorts based on binary comparisons, the answer to our
question is no. We will now prove the following theorem:
Theorem: Any sort BASED ON BINARY COMPARISONS must have complexity at
least on the order of n log n - i.e. it is omega(n log n).
C. Proof:
1. Any sorting algorithm for sorting n items must be prepared to
deal with all n! possible permutations of those items, and must
deal with each permutation differently.
2. Each comparison in the sort serves to partition the permutations
into two classes - those passing the test, and those failing the
test.
Example: bubble sort of three items must deal with 6 permutations:
ABC ACB BAC BCA CAB CBA
The first comparison checks to see if item[0] is <= item[1]. Three
permutations pass this test: ABC, ACB, and BCA. The other three
(BAC, CAB, CBA) fail the test, necessitating an exchange.
3. Each subsequent comparison partitions each of these classes
further.
Example: the second comparison checks to see if item[1] <= item[2].
Of the three permutations passing the first test, one passes the
second (ABC) and the other two do not. Of the three permutations
failing the first test - and after the exchange - only one passes
the second test (BAC altered to ABC).
4. After c comparisons, then, we have 2^c classes - some of which may
be empty.
5. At the completion of the sort, we must have at least n! classes -
since each original permutation must be handled differently.
Example: complete classification tree for bubble sort of 3 items
(each "no" branch performs an exchange before continuing):

item[0] <= item[1]?
  no:  (BAC, CAB, CBA) - become (ABC, ACB, BCA) after the exchange
       item[1] <= item[2]?
         no:  (ACB, BCA) - become (ABC, BAC)
              item[0] <= item[1]?
                no:  (BAC) - becomes (ABC)
                yes: (ABC)
         yes: (ABC)
              item[0] <= item[1]?
                no:  (empty)
                yes: (ABC)
  yes: (ABC, ACB, BCA)
       item[1] <= item[2]?
         no:  (ACB, BCA) - become (ABC, BAC)
              item[0] <= item[1]?
                no:  (BAC) - becomes (ABC)
                yes: (ABC)
         yes: (ABC)
              item[0] <= item[1]?
                no:  (empty)
                yes: (ABC)

PROJECT

After 3 comparisons, we have eight classes - 6 of which contain one item
(corresponding to each of the 3! original permutations) and 2 of which are
empty.
6. Thus, we have 2^c >= n!, or c >= log(n!).
7. However, by Stirling's approximation, n! ~ sqrt(2 pi n) * (n/e)^n,
so (taking logs base 2)
log(n!) ~ 0.5(1 + log(pi) + log(n)) + n(log(n) - log(e))
= n log n - O(n) + O(log n) + O(1)
which is theta(n log n),
so c >= log(n!) = theta(n log n), i.e. c = omega(n log n). QED
8. Note: our text argues that log(n!) is omega(n log n) in a different
way - same conclusion, just a different way of getting there.
IV. External Sorting
  
A. We have seen that the algorithms we use for searching tables stored on
disk are quite different from those used for searching tables stored in
main memory, because the disk access time dominates the processing time.
B. For much the same reason, we use different algorithms for sorting
information stored on disk than for sorting information in main memory.
1. We call an algorithm that sorts data contained in main memory an
INTERNAL SORTING algorithm, while one that sorts data on disk is
called an EXTERNAL SORTING algorithm.
2. In the simplest case - if all the data fits in main memory - we
can simply read the data from disk into main memory, sort it using
an internal sort, and then write it back out.
3. The more interesting case - and the one we consider here - arises
when the file to be sorted does not all fit in main memory.
4. Historically, external sorting algorithms were developed in the context
of systems that used magnetic tapes for file storage, and the
literature still uses the term "tape", even though files are most often
kept on some form of disk. It turns out, though, that the storage
medium being used doesn't really matter because the algorithms we will
consider all read/write data sequentially.
C. Most external sorting algorithms are variants of a basic algorithm
known as EXTERNAL MERGE sort. Note that there is also an internal
version of merge sort that we have considered. External merging
reads data one record at a time from each of two or more files, and
writes records to one or more output files. As was the case with
internal merging, external merging is theta(n log n) for time, but
theta(n) for extra space, and (if done carefully) it is stable.
D. First, though, we need to review some definitions:
1. A RUN is a sequence of records that are in the correct relative order.
2. A STEPDOWN normally occurs at the boundary between runs. Instead
of the key value increasing from one record to the next, it decreases.
Example: In the following file: B D E C F A G H
- we have three runs (B D E, C F, A G H)
- we have two stepdowns (E C, F A)
3. Observe that an unsorted file can have up to n runs, and up
to n-1 stepdowns. In general (unless the file is exactly
backwards) there will be fewer runs and stepdowns than this,
due to pre-existing order in the file.
4. Observe that a sorted file consists of one run, and has no
stepdowns.
E. We begin with a variant of external merge sort that one would not use
directly, but which serves as the foundation on which all the other
variants build.
1. In the simplest merge sort algorithm, we start out by regarding
the file as composed of n runs, each of length 1. (We ignore any
runs which may already be present in the file.) On each pass, we
merge pairs of runs to produce runs of double length.
a. After pass 1, we have n/2 runs of length 2.
b. After pass 2, we have n/4 runs of length 4.
c. The total number of passes will be ceil(log n). [ Where ceil
is the ceiling function - smallest integer greater than or equal
to.] After the last pass, we have 1 run of length n, as desired.
d. Of course, unless our original file length is a power of 2, there
will be some irregularities in this pattern. In particular, we
let the last run in the file be smaller than all the rest -
possibly even of length zero.
Example: To sort a file of 6 records:
Initially: 6 runs of length 1
After pass 1: 3 runs of length 2 + 1 "dummy" run of length 0
After pass 2: 1 run of length 4 + 1 run of length 2
After pass 3: 1 run of length 6
2. We will use a total of three scratch files to accomplish the sort.
a. Initially, we distribute the input data over two files, so that
half the runs go on each. We do this alternately - i.e. first
we write a run to one file, then to the other - in order to
ensure stability.
b. After the initial distribution, each pass entails merging runs
from two of the scratch files and writing the generated runs on
the third. At the end of the pass, if we are not finished, we
redistribute the runs from the third file alternately back to the
first two.
Example: original file: B D E C F A G H

initial distribution:   B E F G      (File SCRATCH1)
                        D C A H      (File SCRATCH2)
(remember we ignore runs existing in the raw data)

PASS 1
after first merge:      BD CE AF GH  (File SCRATCH3)
redistribution:         BD AF        (File SCRATCH1)
                        CE GH        (File SCRATCH2)

PASS 2
after second merge:     BCDE AFGH    (File SCRATCH3)
redistribution:         BCDE         (File SCRATCH1)
                        AFGH         (File SCRATCH2)

PASS 3
after third merge:      ABCDEFGH     (File SCRATCH3)
(no redistribution)
3. Analysis of the basic merge sort
a. Space: three files, one of length n and two of length n/2. We
can use the output file as one of the scratch files, so
the total additional space is two files of length n/2
= total scratch space for n records
In addition, we need internal memory for three buffers  one
for each of the three files. In general, each buffer needs to
be big enough to hold an entire block of data (based on the
blocksize of the device), rather than a single record.
b. Time:
- Initial distribution involves n reads
- Each pass except the last involves 2n reads due to merging
  followed by redistribution. The last pass involves just n reads.
- Total reads = 2n ceil(log n), so total IO operations =
  4n ceil(log n)
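The heart of each pass is a two-way merge of runs read one record at a
time.  As a minimal illustration (my own sketch, with hypothetical file
names), here is the final pass - merging two scratch files that each hold
a single sorted run, one key per line, into one output file:

    #include <fstream>
    #include <string>

    // Merge two sorted text files (one key per line) into one sorted file.
    // Earlier passes do the same thing run by run rather than file by file.
    void mergeFiles(const std::string& in1, const std::string& in2,
                    const std::string& out) {
        std::ifstream f1(in1), f2(in2);
        std::ofstream fo(out);
        std::string k1, k2;
        bool have1 = static_cast<bool>(std::getline(f1, k1));
        bool have2 = static_cast<bool>(std::getline(f2, k2));
        while (have1 && have2) {
            if (k2 < k1) {                     // take the smaller front key;
                fo << k2 << '\n';              // ties go to file 1 (stability)
                have2 = static_cast<bool>(std::getline(f2, k2));
            } else {
                fo << k1 << '\n';
                have1 = static_cast<bool>(std::getline(f1, k1));
            }
        }
        while (have1) {                        // copy whatever is left over
            fo << k1 << '\n';
            have1 = static_cast<bool>(std::getline(f1, k1));
        }
        while (have2) {
            fo << k2 << '\n';
            have2 = static_cast<bool>(std::getline(f2, k2));
        }
    }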
F. A significant improvement arises from the observation that our original
algorithm started out assuming that the input file consists of
n runs of length 1 - the worst possible case (a totally backward
file.) In general, the file will contain many runs longer than one
just as a consequence of the randomness of the data, and we can use
these to reduce the number of passes.
1. Example: The sample file we have been using contains 3 runs, so
we could do our initial distribution as follows:
initial distribution:   BDE AGH    (File SCRATCH1)
                        CF         (File SCRATCH2)
after first pass:       BCDEF      (File SCRATCH3)
                        AGH        (File SCRATCH4)
after second pass:      ABCDEFGH   (File SCRATCH1)
(Note: we have assumed the use of a balanced merge; but
a non-balanced merge could also have been used.)
2. This algorithm is called a NATURAL MERGE. The term "natural"
reflects the fact that it relies on runs naturally occurring in
the data.
3. However, this algorithm has a quirk we need to consider.
a. Since we merge one run at a time, we need to know where one run
ends and another run begins. In the case of the previous
algorithms, this was not a problem, since we knew the size of
each run. Here, though, the size will vary from run to run.
In the code we just looked at, the solution to this problem
involved recognizing that the boundary between runs is marked by
a stepdown. Thus, each time we read a new record from an input
file, we will keep track of the last key processed from that
file; and if our newly read key is smaller than that key, then we
know that we have finished processing one run from that file.
Example: in the initial distribution above, we placed two
runs in the first scratch file. The space between
them would not be present in the file; what we
would have is actually BDEAGH. But the run boundary
would be apparent because of the stepdown from E to A.
b. However, if stability is important to us, we need to be very
careful at this point. In some cases, the stepdown between
two runs could disappear, and an unstable sort could result.
Consider the following file:

    F E D C B A M1 Z N M2        (where records M1 and M2 have
                                  identical keys)

Initial distribution (the 8 runs alternate between the two scratch files):

    F | D | B | N                <-- no stepdown between B and N, so
    E | C | A M1 Z | M2              those two runs look like one:

    F | D | B N
    E | C | A M1 Z | M2

First pass:

    E F | A B M1 N Z
    C D | M2                     <-- no stepdown between D and M2, so
                                     those two runs look like one:
    E F | A B M1 N Z
    C D M2

Second pass:

    C D E F M2
    A B M1 N Z

Third pass:

    A B C D E F M2 M1 N Z
                ^
                M2 now precedes M1 - an unstable result. In the case of
                equal keys, we take the record from the first scratch
                file before the record from the second, since the first
                scratch file should contain records from earlier in the
                original file.
c. If stability is a concern, we can prevent this from occurring
by writing a special run-separator record between runs in our
scratch files. This might, for example, be a record whose
key is some impossibly big value like maxint or '~~~~~'.
Of course, processing these records takes extra overhead
that reduces the advantage gained by using the natural runs.
d. Analysis:
i. Space is the same as an ordinary merge if no run separator
records are used. However, in the worst case of a totally
backward input file, we would need n run separator records
on our initial distribution, thus potentially doubling the
scratch space needed.
ii. The time will be some fraction of the time needed by an
ordinary merge, and will depend on the average length of
the naturally occurring runs.
- If the naturally occurring runs are of average length 2, then
we save 1 pass - in effect we start where we would be on the
second pass of ordinary merge.
- In general, if the naturally occurring runs are of average
length m, we save at least floor(log m) passes. Thus, if we
use a balanced 2-way merge, our time will be
n (1 + ceil(log n - log m)) reads =
n (1 + ceil(log (n/m))) reads or
2n (1 + ceil(log (n/m))) IO operations
- Of course, if run separator records are used, then we actually
process more than n records on each pass. This costs
additional time for
n/m reads on first pass
n/2m reads on second pass
n/4m reads on third pass
...
= (2n/m - 1) additional reads,
or about 4n/m extra IO operations
- Obviously, a lot depends on the average run length in the
original data (m). It can be shown that, in totally random
data, the average run length is 2 - which translates into
a savings of 1 merge pass, or 2n IO operations. However, if
we use separator records, we would need 2n extra IO operations
to process them - so we gain nothing! (We could still gain
a little bit by omitting separator records if stability were
not an issue, though.)
- In many cases, though, the raw data does contain considerable
natural order, beyond what is expected randomly. In this
case, natural merging can help us a lot.
G. Another improvement builds on the idea of the natural merge by using an
internal sort during the distribution phase to CREATE runs of some size.
1. The initial distribution pass now looks like this - assuming we
have room to sort s records at a time internally:
while not eof(infile) do
read up to s records into main memory
sort them
write them to one of the scratch files
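As an illustrative sketch (hypothetical file names, records taken to be
lines of text), the run-generation pass might look like this in C++:

    #include <algorithm>
    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // Read up to s records at a time, sort them internally, and write the
    // resulting run to the two scratch files alternately.
    void distributeRuns(const std::string& inName, std::size_t s) {
        std::ifstream in(inName);
        std::ofstream scratch[2] = { std::ofstream("scratch1"),
                                     std::ofstream("scratch2") };
        int target = 0;                    // which scratch file gets this run
        std::vector<std::string> buffer;
        std::string record;
        while (true) {
            buffer.clear();
            while (buffer.size() < s && std::getline(in, record))
                buffer.push_back(record);
            if (buffer.empty()) break;     // end of input
            std::stable_sort(buffer.begin(), buffer.end());  // internal sort
            for (const std::string& r : buffer)
                scratch[target] << r << '\n';
            target = 1 - target;           // alternate for the next run
        }
    }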
2. Clearly, the effect of this is to reduce the merge time to a
fraction (log (n/s)) / (log n) of what it would otherwise be. For
example, if s = sqrt(n), we cut the merge time in half. The overall
time is not reduced as much, of course, because
a. The distribution pass still involves the same number of reads.
b. We must now add time for the internal sorting!
c. Nonetheless, the IO time saved makes internal run generation
almost always worthwhile.
Example: suppose we need to sort 65536 records, and have room
to internally sort 1024 at a time.
 The time for a simple merge sort is
65536 * (1 + log 65536) reads + the same number of writes
= 65536 * 17 * 2 = 2,228,224 IO operations
 The time with internal run generation is
65536 * (1+ log 65536/1024) reads + the same number of writes +
internal sort time
= 65536 * 7 * 2 = 917,504 IO operations + 64 1024record sorts
3. This process is stable iff the internal sort used is stable. If
stability is not a concern, it is common to use an internal sort
like quicksort. (Note that a stable internal sort is either O(n^2),
or it requires O(n) extra space, which cuts down on the size of the
initial runs that can be created by internal sorting!)
V. Sorting with multiple keys
    
A. Thus far, we have assumed that each record in the file to be sorted
contains one key field. What if the record contains multiple keys -
e.g. a last name, first name, and middle initial?
1. We wish the records to be ordered first by the primary key (last
name).
2. In the case of duplicate primary keys, we wish ordering on the
secondary key (first name).
3. In the case of ties on both keys, we wish ordering on the tertiary
key (middle initial).
etc  to any number of keys.
B. The approach we will discuss here applies to BOTH INTERNAL AND EXTERNAL
SORTS.
C. There are two techniques that can be used for cases like this:
1. We can modify an existing algorithm to consider multiple keys when
it does comparisons  e.g.
a. Original algorithm says:
if (item[i].key < item[j].key)
b. Revised algorithm says:
if ((item[i].primary_key < item[j].primary_key) 
((item[i].primary_key == item[j].primary_key) &&
(item[i].secondary_key < item[j].secondary_key) 
((item[i].primary_key == item[j].primary_key) &&
(item[i].secondary_key == item[j].secondary_key) &&
(item[i].tertiary_key < item[j].tertiary_key)) )
2. We can sort the same file several times, USING A STABLE SORT.
a. First sort is on least significant key.
b. Second sort is on second least significant key.
c. Etc.
d. Final sort is on primary key.
3. The first approach is usable when we are embedding a sort in a
specific application package; the second is more viable when we are
building a utility sorting routine for general use [but note that we
are now forced to a stable algorithm.]
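To make the second approach concrete, here is a brief sketch using the C++
library's stable sort (the Person type and its fields are hypothetical).
Because each pass is stable, the ordering produced by earlier passes
survives as the tie-breaker in later ones:

    #include <algorithm>
    #include <string>
    #include <vector>

    struct Person {                      // record with three keys
        std::string last, first;
        char        middle;
    };

    void sortByName(std::vector<Person>& people) {
        // Least significant key first ...
        std::stable_sort(people.begin(), people.end(),
            [](const Person& a, const Person& b) { return a.middle < b.middle; });
        std::stable_sort(people.begin(), people.end(),
            [](const Person& a, const Person& b) { return a.first < b.first; });
        // ... primary key last.
        std::stable_sort(people.begin(), people.end(),
            [](const Person& a, const Person& b) { return a.last < b.last; });
    }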
VI. Pointer Based Sorting
   
A. When the items being sorted are large records (perhaps hundreds of
bytes each), it may be desirable to use a pointer-based approach to
reduce the time spent moving data. The following are some variants
on this theme.
B. ADDRESS TABLE SORTING: we use an array of pointers P[1]..P[N]. Instead
of physically rearranging the records (which is costly in terms of data
movement time), we leave the records in their original place and
sort the array of pointers so that: P[i]^.Key <= P[j]^.Key
for all i <= j.
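A minimal illustrative sketch of address table sorting, using a table of
indices rather than raw pointers (either works): the records never move,
only the table does:

    #include <algorithm>
    #include <cstddef>
    #include <numeric>   // std::iota
    #include <string>
    #include <vector>

    struct Record {                  // imagine this being hundreds of bytes
        std::string key;
        char        payload[256];
    };

    // Build a table p such that records[p[i]].key <= records[p[j]].key
    // whenever i <= j; the records themselves stay where they are.
    std::vector<std::size_t> addressTableSort(const std::vector<Record>& records) {
        std::vector<std::size_t> p(records.size());
        std::iota(p.begin(), p.end(), 0);        // p = 0, 1, 2, ..., n-1
        std::sort(p.begin(), p.end(),
            [&records](std::size_t a, std::size_t b) {
                return records[a].key < records[b].key;
            });
        return p;
    }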
C. KEY SORTING: if the key is short relative to the whole record, then
we sort an array consisting of keys plus pointers to the rest of the
record, so that we only move keys and pointers, not whole records.
At the very end of the sort, we may physically rearrange the records
themselves.
D. LIST SORTING: we keep the records on a linked list, and rearrange
links rather than moving records. (We will use this in several
of the algorithms below.) Again, at the very end of the sort, we may
physically rearrange the records themselves.