CPS222 Lecture: Lists Last revised 12/17/14
Objectives:
1. To introduce the data type sequence (ordered list).
2. To show how sequences can be implemented by arrays, vectors, or linked lists.
3. To introduce the representation of matrices as arrays of two or more dimensions.
I. Sequences
 
A. Many of the interesting "standard" abstract data types are variations
on the theme of a sequence.
B. A sequence is a group of items that has the following basic properties.
1. Either the sequence is empty, or it has a unique first item and
a unique last item. (If it consists of exactly one item, these
two are the same; otherwise they are different.)
2. Each item, except the last, has a unique successor.
3. If you start with the first item and apply the successor operation
repeatedly, you will eventually visit each item exactly once, ending
at the last item.
4. We may also want to define the operation of predecessor analogous
to successor:
a. Each item, except the first, has a unique predecessor
b. If you start with the last item and apply the predecessor
operation repeatedly, you will eventually visit each item
exactly once, ending at the first item.
c. Successor and predecessor are inverses, e.g.
if B is the successor of A, then A is the predecessor of B,
and vice versa.
5. We may want to be able to access items by relative position in
the list (with position 0 being the first).
C. For a sequence, we have the following set of values and set of
potential operations. Note that, for different kinds of sequences,
we may be interested only in a subset of the set of operations:
Values: { all sequences of items (of some type) }
Operations: { add a new item at a specified position - interesting
special cases are beginning, end, or specific
numbered position.
access the item at a specified position - same
options as above
delete the item at a specified position - same
options as above
determine whether the sequence is empty
obtain the successor of a given item
obtain the predecessor of a given item
}
II. Representations for Sequences
   
A. There are two basic alternatives for implementing a sequence: using
an array (or a variant known as a vector), or using a linked list.
B. Arrays
1. Since arrays are supported directly in almost all programming
languages, they are an attractive representation for sequences.
a. In an array, LOGICAL ADJACENCY (B follows A) is modelled by
PHYSICAL ADJACENCY (B occurs just after A in memory.)
b. If the sequence is allowed to grow or shrink over time, we
might also store a count of the number of items, along with
the actual array of items, which would be allocated with
extra space to allow for growth.
c. In C/C++, an array is declared by a declaration of the form
<type> <name>[<size>], which both declares the array and allocates
the needed storage.
i. Example:
int n[100]; // declares n to refer to an array of integers
// and allocates storage for 100 integers.
ii. Contrast this with Java, where the declaration of an array and
storage allocation are two distinct steps, e.g.
int n[];
n = new int[100];
iii. An array element is accessed by subscript, e.g. n[i] is the
ith element of the array. (Subscripts are 0-origin, as in Java.)
iv. A potential source of errors in C/C++ programs is that array
subscripts are not checked for legitimacy, e.g. given the
above declaration of n as an array of 100 ints, it would be
possible to refer to n[200], which could access a storage
location belonging to some other variable. Storing a value
into this location could result in a hard-to-find error!
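A minimal sketch of two ways to get a checked subscript in C++, assuming a C++11 compiler. The helper names checkedGet and atRejects are hypothetical, chosen only for illustration. The first adds the check by hand for a raw array; the second relies on std::array::at, which throws std::out_of_range for a bad subscript. Plain n[i] remains unchecked in both cases.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: checked read from a raw array. The caller must supply
// the size, because a raw C/C++ array does not carry its own length.
int checkedGet(const int* a, std::size_t size, std::size_t i) {
    assert(a != nullptr);
    if (i >= size) {
        throw std::out_of_range("array subscript out of range");
    }
    return a[i];
}

// Hypothetical helper: reports whether std::array::at rejects subscript i.
// Unlike operator[], at() checks the subscript and throws std::out_of_range.
bool atRejects(const std::array<int, 100>& n, std::size_t i) {
    try {
        (void)n.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With these helpers, a reference to position 200 of a 100-element array is reported as an error instead of silently touching unrelated storage.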
2. With an array representation of a sequence, certain operations
are very easy:
a. Accessing an item at an arbitrary position. If the items in the
sequence are numbered 0, 1, ... and we know the address in
memory of the first item, then the address of the ith item is
(address of first item) + i * (size of an item)
Example: Given the array declaration
int n[100];
and assuming that the array n starts at location 1000 in memory
and an int occupies 4 bytes of memory, then n[10] is at location
1000 + 10 * 4 = 1040
b. Obtaining the successor or predecessor of an item. If we know the address of
a particular item, then its successor is at address:
(address of current item) + (size of an item)
and its predecessor is at address:
(address of current item) - (size of an item)
c. Adding a new last item (assuming there is room for one more
item in the array)
- Store the item at address
(address of first item) + (item count) * (size of an item)
- Increment the item count
d. Deleting the last item
- Decrement the item count. (The old value is still stored
in memory, but is no longer considered part of the sequence.)
All of the above are O(1)
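The four easy operations above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the struct IntSequence, the constant CAPACITY, and all function names are hypothetical, and the items are assumed to be ints.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t CAPACITY = 100;   // assumed maximum size, fixed up front

// Hypothetical fixed-capacity sequence: the array plus a count of items in use.
struct IntSequence {
    int items[CAPACITY];
    std::size_t count = 0;
};

// (a) Address of item i, computed the long way:
//     (address of first item) + i * (size of an item).
// The result is the same address the compiler computes for &first[i].
const int* itemAddress(const int* first, std::size_t i) {
    const char* base = reinterpret_cast<const char*>(first);
    return reinterpret_cast<const int*>(base + i * sizeof(int));
}

// (b) Successor and predecessor: pointer arithmetic on an int* already
// moves by sizeof(int) bytes, so +1 and -1 are all that is needed.
const int* successor(const int* item)   { return item + 1; }
const int* predecessor(const int* item) { return item - 1; }

// (c) Add a new last item. O(1).
void addLast(IntSequence& s, int value) {
    assert(s.count < CAPACITY);   // there must be room for one more item
    s.items[s.count] = value;
    ++s.count;
}

// (d) Delete the last item. O(1); the old value stays in memory.
void deleteLast(IntSequence& s) {
    assert(s.count > 0);
    --s.count;
}
```

None of these functions contains a loop, which is why each takes the same time no matter how many items the sequence holds.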
3. With an array representation of a sequence, certain operations
are relatively hard:
a. Adding a new item at an arbitrary position (or at the beginning)
entails moving all the items currently at the same or
higher-numbered positions up one slot.
b. Deleting an item at an arbitrary position (or at the beginning)
entails moving all the items currently at higher-numbered
positions down one slot.
The above are O(n), where n is the number of items in the sequence.
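The shifting described above can be sketched as two small functions. The names insertAt and deleteAt and their parameter lists are hypothetical; the item count is passed in and the updated count is returned, and the items are assumed to be ints.

```cpp
#include <cassert>
#include <cstddef>

// Insert value at position pos of items[0..count-1].
// capacity is the allocated size of the array. Returns the new count.
std::size_t insertAt(int items[], std::size_t count, std::size_t capacity,
                     std::size_t pos, int value) {
    assert(count < capacity && pos <= count);
    // Move items at positions pos..count-1 up one slot, starting from the
    // end so that nothing is overwritten before it has been moved.
    for (std::size_t i = count; i > pos; --i) {
        items[i] = items[i - 1];
    }
    items[pos] = value;
    return count + 1;
}

// Delete the item at position pos of items[0..count-1]. Returns the new count.
std::size_t deleteAt(int items[], std::size_t count, std::size_t pos) {
    assert(pos < count);
    // Move items at positions pos+1..count-1 down one slot.
    for (std::size_t i = pos; i + 1 < count; ++i) {
        items[i] = items[i + 1];
    }
    return count - 1;
}
```

In the worst case (pos == 0) each loop touches every item, which is where the O(n) cost comes from; when pos is the last position the loops do no work, matching the O(1) cases discussed earlier.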
4. Many programming languages (including C/C++) support creating arrays
with two or more dimensions. (A two-dimensional array is often used
for modeling mathematical matrices.) Though these are not sequences as we
have been talking about, we mention them briefly here. You will use a matrix
in your "Game of Life" project.
C/C++ Example
a. Declaration
float x[10][20]; // Declares x to be a matrix of floats
// The matrix has 10 rows and 20 columns
// Allocates storage for 200 floats
b. Access
x[i][j] refers to the element in row i and column j
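A short sketch of working with such a matrix, using nested loops over rows and columns. The constants ROWS and COLS and the functions fillMatrix and rowSum are hypothetical names introduced for illustration; the stored values are arbitrary.

```cpp
#include <cassert>

const int ROWS = 10;
const int COLS = 20;

// Store a distinct value in every element: x[i][j] = i * COLS + j.
void fillMatrix(float x[ROWS][COLS]) {
    for (int i = 0; i < ROWS; ++i) {
        for (int j = 0; j < COLS; ++j) {
            x[i][j] = static_cast<float>(i * COLS + j);
        }
    }
}

// Add up the COLS elements in row i.
float rowSum(float x[ROWS][COLS], int i) {
    assert(0 <= i && i < ROWS);
    float sum = 0.0f;
    for (int j = 0; j < COLS; ++j) {
        sum += x[i][j];
    }
    return sum;
}
```

A caller would declare float x[ROWS][COLS]; and pass x to either function. As with one-dimensional arrays, neither subscript is checked by the language, so the loops must stay within 0..ROWS-1 and 0..COLS-1.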
C. Vectors
1. When we create an array, we must specify how many items it
may contain. If the sequence grows larger than this, we
typically have to move the entire array to some new, larger
location in memory, since the memory allocator typically will
have put other variables immediately after the space we reserved
for the array. This is a nontrivial O(n) exercise at best, and
may not even be easily possible.
For this reason, we may be tempted to allocate memory to more
than adequately accommodate the potential growth of the sequence,
which leads to either wasting memory or an unpleasant surprise
when we discover we guessed too small!
2. Many languages provide a variant, typically known as a vector, which can be
resized at any time, though increasing the size can take
O(n) time because the vector is implemented by an array that
may need to be copied to a new, larger location in storage.
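To make the copying cost concrete, here is a minimal sketch of a growable array of ints. It is not how any particular library implements its vector: the struct IntVector, the functions append and destroy, the starting capacity of 4, and the policy of doubling the capacity are all assumptions made for illustration. It uses pointers and new/delete, which are discussed in the next lecture.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical growable array: heap storage, items in use, slots allocated.
struct IntVector {
    int* items = nullptr;
    std::size_t count = 0;
    std::size_t capacity = 0;
};

// Add value as the new last item, growing the storage if it is full.
void append(IntVector& v, int value) {
    if (v.count == v.capacity) {
        // Assumed growth policy: start at 4 slots, then double.
        std::size_t newCapacity = (v.capacity == 0) ? 4 : 2 * v.capacity;
        int* bigger = new int[newCapacity];
        for (std::size_t i = 0; i < v.count; ++i) {
            bigger[i] = v.items[i];      // the O(n) copy to the new location
        }
        delete[] v.items;                // release the old, smaller array
        v.items = bigger;
        v.capacity = newCapacity;
    }
    assert(v.count < v.capacity);
    v.items[v.count] = value;
    ++v.count;
}

// Release the storage when the vector is no longer needed.
void destroy(IntVector& v) {
    delete[] v.items;
    v.items = nullptr;
    v.count = 0;
    v.capacity = 0;
}
```

Most calls to append take the fast path and cost O(1); only a call that finds the storage full pays the O(n) copy. In C++ the standard library's std::vector provides this kind of resizable sequence ready-made.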
D. Linked lists
1. The use of a linked implementation of a sequence typically requires
that the programming language support variables of pointer or reference type,
which we will discuss in the next lecture.
2. The fundamental idea is that we abandon the notion of modelling
logical adjacency by physical adjacency. Instead, we associate
with each item an explicit LINK: the address in memory of its
successor.
EXAMPLES: We often represent linked lists using a "box and arrow"
notation:
+---+      +---+      +---+
| A |      | B |      | D |
+---+      +---+      +---+
| o-+----> | o-+----> | o |
+---+      +---+      +-|-+
                        |
                       ---
(It is common to refer to the individual boxes as NODES.)
Form class into a list linked in alphabetical order
by pointing to each other.
3. With a linked representation of a sequence, certain operations
are very easy:
a. Adding a new item at an arbitrary position is a matter of
readjusting links, assuming we know its predecessor.
EXAMPLES: Show modifications to above drawing to insert a node
containing "C" just after "B".
Show process of inserting a new person into class
list.
b. Deleting an item at an arbitrary position is a matter of
readjusting links, assuming we know its predecessor.
EXAMPLE: Show modifications to above drawing to delete node
containing "B".
Show process of deleting a person from class list.
c. Accessing the successor of an item involves following its link.
The above pointer operations are O(1).
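The link readjustments above can be sketched in C++ as follows. This is only a preview under stated assumptions, since pointers are covered in the next lecture: the struct Node and the functions insertAfter and deleteAfter are hypothetical names, and a null next pointer is assumed to mark the last node.

```cpp
#include <cassert>
#include <string>

// Hypothetical node: an item plus a link to its successor (nullptr if none).
struct Node {
    std::string item;
    Node* next;
};

// Insert a new node holding value just after pred. O(1).
Node* insertAfter(Node* pred, const std::string& value) {
    assert(pred != nullptr);
    Node* fresh = new Node{value, pred->next};  // new node links to old successor
    pred->next = fresh;                         // predecessor links to new node
    return fresh;
}

// Delete the node just after pred. O(1).
void deleteAfter(Node* pred) {
    assert(pred != nullptr && pred->next != nullptr);
    Node* victim = pred->next;
    pred->next = victim->next;   // bypass the node being deleted
    delete victim;
}
```

Starting from the list A, B, D, the call insertAfter(b, "C"), where b points to the node holding "B", yields A, B, C, D; a following deleteAfter(a), where a points to the node holding "A", removes B. In both functions the order of the link updates matters: the old successor must be saved before the predecessor's link is overwritten.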
4. With a linked representation of a sequence, certain operations
are relatively hard:
a. Accessing an item at an arbitrary position entails starting
at the beginning and following links (traversing the list)
the required number of steps, e.g. to access item 10,
we start at the beginning (item 0) and follow links 10
times.
This is an O(n) operation.
(Note that this may also be part of the cost of adding or
deleting an item at an arbitrary position, since we need
access to its predecessor, unless we are already there
for some reason.)
b. Accessing the predecessor of an item entails starting
at the beginning of the list and following links until
we find a node whose successor is the one we want, because
the links are "one-way streets".
This is an O(n) operation.
(This can be avoided by maintaining a doubly-linked list,
in which each node has two links: one to its successor
and one to its predecessor.)
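The two traversals above can be sketched for a singly-linked list as follows. The struct Node and the functions nodeAt and predecessorOf are hypothetical names, the items are assumed to be ints, and a null next pointer is assumed to mark the last node.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical singly-linked node: an item plus a link to its successor.
struct Node {
    int item;
    Node* next;
};

// Return the node at the given position (0 is the first node), or nullptr
// if the list has fewer than position + 1 nodes. Follows `position` links.
Node* nodeAt(Node* first, std::size_t position) {
    Node* current = first;
    for (std::size_t i = 0; i < position && current != nullptr; ++i) {
        current = current->next;
    }
    return current;
}

// Return the node whose successor is target, or nullptr if target is the
// first node or is not in the list. Scans from the beginning of the list.
Node* predecessorOf(Node* first, const Node* target) {
    assert(target != nullptr);
    for (Node* current = first; current != nullptr; current = current->next) {
        if (current->next == target) {
            return current;
        }
    }
    return nullptr;
}
```

Both loops may visit every node, hence O(n). In a doubly-linked list each node would also store a link to its predecessor, so predecessorOf would reduce to reading that one field.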
5. Provided memory is not totally full, it is easy to grow the
sequence by allocating a new node and linking it in at the
right place. There is no need to specify a size up front.
E. You should already be quite familiar with working with arrays. In the next
lecture, we turn to implementing linked lists, using C++.