CPS222 Lecture: Disk-Based Search Structures; B-Trees Last revised 1/30/2013
Objectives:
1. To introduce hashtables on disk.
2. To introduce B-Trees and key variants (the B+ tree and B* tree)
I. Introduction
- ------------
A. All of the search structures we have considered thus far have one thing
in common: they are stored in main memory, where access to any item
is equally fast (true random access). However, it is often the case
that we must build search structures on disk rather than in primary
memory, for two reasons:
1. Size. Large structures cannot be kept in their entirety in main
memory.
2. Permanency. Structures in main memory are volatile and need to be
created (or read in from disk) whenever a program using them is run.
One important use of disk-based search structures is to index tables in
databases, to expedite operations like selection and natural join. (The
use of an index avoids the need to read every row in the table to find
a desired key value.)
B. When we build structures on disk, we must deal with certain realities of
access and transfer time:
1. Random access to disk typically requires on the order of 10-20 ms
access time to position the head and wait for data to come up under
it. This is equivalent to about 10 million CPU cycles!
2. However, once the head position is right, data can be transferred at
rates in excess of 100 million bytes/sec.
3. Observe, then, how total transfer times behave for different size
blocks (assuming a 10 ms access time, and 100 megabyte/sec transfer
rate)
Size of block Access time Transfer time Total time
1 byte 10 ms .01 micro-sec 10.00001 ms
10 bytes 10 ms .1 micro-sec 10.0001 ms
100 bytes 10 ms 1 micro-sec 10.001 ms
1000 bytes 10 ms 10 micro-sec 10.01 ms
10000 bytes 10 ms 100 micro-sec 10.1 ms
100000 bytes 10 ms 1 ms 11 ms
Clearly, then, transfers have very high overheads for access time
(often in excess of 99%), so one would prefer to organize a search
structure in such a way as to allow fairly large transfers to/from
disk.
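The arithmetic behind the table above is simple enough to script. A minimal sketch in Python, assuming the same 10 ms access time and 100 MB/sec transfer rate used in the table:

```python
def total_transfer_ms(block_bytes, access_ms=10.0, rate_bytes_per_sec=100_000_000):
    """Total time to read a block: fixed access cost plus streaming transfer."""
    transfer_ms = block_bytes / rate_bytes_per_sec * 1000.0
    return access_ms + transfer_ms

# Reproduces the 10.00001 ms, 10.001 ms, and 11 ms rows of the table
for size in (1, 100, 100_000):
    t = total_transfer_ms(size)
    print(f"{size:>7} bytes: {t:.5f} ms total")
```

Even at 100,000 bytes, access time still accounts for over 90% of the total, which is why block-oriented transfers pay off.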
4. For this reason, disk files are typically block-oriented.
a. Data is stored in blocks of some fixed size (determined by disk
geometry) - typically a power of 2 ranging from 512 on up. While
access/transfer time considerations argue for fairly large blocks,
too large a block can result in wasted space in the last block
of a file when the file size is not a multiple of the block size -
as it rarely is. (On the average, each file wastes about 1/2 block
of storage.)
b. Blocks in a given file are numbered 1, 2, 3, ... (there is no
block 0.) Any given block can be accessed at any time by
specifying its block number, which serves as a kind of on-disk
"pointer" to the block. (In the discussion that follows, when
we use the term "pointer" we mean such a block number.) A block
number of 0 refers to a non-existent block - the disk equivalent
of a null pointer.
c. However, different blocks are usually stored at different
places on the disk, so that accessing data in two different
blocks entails the costs associated with two disk accesses.
(Defragmentation of a disk can bring consecutive blocks
physically together, which reduces this cost somewhat, but the
benefits are not permanent, and the code cannot be written to
assume the ability to access two successive blocks in one
operation.)
5. Most of the structures we have considered do not lend themselves well
to on-disk implementation because they require accesses to too many
different parts of the structure.
E.g. a binary search tree of 1000 nodes has minimum height 10, so
a search would require up to 10 disk accesses per item.
II. Hashtables on disk
-- ---------- -- ----
A. While most of the search structures for main memory that we have looked
at do not adapt well for use on disk, one does - the hashtable.
B. Recall that a hashtable consists of a number of buckets, each of which
in turn consists of a number of slots that can hold a key and its
associated value, with the hash function determining what bucket holds
a given key-value pair.
1. When a hashtable is stored in main memory, normally each bucket has
just a single slot.
2. But when a hashtable is stored on disk, an entire block is normally
used as a bucket, with as many slots as there is room in a block for
key-value pairs.
Example: If the block size is 4096, and a key-value pair requires
100 bytes, then a bucket will have 40 slots (with the remaining 96
bytes available for overhead or wasted)
a. In this case, the hash function determines which disk block should
hold a given key and its associated value, with a search of the
bucket being needed to find the correct slot.
b. However, the time for this search is typically very small relative
to the access time for the bucket in the first place.
C. Another difference between in-memory and on-disk hashtables arises in
conjunction with handling collisions.
1. When a collision occurs with an in-memory hashtable, a strategy like
linear probing is used to find an available slot for the item being
added.
2. Of course, collisions are less frequent when using a bucket with
multiple slots; but if a bucket fills up, the normal strategy is to
allocate an additional disk block, with the original block containing
a pointer to this overflow block. This strategy is called chaining.
Example: Suppose we had a hashtable on disk using a bucket size of 5,
and six keys hashed to the same bucket. The following situation would
result:
        ---------------------          ---------------------
        |  Primary bucket   |          |  Overflow bucket  |
        | [ first five keys |          | [ sixth key and   |
        |   and associated  |          |   associated      |
        |   values ]        |          |   value ]         |
        |               o---|--------->|                   |
        ---------------------          ---------------------
a. At this point, any new keys that hash to this chain would be added
to the overflow bucket until it fills up - at which time another
overflow bucket could be allocated, with the chain consisting of
three buckets.
b. In the worst case - typically resulting from an overfilled table or
a poor hash function - a chain could become long, resulting in
performance tending toward O(n) rather than O(1). But this is
improbable if the hash function and table size are chosen well.
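To make the bucket-and-chain mechanics concrete, here is a small Python sketch that simulates the disk as a list of blocks. The slot count, bucket count, and the use of Python's built-in hash are all illustrative assumptions, not part of any real on-disk format:

```python
SLOTS_PER_BUCKET = 5   # assumption: block size / size of a key-value pair
NUM_BUCKETS = 8        # assumption: number of primary buckets

# The "disk" is a list of blocks; slot 0 is unused so that block number 0
# can mean "no block", mirroring the on-disk null-pointer convention above.
disk = [None] + [{"slots": [], "overflow": 0} for _ in range(NUM_BUCKETS)]

def bucket_of(key):
    return 1 + hash(key) % NUM_BUCKETS     # blocks are numbered from 1

def insert(key, value):
    b = bucket_of(key)
    while True:
        block = disk[b]
        if len(block["slots"]) < SLOTS_PER_BUCKET:
            block["slots"].append((key, value))
            return
        if block["overflow"] == 0:         # chain a new overflow block
            disk.append({"slots": [], "overflow": 0})
            block["overflow"] = len(disk) - 1
        b = block["overflow"]

def lookup(key):
    b = bucket_of(key)
    accesses = 0                           # blocks read = disk accesses
    while b != 0:
        accesses += 1
        block = disk[b]
        for k, v in block["slots"]:
            if k == key:
                return v, accesses
        b = block["overflow"]
    return None, accesses

for k in (0, 8, 16, 24, 32, 40):          # with NUM_BUCKETS = 8, all six of
    insert(k, f"value-{k}")               # these integer keys collide
print(lookup(40))   # ('value-40', 2): found in the overflow block
print(lookup(0))    # ('value-0', 1): found in the primary bucket
```

Integer keys are used in the demonstration because Python randomizes string hashing between runs; the access counts show the cost of walking a chain.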
III. B-Trees
--- -------
A. Now, we consider a search structure specifically designed for use with
disk files: the B-Tree.
1. A B-Tree is a form of search tree in which breadth is traded for depth.
Each node contains multiple keys (instead of 1 as in a binary search
tree), and so has multi-way branching rather than 2-way branching.
2. The result is a very "bushy" sort of tree, with typical heights in the
range 2-4, thus requiring only 2-4 disk accesses per operation.
3. Further performance improvements can be had by keeping a copy of the
root block in main memory while the file is open, reducing the
effective height of the tree by 1. It may also be possible to
cache the blocks at the next level down. These two steps could
reduce the number of disk accesses needed for most operations to 1
or 2.
B. Preliminary: An m-way search tree.
1. An m-way search tree is a tree of degree m in which:
a. Each node having s children (2 <= s <= m) contains s-1 keys.
   Let the children be called t_0 .. t_(s-1) and the keys k_0 .. k_(s-2).
b. The keys and children obey the following properties:
   i.   k_0 < k_1 < k_2 ... < k_(s-2)  (or <= if duplicate keys are
        allowed; we will assume < and no duplicates for our examples.)
   ii.  All the keys in child t_0 are < k_0
   iii. All the keys in child t_i (1 <= i < s-1) lie between k_(i-1)
        and k_i
   iv.  All the keys in child t_(s-1) are > k_(s-2)
2. Observe that a binary search tree is simply an m-way search tree
with m = 2.
3. Examples of 4-way search trees:

   First:                            Second:

           C F K                        A
         /  /  \  \                      \
       AB  DE  HIJ  L                     B
                                           \
   (The empty children are called           C
   failure nodes, and are represented        \
   by "null pointers".)                       D
                                               \
                                                E
                                                 \
                                                  F
                                                   \
                                                    etc.
a. Clearly, the first is much more desirable than the second!
b. Note that if the first example were implemented as a binary search
tree instead, it would have height 4 instead of 2, a sizable cost
increase at 10ms per disk access. (The savings become even larger
as m increases. For example, the first example could be implemented
as a 12-way search tree with only 1 level.)
4. Observe that a 2-3-4 tree is simply a variant of a 4-way search
tree. In fact, B-Trees generalize some of the ideas we used with
2-3-4 trees, though the standard algorithms for maintaining B-Trees
are slightly different from those used with 2-3-4 trees.
5. When a search tree is stored on disk, each node is typically one
block (or cluster) and the branching factor m is chosen so that
a node with a maximal number of keys and children just fits in
one block (cluster).
C. Definition of a B-Tree
1. As was true with binary search trees, we recognize that m-way
search trees can be very efficient if well-balanced, but have
undesirable degenerate cases. With binary search trees, we defined
variants such as AVL trees and Red-Black trees that avoid the
degenerate behavior. We do the same here.
2. A B-Tree of order m is an m-way search tree in which:
a. All the failure nodes are on the same level. (The term "failure
node" refers to the empty subtree one ultimately encounters when
one is searching for a nonexistent key. Of course, there isn't
really any such node - it is represented by an impossible block
number (corresponding to a null pointer in an in-memory tree.))
b. Each internal (non-failure) node, except the root, has at least
   ceil(m/2) children.
c. The root, if it is not itself a failure node (which would mean the
tree is totally empty), has at least 2 children.
d. Of our two examples, only the first is a B-Tree
3. Examples: which of the following is/are B-Trees of order 5?

   (1)      E J O                   (2)      E J O
          / /   \ \                        / /   \ \
       ABC  F  KLM  PQRS                ABF  GH  KL  PQRS

       no: node "F" has only            no: the tree is not a
       2 children                       search tree, since F > E

   (3)      E J O                   (4)      E J O
          / /   \ \                        / /   \ \
        AB  FI  KL  PQRS                 AB  HI  KL  PQRS
                                             /
       yes                                 FG

                                        no: all the failure nodes
                                        are not on the same level
4. Note: because all the failure nodes of a B-Tree are on the same
(bottom) level, we normally do not bother to draw them. Thus,
we will draw the one good tree in the above example as follows from
now on:
              E J O
            / /   \ \
          AB  FI  KL  PQRS
D. Some properties of a B-Tree
1. What is the MAXIMUM number of KEYS in a B-Tree of order m of
height h? (Measuring height in terms of the number of NODES).
a. In such a tree, each non-failure node would have the maximal
number of children (m), and thus the maximal number of keys (m-1).
Thus, we would have:
1 node m-1 keys at level 1
m nodes m * (m-1) keys at level 2
m**2 nodes m * m * (m-1) keys at level 3
...
m**(h-1) nodes m**(h-1) * (m-1) keys at level h
--------------------
m**h - 1 keys total
b. Compare our result for complete binary trees of height h -
2**h - 1 nodes. (h measured in NODES)
2. What is the MINIMUM number of KEYS in a B-Tree of order m of
height h? (Measuring height in terms of the number of NODES).
a. In such a tree, the root would have only 2 children (1 key), since
this is the minimum allowed for a root. All other nodes would
have ceil(m/2) children, and ceil(m/2) - 1 keys.
b. For convenience, let c = ceil(m/2).
1 node 1 key at level 1
2 nodes 2 * (c-1) keys at level 2
2*c nodes 2 * c * (c-1) keys at level 3
2*c**2 nodes 2 * c**2 * (c-1) keys at level 4
...
2*c**(h-2) nodes 2 * c**(h-2)*(c-1) keys at level h
-----------
2 * [c**(h-1)-1] + 1 =
2 * c**(h-1) - 1 keys total
3. To determine the height of a B-Tree of order m containing n keys, we
   solve each of the above for h, as follows:

   a. From the equation for the maximum number of keys, we know:

         n <= m**h - 1

      or, solving for h:

         n + 1 <= m**h
         log_m(n+1) <= h

      Now, since h must be an integer, we can take the ceiling of the
      log to obtain:

         ceil(log_m(n+1)) <= h

   b. From the equation for the minimum number of keys, we know
      (again letting c = ceil(m/2)):

         n >= 2 * c**(h-1) - 1

      or, solving for h:

         (n+1)/2 >= c**(h-1)
         log_c((n+1)/2) >= h - 1
         h <= 1 + log_c((n+1)/2)

      Now, since h must be an integer, we can use the floor of the log
      to obtain:

         h <= 1 + floor(log_c((n+1)/2))

   c. Combining the above results from minimal and maximal trees, we
      obtain the following bounds for h:

         ceil(log_m(n+1)) <= h <= 1 + floor(log_ceil(m/2)((n+1)/2))
4. Some examples:

   a. 1 million keys - B-Tree of order 200: height is 3
      - Lower bound is ceil(log_200(1,000,001)) = 3
        (Note that a maximal tree of height 2, order 200, contains
        39,999 keys - so the tree must have height at least 3.)
      - Upper bound is 1 + floor(log_100(500,000.5)) = 3
        (Note that a minimal tree of height 4, order 200, contains
        1,999,999 keys, so the tree must have height no greater than 3.)

   b. 2 million keys - B-Tree of order 200:
      - Lower bound is still 3
      - Upper bound is now 4
      so the height could be 3 or 4.
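These bounds are easy to compute directly. A sketch in Python (floating-point logarithms are adequate at these scales, though production code near a boundary would want exact integer arithmetic):

```python
import math

def height_bounds(n, m):
    """Bounds on the height (in nodes) of a B-Tree of order m holding n keys,
    from  ceil(log_m(n+1)) <= h <= 1 + floor(log_c((n+1)/2)),  c = ceil(m/2)."""
    c = math.ceil(m / 2)
    lower = math.ceil(math.log(n + 1, m))
    upper = 1 + math.floor(math.log((n + 1) / 2, c))
    return lower, upper

print(height_bounds(1_000_000, 200))   # (3, 3): height is exactly 3
print(height_bounds(2_000_000, 200))   # (3, 4): height is 3 or 4
```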
E. An important note: In our discussion, we have talked only about nodes
containing KEYS. In practice, we build search structures to allow us
to associate keys with VALUES (e.g. name = key; phone number and address
= value). In the form of B-Tree we are discussing, then, a node actually
contains s pointers to children, s-1 keys, and s-1 values. These can be
stored in one of two ways:
1. The actual value can be stored in the node. This, however, can reduce
m, and thus the branching factor of the tree, if the size of the value
is large compared to that of the key (as it often is).
Example: node size 8000, key length 12, pointer size 4 bytes, allows
m = 500 if we don't store any value with the key.
If we also have to store a value of size 36, however, we
would reduce m to 154.
2. The node can contain the number of another disk block that stores the
actual value. (This additional pointer adds minimally to the size of
the node.) However, this means that successful searches require an
additional access to get the data.
3. We can use a variant of the B-Tree called a B+ Tree - to be discussed
shortly.
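The effect of storing values in the node can be quantified with a one-line calculation. A sketch in Python, ignoring any per-node bookkeeping overhead (a real layout would also need, e.g., a key count per node, so exact figures vary):

```python
def max_order(node_bytes, key_bytes, ptr_bytes, value_bytes=0):
    """Largest m such that m child pointers plus (m-1) keys (and values)
    fit in one node:  m*ptr + (m-1)*(key + value) <= node_bytes."""
    per_entry = ptr_bytes + key_bytes + value_bytes
    return (node_bytes + key_bytes + value_bytes) // per_entry

print(max_order(8000, 12, 4))       # 500: keys only
print(max_order(8000, 12, 4, 36))   # 154: 36-byte values stored in the node
```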
IV. Operations on B-Trees:
-- ---------- -- -------
A. For our examples, we will use the following B-Tree of order 3 (sometimes
called a 2-3 tree, since each node has 2 or 3 children and 1 or 2 keys).
We use order 3 to keep the size of the examples down:
J T
C F M P Y
AB DE GH KL NO R UW Z
B. Locate a given key k in a (sub) tree whose root is block t. (Assume
blocks in the file are numbered 1 .. , with 0 denoting a failure
node.)
InfoType locate(KeyType k, int t)
{
if (t == 0)
report that the search failed
else
{
read block t from the disk
determine how many keys it holds
int i = 0;
while (i < number of keys && key[i] < k)
i ++;
if (i < number of keys && key[i] == k)
return associated information
else
return locate(k, child[i])
}
}
Example: Locate J: succeeds immediately at the root
Locate L: at the root, we end the while loop with i = 1,
since key[1] = 'T', so we go to the second child of
the root
in that child, the while loop exits with i = 0,
(since key[0] = 'M' >'L'), so we go to the first
child.
In that node, we find what we are looking for.
Locate Z: at the root, we end the while loop with i = 2,
since i = Number of keys, so we go to the third
child of the root
in that child, the while loop exits with i = 1, for
the same reason, so we go to the second child.
In that node, we find Z
Locate X: at the root, we end the while loop with i = 2
In the third child, the while loop exits with i = 0,
since key[0] = 'Y' > 'X', so we go to the first
child.
In that node, we exit the while loop with i = 2.
Since the third child (and all children, in fact) of
this node is a failure node, the search fails.
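The locate procedure can be exercised with a Python sketch of the example tree. The disk is simulated as a dict mapping block numbers to nodes, with block number 0 playing the role of a failure node; the layout and names are illustrative only:

```python
# Each "block" holds a sorted key list and a child-block-number list
# (one more child than keys). Block number 0 is a failure node.
blocks = {
    1:  {"keys": ["J", "T"], "children": [2, 3, 4]},
    2:  {"keys": ["C", "F"], "children": [5, 6, 7]},
    3:  {"keys": ["M", "P"], "children": [8, 9, 10]},
    4:  {"keys": ["Y"],      "children": [11, 12]},
    5:  {"keys": ["A", "B"], "children": [0, 0, 0]},
    6:  {"keys": ["D", "E"], "children": [0, 0, 0]},
    7:  {"keys": ["G", "H"], "children": [0, 0, 0]},
    8:  {"keys": ["K", "L"], "children": [0, 0, 0]},
    9:  {"keys": ["N", "O"], "children": [0, 0, 0]},
    10: {"keys": ["R"],      "children": [0, 0]},
    11: {"keys": ["U", "W"], "children": [0, 0, 0]},
    12: {"keys": ["Z"],      "children": [0, 0]},
}

def locate(k, t=1, accesses=0):
    """Return (block number holding k, disk accesses), or (None, accesses)."""
    if t == 0:
        return None, accesses              # failure node: search fails
    node = blocks[t]                       # "read block t from the disk"
    accesses += 1
    i = 0
    while i < len(node["keys"]) and node["keys"][i] < k:
        i += 1
    if i < len(node["keys"]) and node["keys"][i] == k:
        return t, accesses
    return locate(k, node["children"][i], accesses)

print(locate("J"))   # (1, 1): found in the root
print(locate("L"))   # (8, 3)
print(locate("X"))   # (None, 3): search fails
```

The access counts match the traces above: J is found at the root, L and Z after three block reads, and X fails after three.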
C. Inserting a new node. (Assume that we disallow duplicate keys).
1. We first proceed as in locate, until we get to a leaf - that is,
a node whose children are empty. (Of course, if we find the key we
are looking for along the way, we declare an error and quit.)
2. If the leaf node we have arrived at contains fewer than the maximum
number of keys, then we simply insert the new key at the appropriate
point, and add an extra failure node child pointer (0).
Example: Insert S: We work our way down to the node containing
R. Since it contains only one key, and can
hold two, we add S, and our tree becomes:
J T
C F M P Y
AB DE GH KL NO RS UW Z
Note that inserting a new key in a leaf may require moving other
keys over.
Example: Insert Q in the original tree. Result:
J T
C F M P Y
AB DE GH KL NO QR UW Z
^
|__ R has been moved over one place
3. Life becomes more interesting if the leaf we reach is already
full. (E.g. consider trying to insert "X" in the above.) In this
case, we cannot add a new node on a lower level, since this would
violate one of the B-Tree constraints. Instead, we proceed as
follows:
a. Allocate a new node from a free list, or extend the file by one
node.
b. Redistribute the keys in the original node, plus the new key,
so that:
- The first half remain in the original node
- The middle key in the order of keys is held out for a use to
be explained shortly.
- The second half go into the new node.
Note:
The key we were inserting can go into either of the nodes, or
it might be the middle key. (e.g. if we are inserting X, it
will go into the new node; but if we are inserting V into
the same node, it would be the middle key.)
c. Insert the middle key we saved out, plus a pointer to the newly
created node, into the parent at an appropriate point, just after
the pointer that we followed to go down to the node we split. Of
course, this means we move keys and pointers into the parent to
make room.
Example: insert X into the original tree

              J T

        C F        M P        W Y  <-- W was promoted into the parent
      AB DE GH   KL NO R    U  X  Z
                            ^  ^
                            |  |___ new node now contains X
                            |______ original node now contains only U
4. Observe that this strategy guarantees that the resulting tree will
still meet all the tests for a B-Tree.
a. Clearly, all the leaves are still on the same level.
b. What about the number of children of the new nodes?
- If we are forced to split a node, it is because it contained
the maximum number of keys before insertion - m-1. With the
new key, this gives m keys, to be divided two ways plus the
promoted key. This leaves m-1 keys to be divided.
- If m is odd, then each node gets (m-1)/2 keys, and has
(m+1)/2 children, which is exactly ceil(m/2), as required.
- If m is even, then one node gets m/2 - 1 keys, and the other
gets m/2 keys. The smaller node then has m/2 children, which is
exactly ceil(m/2), as required. (The larger node has more than
the minimum, which is fine.)
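The redistribute-and-promote step can be sketched as a small pure function over sorted key lists (a hypothetical helper; real code would also divide the child pointers and write both blocks back to disk):

```python
def split_full_node(keys, new_key):
    """keys holds the maximum m-1 keys; insert new_key, then split.
    Returns (left_keys, middle_key, right_keys): left stays in the
    original node, middle is promoted into the parent, and right goes
    into a newly allocated node."""
    all_keys = sorted(keys + [new_key])        # m keys in all
    mid = len(all_keys) // 2
    return all_keys[:mid], all_keys[mid], all_keys[mid + 1:]

print(split_full_node(["U", "W"], "X"))   # (['U'], 'W', ['X'])
print(split_full_node(["U", "W"], "V"))   # (['U'], 'V', ['W'])
```

The two calls mirror the cases discussed above: inserting X promotes W, while inserting V into the same full node makes V itself the middle key.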
5. Now what if there is no room for the promoted key in the parent?
(Example: insert I into the original tree. Node GH splits, with
H to be promoted to node CF. But this node has no room for another
key and child.)
a. Solution: split the parent as before, creating a new parent to
hold half of the keys and pointers to half of the children.
Again, promote one key and a pointer to the new node up one
level.
b. Note that, if carried to its ultimate, this can result in splitting
the root, which is how the tree gains height. At this point, the
single middle key resulting from the splitting of the root becomes
the one key in the new root. (This is why we allow the root of a
B-Tree to have as few as 2 children.)
Example: insert I:
J
/ \
F T
/ \ / \
C H M P Y
/ \ / \ / | \ / \
AB DE G I KL NO R UW Z
6. You will note that the approach we have taken to splitting nodes in
a B-Tree is somewhat different from the one we used with 2-3-4 trees.
Here, we have postponed splitting a node until absolutely necessary.
a. If one splits a full node when it is not necessary to do so, the
result won't be a B-Tree if m is odd - e.g. when inserting S into
the above example, if we split MP (which we don't have to), and
promoted one of the keys to the node containing T, we'd end up
with one empty node in the middle of the tree! If m were odd and
greater than 3, we would not end up with an empty node, but would
end up with one node having fewer than ceil(m/2) children.
b. The price is more complex code; but given the time required for disk
accesses this policy makes some sense, since it postpones height
increases until the absolute last moment possible. (One could choose
to compromise the B-Tree requirement and use the 2-3-4 tree approach
of anticipating the need for splits on the way down the tree,
however - trading simpler code for earlier splitting of the root.)
D. Deletion from a B-Tree
1. As we have seen in other kinds of trees, deleting a key from a
leaf will be much simpler than deleting a key from an interior node.
As before, then, we use the trick of converting a deletion from an
interior node into a deletion from a leaf by promoting a key from
a leaf - typically the first key in the leftmost subtree of the
child just after the key.
Example: to delete J from the root of our original tree, we would
promote K to take its place, and delete K from the leaf.
2. Deleting a key from a leaf is basically trivial - we simply slide
other keys over as necessary to fill in the gap.
Example: Delete N from our original tree:
J T
C F M P Y
AB DE GH KL O R UW Z
3. However, we can run into a problem if the leaf we are deleting from
already contains the minimal number of keys.
Example: Delete R from our original tree.
4. In this case, we essentially reverse the process we used to deal with
an over-full node on insert.
a. We find one of the siblings of the node from which we are deleting
(we can use either side.)
b. We rearrange keys between the node we are working on and the sibling
so as to give each the minimal number, if possible. This will mean
changing the divider key between them in the parent.
Example: When deleting R from the original tree, we can combine R's
node with NO and rearrange as follows:
J T
C F M O Y
AB DE GH KL N P UW Z
c. If, as a result, we do not have enough keys to make two legal
nodes (i.e. if the sibling we are using also contains the minimal
number of keys), then we combine the two nodes into one, also
removing a key and child pointer from the parent.
Example: working with the above (not the original tree), we can now
try to delete P. Since the only node we can combine with
is N, and it has the minimal number of keys already, we must
pull O down from the parent and combine everything into one
node, recycling the other:
J T
C F M Y
AB DE GH KL NO UW Z
d. Of course, removing a key from the parent may get us into trouble
as well. (e.g. suppose that, in succession, we removed L, N, and
then O from the above). In this case, the parent may have to
"borrow" keys and children from a sibling. In an extreme case,
we may even have to merge the parent with its sibling, and could
ultimately even reduce the height of the tree if we had to merge
two children of the root.
Example: Recall the tree we got by splitting the root:
J
/ \
F T
/ \ / \
C H M P Y
/ \ / \ / | \ / \
AB DE G I KL NO R UW Z
Suppose we now try to delete Z:
- We have to merge UW with the now vacated node.
J
/ \
F T
/ \ / \
C H M P W
/ \ / \ / | \ / \
AB DE G I KL NO R U Y
Suppose we now delete Y: We must merge with U, but this pulls W
out of the parent, leaving it with too few keys. (Zero, in this
case; but in a higher degree tree we're in trouble when the
number of keys drops below ceil(m/2) - 1.)
- We therefore rearrange keys and children with MP:
J
/ \
F P
/ \ / \
C H M T
/ \ / \ / \ / \
AB DE G I KL NO R UW
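The decision made in steps 4.b and 4.c above - rearrange with a sibling if the combined keys suffice, otherwise merge - can be sketched as a pure function on key lists (a hypothetical helper; real code would also move child pointers and update the parent):

```python
import math

def fix_underflow(node, sibling, separator, m):
    """node has too few keys. If the pooled keys can make two legal nodes,
    redistribute them (returning a new separator for the parent); otherwise
    merge everything into one node, removing the separator from the parent."""
    min_keys = math.ceil(m / 2) - 1
    pooled = sorted(node + sibling + [separator])
    if len(pooled) >= 2 * min_keys + 1:        # enough for two legal nodes
        mid = len(pooled) // 2
        return ("redistribute", pooled[:mid], pooled[mid], pooled[mid + 1:])
    return ("merge", pooled)

# Order 3: min_keys = 1. Deleting R leaves its node empty; sibling NO and
# divider P can be rearranged, as in the example above:
print(fix_underflow([], ["N", "O"], "P", 3))   # ('redistribute', ['N'], 'O', ['P'])
# Deleting P next forces a merge with N, pulling O down from the parent:
print(fix_underflow([], ["N"], "O", 3))        # ('merge', ['N', 'O'])
```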
V. Variants of the B-Tree
- -------- -- --- ------
A. Achieving good performance with a disk-based search structure requires
keeping the height of the tree down (2 is rarely practical, but 3 is
often ideal) - which in turn entails using a large branching factor.
Indeed, if we know the desired size of the tree, we can calculate the
minimum branching factor needed to guarantee a certain height, using
formulas derived earlier. In particular, for a B-Tree, we can guarantee
a maximum height of 3 as follows:
n                 Minimum m
1000                  15
10,000                35
100,000               73
1,000,000            159
10,000,000           341
100,000,000          737
(Clearly, m grows rather slowly with n)
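The table can be regenerated from the minimum-keys formula derived earlier: height <= 3 is guaranteed exactly when every tree of height 4 - which contains at least 2 * ceil(m/2)**3 - 1 keys - is too large to hold only n keys. A sketch in Python:

```python
import math

def min_order_for_height(n, h):
    """Smallest order m guaranteeing a B-Tree holding n keys has height <= h:
    by the minimum-keys formula, every tree of height h+1 contains at least
    2 * ceil(m/2)**h - 1 keys, so that quantity must exceed n."""
    m = 3
    while 2 * math.ceil(m / 2) ** h - 1 <= n:
        m += 1
    return m

for n in (1_000, 100_000, 10_000_000):
    print(n, min_order_for_height(n, 3))   # 15, 73, 341 respectively
```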
B. However, considerations of node size may make it difficult to achieve
the necessary branching factor. Recall that, in a B-Tree, each node
contains up to m pointers, m-1 keys, AND M-1 VALUES ASSOCIATED WITH THE
M-1 KEYS.
1. Suppose we need an m-value of 20 for some application in which the
keys are 10 bytes long and the associated values are 100 bytes long.
If a pointer is 4 bytes, then the minimum node size is
4*20 + 19*(10+100) = 2170 bytes
2. For efficient performance, we must guarantee that each node can be
read or written with a single disk access - which requires that the
entire node reside in a single contiguous block on disk. Since block
size is dictated by disk geometry and system software, the designer
of a B-Tree is usually faced with a fixed upper bound on node size.
Example: If we were working with a disk that restricted the block
(or cluster) size to 2048, then we could not achieve the
desired branching factor with these parameters.
C. Two techniques can be used to hold the node size down while still
achieving a desired branching factor. These techniques lead to two
variants of the B-Tree.
D. An often-used variant of the B-Tree: the B+ tree.
1. We saw that the total size of a node is generally the limiting factor
in terms of the value of "m" that can be used for a B-Tree. Here, of
course, the main villain is usually the value that is stored in the
node along with the key. If the value is - say - 10 times bigger than
the key, then its presence in the node reduces the potential branching
factor by a ratio of almost 10:1!
2. One way to address this would be to not store the values in the tree
at all.
a. Rather, each node would contain up to m child pointers, m-1 keys
and m-1 POINTERS TO VALUES STORED ELSEWHERE.
b. The difficulty with this scheme, though, is that once the tree has
been searched to find the desired key, an additional disk access is
needed to find the data. The effect on performance is the same as
if the height of the tree were increased by one, so this may undo
the gain obtained by using the higher branching factor.
3. A B+ tree addresses this problem as follows:
a. Values are only stored in the lowest level of the tree. Nodes at
higher levels contain keys, but not values.
b. This means that the branching factor in the upper levels is much
greater than the branching factor at the lowest level (where the
children are failure nodes.)
Example: assume nodes are 512 bytes, keys are 10 bytes, values are
90 bytes, and pointers are 4 bytes.
Each node in the lowest level of a B+ tree could store up
to 5 key-value pairs, with 12 bytes to spare. (No pointers
need be stored, because the 6 children are all failure
nodes.)
Each node at upper levels would have branching factor 37.
It would store up to 36 key-pointer pairs, plus one extra
pointer, with 4 bytes to spare.
We assume that we can distinguish between a leaf and a
non-leaf node in some way during our processing - perhaps
by keeping track of the height of the tree or by tagging
the node itself or the pointer to it in some special way.
c. Of course, this means that all keys must occur at the lowest level
of the tree, so that a value can be stored with them. The keys in
the upper levels, then, are copies of keys stored lower down; some
keys are stored twice. In particular, each upper level key is a
copy of the least key in the subtree to its right. (Alternately, we
could store the greatest key in the subtree to the left.)
Example: given the above scenario, assume that we have a B+ tree
that holds the 26 letters of the alphabet as keys.
Since the maximum branching factor of a leaf is 6, the
minimum branching factor would be 3, and each leaf would
hold 2-5 keys. Thus, we would have 6-13 leaves, which
could easily be accommodated as children of a single root
node. Thus, our tree might look like this:
-------------------------------------
| C F J L O R T W |
-------------------------------------
/ | | | | | | | \
AB CDE FGHI JK LMN OPQ RS TUV WXYZ
Note that the separator keys in the root are copies of the
first key in the leaf to the right of the separator. We
could also have chosen to store the last key in the leaf
to the left.
d. For this particular set of characteristics, we might contrast:
i. This B+ tree of height 2, with plenty of room to grow without
gaining height.
ii. An ordinary B-Tree of order 6 (where all levels hold values) -
which would be of height 3 for this configuration.
e. Just to illustrate the concept of a B+ tree further, we consider
what might happen with different assumptions about how many keys
would fit in a non-leaf node, so that we end up with a three-level
tree, like the following:
-----
| L |
-----
/ \
--------------- ------------------
| C F J | | O R T W |
--------------- ------------------
/ | | | | | | | \
AB CDE FGHI JK LMN OPQ RS TUV WXYZ
(Note that the root key - L - is a copy of the smallest key in its
right subtree; it doesn't actually occur in the root's children, but
only at the bottom of the tree.)
4. A common modification of the B+ Tree is to add links across the
bottom level, like this:
-----
| L |
-----
/ \
--------------- ------------------
| C F J | | O R T W |
--------------- ------------------
/ | | | | | | | \
AB->CDE->FGHI->JK-->LMN->OPQ->RS->TUV->WXYZ
a. There is a slight overhead for this, of course - but typically
the key/value size doesn't evenly divide the block size, so some
extra bytes are available for the link.
b. This arrangement facilitates range queries, where we want to find
all entries whose keys lie in a certain range.
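With the leaf links in place, a range query is one descent followed by a walk along the bottom level. A Python sketch over the example tree (the dict layout and helper names are illustrative):

```python
# Leaves of the example B+ tree, in key order, each linked to the next.
leaf_keys = ["AB", "CDE", "FGHI", "JK", "LMN", "OPQ", "RS", "TUV", "WXYZ"]
leaves = [{"keys": list(s), "next": i + 1 if i + 1 < len(leaf_keys) else None}
          for i, s in enumerate(leaf_keys)]
separators = ["C", "F", "J", "L", "O", "R", "T", "W"]  # the height-2 tree's root

def range_query(lo, hi):
    """All keys k with lo <= k <= hi: descend once, then follow leaf links."""
    # Descend: separator s[i] is a copy of the first key of child i+1, so the
    # child whose range contains lo is indexed by the number of separators <= lo.
    i = sum(1 for s in separators if s <= lo)
    out = []
    while i is not None:
        leaf = leaves[i]
        for k in leaf["keys"]:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        i = leaf["next"]                   # step right along the bottom level
    return out

print(range_query("G", "M"))   # ['G', 'H', 'I', 'J', 'K', 'L', 'M']
```

Without the links, answering the same query would require descending from the root once per leaf visited.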
E. A B* Tree of order m is an m-way search tree in which each node (save
the root) has a minimum of ceil((2m-1)/3) children and a maximum of m
children.
1. Nodes in a B* Tree of order m have the same size as those in a
B Tree of order m, but their minimum branching factor is greater.
2. This is achieved by using the following strategy on insertion of
a new key:
a. If the leaf in which the key belongs has room for the new key,
then it is put there (as with the basic B-Tree.)
b. However, if the leaf is full, then instead of splitting the
leaf we choose one of its siblings and attempt to redistribute
keys between the two leaves. (This is sort of the reverse of what
we did when deleting a key from a B-Tree.)
Example: A B* Tree of order 5 has 3-5 children (2-4 keys) for each
node. Consider the following example of such a tree:
E K P V
/ | | | \
ABCD FGHI LMN RSTU XYZ
If we go to insert Q, we find that leaf RSTU is full. Instead
of splitting it (which would force a split of the root and a new
level in the tree), we combine RSTU with one of its siblings - say
LMN - and rearrange keys between them and the divider in the
parent to get
E K Q V
/ | | | \
ABCD FGHI LMNP RSTU XYZ
c. If the chosen sibling is also full, then we combine the keys from
the two nodes and split the result to give three nodes. This
preserves the ceil((2m-1)/3) branching factor.
3. To see the advantage of the B* Tree, consider the following table
showing the minimum number of keys in a B Tree and a B* Tree of
height 3 for different values of m
m      minimal B Tree height 3      minimal B* Tree height 3
5 17 17
10 49 97
20 199 337
50 1249 2177
100 4999 8977
200 19,999 35,377
(The advantage is even greater for higher trees)