COMP 212
Programming Assignment 3
Due May 10, 2000
I bet you always wondered which symbol table implementation would be
best in a situation where you insert items in order of key value but
remove them in random order and which would be best when the opposite
happens, i.e., the items are inserted randomly but removed in order.
Well, here is your chance to find out.
To this end your first task is to implement three versions of a
symbol table:
- 1.
- ST1 - using an ordered array with binary search and lazy remove.
- 2.
- ST2 - using an ordinary binary search tree.
- 3.
- ST3 - using one of a randomized binary search tree, a splay
binary search tree, a 2-3-4 tree,
a red-black tree or a skip list (your choice).
For each version you should supply methods for (i) initializing an
empty symbol table
(ii) inserting an item into the symbol table
(iii) searching for an item with a given key value in the symbol table
and
(iv) removing an item with a given key value from the symbol table.
Assume that in case of multiple copies of
items, searching and removing applies to any one of the copies.
(In the tests below, we will ensure that all items are distinct.)
Having implemented (and hopefully tested they are correct!) the above
versions you are to compare how well they perform on the following
two sets of tests, for
N=2i*10000, i=0, 1, 2, 3, 4.
- 1.
- Insert N items with key values 1, 2, 3, ..., N
into the
symbol table, in order of smallest to largest.
Perform N randomly generated search operations on
keys in the table. Remove the items in random order. (See note below
on generating a random permutation.) You should time your symbol table
separately on the time required for the inserts, searches and removes.
- 2.
- Insert N items with key values 1, 2, 3, ..., N
into the
symbol table, in random order.
Perform N randomly generated search operations on
keys in the table. Remove the items in order of smallest to
largest. Again, you should time your symbol table
separately on the time required for the inserts, searches and removes.
You should include in your source a commented section containing a
table of your results. Also comment on how the results meet with your
expectations considering the properties of each of the implementations.
Some Notes:
-
In the book they introduce an class Item (Program 12.1) and associated
operations (e.g., accessing the key value of an item) for the items to
be inserted into a symbol table. For the purposes of this assignment
it is sufficient if you identify items with there (integer) key value,
i.e., the key value of an item is the item itself which is an
integer. If you are using any of the code from the book you must either
adapt it to this special case or implement the Item class.
- The built-in random number generator
used for the last assignment is not up to the
task required here. Why? It only produces random integers in the
range 0 to 215 -1, but for the larger tests, larger random
numbers will be required. There are a number of ways to fix this but
I recommend using the Random source class (courtesy of Lehmer, Schrage,
Weiss) presented below. Initialize the source once using
Random randsource(seed) where seed is a positive integer.
Now random integers in the range
1 to 231 -1
are returned
by each call randsource.randInt().
This is an example of what is called a linear congruential generator
that has been shown to have a number of nice properties (e.g., it
doesn't repeat until all 231-1 numbers have been generated).
Use this generator for generating the test set and for generating
random numbers if implementing randomized binary search trees or
skip lists.
-
For both tests, you will be required to generate a random permutation
of 1 to N
In the first set, this is used when removing
(all of the items) in random order. In the second set, it is used
for ordering the inserts. Note that using a random permutation
guarantees the keys will be distinct.
The simplest way to generate a random permutation is to fill an array with
the values in order and then use random choices to mix up the array.
Code for this operation is provided below.
-
Timing can be performed using the function clock()
as in the last assignment. Don't forget to include
time.h.
-
Program 12.5 gives code for the ordered array implementation of
a symbol table with sequential
search. In order to get from there to ST1 a number of changes must
be made. These include: replacing the search function with binary
search (Program 12.7) and introducing a lazy remove function. In
order to implement the lazy remove function you must keep track of
the number of active items, the number of inactive items and
the total number of items (active plus inactive). If the number of
active items is greater than or equal to the number of inactive
items, then to remove an item just mark it as inactive. Whenever the number
of inactive items becomes greater than the number of active items,
a routine should be called that compacts the array, i.e., moves all
of the active items into a contiguous block starting at the beginning
of the array. To keep track of which items are active and which are
inactive you will need an auxiliary array that marks each position in
the original array as active or inactive.
-
Most of the necessary code for ST2 can be found in sections 12.5, 12.8 and
12.9 of the book. For ST3 you have a choice. The options are discussed
in sections 13.1 to 13.5. Probably the
easiest to implement are randomized BSTs or skip lists, followed by
splay BSTs. For randomized BSTs and skip lists
most of the code is provided in the book.
If you are more adventurous try red-black trees.
- Depending on the machine, compiler, etc. the time to perform
some of these tests may be pretty high. If experiments on smaller
inputs suggest that tests on the large values of N will take
more than five minutes then you may eliminate them and mark them
in the table as more than five minutes.
If you are using a shared system
try to avoid running the experiment at a peak usage time. Make sure
you have tested your program thoroughly on
small inputs before running
the whole experiment.
-
Code
for class Random and for generating random permutations.
2000-04-25