Computer Science 210 Lab 7: Binary Search Trees and Index Generation
Due: April 17, 1998 (Note: This lab must be done individually.)

Objectives and Overview: The binary tree is a widely used data structure in computer science. It is a useful device for representing a variety of information, such as the operands and operations in an arithmetic expression. The binary search tree is a refinement of the binary tree in which the elements, or nodes, are ordered. This lab involves building a binary search tree, traversing it, and understanding its use in the design of a program that generates an index of the words that appear in a free-running text. Chapters 12 and 18 in your text provide support and explanations for this lab.

Part 1 - Examine the Indexing Program

The program Xref.cpp is a good example of the use of binary search trees and queues to solve an important problem in computing. You should read over the discussion of this problem that appears in section 12.2 of your text. Below is a sample output from running the program to generate an index, called a "cross-reference listing," of the input text given in the file tictactoe.cpp (recall that this program is the one that you exercised in an earlier lab).

The interpretation here is that all words (identifiers) in the text file "tictactoe.cpp" appear in this list, alphabetically, and alongside each one is a listing of the line numbers in which each word appears. The text file is processed one line at a time, and each new word encountered is inserted into a binary search tree. Each repeated instance of a word is noted by adding its line number to the queue of line numbers already noted for that word. For instance, the second occurrence of the word "Board" appears on line 12 of the original text, as does the third.
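The processing described above can be sketched in a few lines of standard C++. This is only an illustration, not the code in Xref.cpp: it assumes std::map as a stand-in for the binary search tree (a map also keeps its keys in sorted order) and std::queue for the line numbers. The names CrossRef and BuildCrossRef are hypothetical, and the rule used here for what counts as a word (a letter or underscore followed by letters, digits, or underscores) is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <queue>
#include <sstream>
#include <string>

// Hypothetical stand-in for the lab's tree of IdNode values: each word
// maps to the queue of line numbers on which it appeared.
using CrossRef = std::map<std::string, std::queue<int>>;

// Reads the text one line at a time and records the line number of every
// identifier-like word. A word that occurs twice on one line is recorded
// twice, once per occurrence.
CrossRef BuildCrossRef(std::istream& in) {
    CrossRef xref;
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        std::string word;
        // Run one position past the end so a word that ends the line is flushed.
        for (std::size_t i = 0; i <= line.size(); ++i) {
            const unsigned char c = (i < line.size()) ? line[i] : ' ';
            if (std::isalnum(c) || c == '_') {
                word += static_cast<char>(c);
            } else if (!word.empty()) {
                // Assumption: tokens that start with a digit are numbers, not words.
                if (!std::isdigit(static_cast<unsigned char>(word[0])))
                    xref[word].push(lineNo);  // creates the entry on first use
                word.clear();
            }
        }
    }
    return xref;
}
```

Walking the map from begin() to end() then visits the words in sorted order, which plays the role of the inorder traversal in the real program. Note that the ordering is by character code, so capitalized words come before lowercase ones.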

After gathering copies of the files indicated below on your desktop and building a new project, run this program with the input file tictactoe.cpp and replicate the output shown above. Be sure to copy this file from the server CS210 -> Tucker to your desktop and place it inside your Lab7 project folder alongside the programs shown below.

After doing this, rerun the program with a different input file, literacy.notes, which you should also copy from the server to your desktop. Now answer the following questions about the program Xref.cpp when it is run with the file literacy.notes as input.

  1. Which word will appear in the root node of the binary tree TheIdentifiers, and why?
  2. Draw a picture of the partially constructed binary tree after the first ten words in the text have been inserted. Show only the words in this picture, not their associated lists of line numbers.
  3. Draw another diagram of the queue of line numbers that appears for the word artifacts after the program run is completed.
  4. When iterating on this tree, the program uses the Iterator Itr. What is the value of the first node retrieved from the tree TheIdentifiers when Itr.First() is called in the for statement inside the GenerateCrossReference function?
  5. In the loop that displays each node of the tree, the variable ThisNode is a local variable that gets the value of each node from the binary search tree. The type of this variable is IdNode, which is essentially a pair of values -- a String Word containing the word itself and a Queue Lines containing the individual line numbers (integers) where that word appears in the text. Explain in English the meaning of the following pair of statements:
       cout << ", " << ThisNode.Lines -> GetFront();
       ThisNode.Lines -> DeQueue();
  6. The binary search tree structure enables the individual words to be displayed alphabetically, given an inorder traversal of its nodes. But the output gives no information about the final shape of the tree after all words and their line numbers have been inserted. In particular, it gives us no information about how badly the tree is out of balance. What is the optimal height for this particular tree? That is, based on the number of nodes it has, how long should its longest path from the root to a leaf node be? In a few words, explain what you would have to do to discover the shape of this tree and, in particular, the maximum depth of an individual node.

Part 2 - Augment the Binary Search Tree Class

Your programming task is to augment the Binary Search Tree class by adding a public function called Depth that will compute and return the depth of an arbitrary node (the depth of the root is 0, the depth of each of its children is 1, and so forth).

One way to compute the depth of a node is to write a variation of the Find function that returns an int (rather than an Etype) value, represented by the local variable N. Each time Find calls itself recursively, it will add 1 to the value of that variable to reflect an examination of a left or right child of the current node. When the argument X is found in the tree, the current value of N is returned as its depth. The starting point for the descent is, of course, the root, where N is initialized at 0.
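The descent described above might look like the following sketch. It is not the Bst.h class: the Node struct, the free functions Insert and Depth, and the use of std::string keys are all hypothetical, and returning -1 when X is not in the tree is an assumption, since the handout does not say what that case should produce.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal node; the lab's class and its Etype differ.
struct Node {
    std::string key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string k) : key(std::move(k)) {}
};

// Ordinary unbalanced binary search tree insertion; duplicates are ignored.
void Insert(std::unique_ptr<Node>& t, const std::string& x) {
    if (!t)              t.reset(new Node(x));
    else if (x < t->key) Insert(t->left, x);
    else if (t->key < x) Insert(t->right, x);
}

// Find-style descent. n counts the links followed so far: it starts at 0
// for the root and grows by 1 with each move to a left or right child.
// Returns the depth of x, or -1 (an assumed convention) if x is absent.
int Depth(const Node* t, const std::string& x, int n = 0) {
    if (t == nullptr) return -1;
    if (x < t->key)   return Depth(t->left.get(), x, n + 1);
    if (t->key < x)   return Depth(t->right.get(), x, n + 1);
    return n;  // found: the count accumulated on the way down is the depth
}
```

Passing the running count as a defaulted parameter keeps the caller's view simple: a call such as Depth(root.get(), "word") starts the count at 0 without the caller having to supply it.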

With this function in hand, you should add code to the Xref.cpp program that will compute and report the height of the final tree (that is, the maximum depth of all the nodes visited in a traversal), the optimal height of the tree if it were balanced (that's the floor of the base-2 logarithm of its number of nodes), and a message about whether or not the tree is balanced.
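As a small worked illustration of the optimal-height calculation, the sketch below computes the floor of the base-2 logarithm with integer arithmetic. The function names FloorLog2 and IsBalanced are hypothetical, and treating "balanced" as "the measured height equals the optimal height" is one possible reading of the requirement, not the only one.

```cpp
#include <cassert>

// Floor of the base-2 logarithm of n, for n >= 1. Halving n until it
// reaches 1 counts how many full levels a tree of n nodes needs below
// its root.
int FloorLog2(int n) {
    assert(n >= 1);
    int h = 0;
    while (n > 1) {
        n /= 2;
        ++h;
    }
    return h;
}

// Assumed criterion: the tree is reported as balanced only when its
// measured height equals the optimal height for its node count.
bool IsBalanced(int height, int nodeCount) {
    return height == FloorLog2(nodeCount);
}
```

For example, a tree of 7 nodes has optimal height 2 (a full tree with levels of 1, 2, and 4 nodes), while 8 nodes force a fourth level and an optimal height of 3. Integer halving avoids the rounding questions that can come up when the floating-point log function is applied to exact powers of two.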

Design a complete C++ program that uses these two classes and implements the tree building algorithm described in Part 1. The program should display, using the appropriate iterators, the last tree in the stack after it has been built, in preorder, inorder, and postorder form.

Lab 7 Deliverables:

Submit your revisions to the Bst.h and Xref.cpp files from Part 2 of this lab by dragging them to the Drop Box folder. Also hand in hard copy listings of your revisions, along with your answers to the questions in Part 1.

You should work on this lab individually, since it will be part of test #2; all programming and answers to the questions should be developed by you. I will be available for help during the week on Friday, Tuesday, and Wednesday afternoons from 3:30-5:00. I also will reply to e-mail questions over the weekend.