Computer Science 101A Lab 6: Strings, Files and Arrays - Finding a Readability Index
Due: March 26, 2002

Objectives and Overview: The purpose of this lab is to gain experience with strings, text files, and simple arrays by designing a program that computes a readability index and a word list.  Before starting this project, the following files should be dragged from the CS105 (Tucker) folder into a fresh copy of the MyProject folder on the desktop: poem, tiny, literacy, and readability.java.

Part 1.  Exercise the Readability Program

A "readability index" is a statistical estimate of the grade level at which a text can be read.  This index can be computed in various ways, including the so-called Flesch-Kincaid index and the Fog index.  These indexes are usually based on measures such as the average number of words in a sentence, the average word length, and the number of "difficult" words in the text.

The program readability.java is designed to take a text from an input file and then compute the readability index for that text.  It does this computation by counting the number of sentences and the number of words in the text, and then basing the grade level for the text on the average number of words per sentence.  This program appears below:
import java.io.*;
import java.util.StringTokenizer;

class readability {
  public static void main (String param[]) throws IOException {
    // Establish input text file.  Keyboard is not a standard Java class;
    // it is assumed to be supplied with the course materials.
    System.out.println("Type file name");
    String f = Keyboard.readString();
    int sentences = 0, words = 0;
    // Open the file and get the first line of text
    BufferedReader in = new BufferedReader(new FileReader(f));
    String line = in.readLine();
    // Read a series of lines of text
    while (line != null) {
      // Divide the line into individual words
      StringTokenizer t = new StringTokenizer(line);
      while (t.hasMoreTokens()) {
        String w = t.nextToken();
        // A word ending in '.', '?' or '!' also ends a sentence
        char c = w.charAt(w.length()-1);
        if (c=='.' || c=='?' || c=='!') {
          sentences = sentences + 1;
          w = w.substring(0, w.length()-1);
        }
        words = words + 1;
      }
      // Read the next line of text
      line = in.readLine();
    }
    // Compute average sentence length (integer division) and display results
    System.out.print("Results: ");
    System.out.println(words + " words and " + sentences + " sentences.");
    int length = words / sentences;
    if (length <= 8) System.out.print("4th grade");
    else if (length <= 11) System.out.print("5th grade");
    else if (length <= 14) System.out.print("6th grade");
    else if (length <= 17) System.out.print("7th grade");
    else if (length <= 24) System.out.print("High school");
    else System.out.print("College");
    System.out.println(" level: " + length + " words per sentence");
  }
}

Strings

The Java class String can support programming in a wide range of applications, such as word processors and spelling checkers.  Below is a summary of some String methods and their use. More discussion of Strings appears on pages 74-78 of your text.
String Constants and Variables
A String constant is any sequence of characters enclosed in double quotes ("). E.g., the following message is a String constant.

"Type file name:  "

A String variable is any variable declared with the type String. The following are String variable declarations:

String f, line;

The value of a String variable can be any String constant, including the empty String "" (the one that has no characters between the quotes).

The String class has a number of methods that allow programs to manipulate String values in different ways. The discussion below illustrates how some of these methods are used by showing some simple examples.

String Length, Assignment, Concatenation, and Comparison
The length of a String is the number of characters that appear between the quotes, including blanks and all other special characters.  String variables can be assigned values, just like numerical variables. This can be done with an assignment statement or an input statement such as in.readLine(). The following statement declares and assigns a value to the String variable s:

String s = "Hello World!";

The length of s is 12, and the method length returns the length of any String. Thus, the expression s.length() returns 12 in this example.

The ith character of a String variable can be isolated by using the charAt method.  String values can be viewed as arrays of characters in this sense, so the expression s.charAt(0) returns the first character 'H', s.charAt(1) returns the second character 'e', and so forth. [Note that single characters are enclosed in single quotes (') rather than double quotes ("), which enclose strings. Thus the character H is written as 'H', while the 1-character String H is written as "H".]
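
As a small illustration of length and charAt working together, here is a sketch that counts how often one character occurs in a String. The class name CharAtDemo and the method countChar are made up for this example and are not part of the lab files.

```java
class CharAtDemo {
    // Hypothetical helper: counts how many times the character c occurs in s.
    static int countChar(String s, char c) {
        int count = 0;
        // Valid positions run from 0 to s.length()-1
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                count = count + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Hello World!";
        System.out.println(s.length());                // 12
        System.out.println(s.charAt(0));               // H
        System.out.println(s.charAt(s.length() - 1));  // ! (the last character)
        System.out.println(countChar(s, 'l'));         // 3
    }
}
```

Note that the last character is at position s.length()-1, which is the same expression the readability program uses to look at the end of each word.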

Two String values can be concatenated (joined together) to form a single String, either using the + sign or using the concat method, which is part of the String class. Either of the following statements leaves the variable u with the value "Hello World! Hello!".

u = s + " Hello!";
u = s.concat(" Hello!");

Two String values can be compared for equality or nonequality using the equals method. Equality for String values means that one is an identical copy of the other, letter for letter. So we might write

if (s.equals("Hello!")) ...

to test whether or not the current value of s is identical to the String "Hello!".  A related method is called equalsIgnoreCase, which will be true if the two strings are identical except for capitalization differences.  For instance, if the value of s is "Hello", then s.equals("heLLO") is false but s.equalsIgnoreCase("heLLO") is true.
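
The difference between the two methods can be checked with a short program. This is only a sketch; the class name EqualsDemo and the helper countMatches are invented for the illustration.

```java
class EqualsDemo {
    // Hypothetical helper: counts the entries of list that match target,
    // ignoring differences in capitalization.
    static int countMatches(String[] list, String target) {
        int count = 0;
        for (int i = 0; i < list.length; i++) {
            if (list[i].equalsIgnoreCase(target)) {
                count = count + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(s.equals("Hello"));             // true
        System.out.println(s.equals("heLLO"));             // false
        System.out.println(s.equalsIgnoreCase("heLLO"));   // true

        String[] list = { "The", "monkey", "the", "chimp" };
        System.out.println(countMatches(list, "the"));     // 2
    }
}
```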

To test whether one string precedes another in an alphabetical sense, the compareTo method can be used.  The expression s.compareTo("Hello") returns a number < 0, 0, or > 0 depending on whether the value of s precedes, matches, or follows the string "Hello" alphabetically.  (The comparison is made by character code, so every capital letter precedes every lowercase letter.)  So, considering an array 'words' of strings, the following if statement would test to see if string words[i] follows words[j] in the dictionary, and if so swap them:

if (words[i].compareTo(words[j]) > 0)
   swap(words, i, j);

This example assumes, of course, that we have a method swap that will exchange the values of two entries in an array of strings.
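
One way such a swap method might be written is sketched below. The handout does not supply this method, so the version here is an assumption about its shape, and the class name SwapDemo is invented.

```java
class SwapDemo {
    // Assumed helper: exchanges the entries at positions i and j of words.
    static void swap(String[] words, int i, int j) {
        String temp = words[i];   // save one entry before overwriting it
        words[i] = words[j];
        words[j] = temp;
    }

    public static void main(String[] args) {
        String[] words = { "monkey", "chimp" };
        // "monkey" follows "chimp" in the dictionary, so compareTo returns a value > 0
        if (words[0].compareTo(words[1]) > 0)
            swap(words, 0, 1);
        System.out.println(words[0] + " " + words[1]);   // chimp monkey
    }
}
```

Because compareTo compares character codes, "Zebra".compareTo("apple") is negative even though "apple" comes first in a dictionary.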

Substring Extraction, Insertion, and Deletion
A String can be formed by extracting a copy of an embedded String, called a substring, from another string. For example, the following statement extracts a copy of "World" from the String s and assigns it to the String t:

t = s.substring(6, 11);

Here, the 6 designates the starting position of the substring to be copied, and the 11 designates the position of the next character in s following the substring (counting the first character at position 0, not 1). Note that the value of s itself is not affected by this action.

If we wanted to delete that same substring from s, we would need to concatenate the beginning part of s with that part which follows that substring. That is, we would write

s = s.substring(0, 5) + s.substring(11);

to obtain the string "Hello!" and assign it to s. (Because the first piece stops before position 5, the blank that preceded "World" is dropped as well.) Note here that the expression s.substring(11) takes that substring which begins at position 11 and ends at the end of s. That is, it is the entire righthand end of s.

Finally, to insert a new substring within an existing String, we use concatenation and substring in a different way.  For instance, to insert the word "Cruel" inside the string s = "Hello World!" to form s = "Hello Cruel World!" we would write the following assignment:

s = s.substring(0, 5) + " Cruel " + s.substring(6);
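
The deletion and insertion patterns above can be wrapped up as two small methods. This is a sketch only; the names SubstringDemo, delete, and insert are made up here and do not come from the String class.

```java
class SubstringDemo {
    // Hypothetical helper: returns s without the characters from position
    // start up to, but not including, position end.
    static String delete(String s, int start, int end) {
        return s.substring(0, start) + s.substring(end);
    }

    // Hypothetical helper: returns s with piece inserted so that piece
    // begins at position pos.
    static String insert(String s, int pos, String piece) {
        return s.substring(0, pos) + piece + s.substring(pos);
    }

    public static void main(String[] args) {
        String s = "Hello World!";
        System.out.println(s.substring(6, 11));       // World
        System.out.println(delete(s, 5, 11));         // Hello!
        System.out.println(insert(s, 6, "Cruel "));   // Hello Cruel World!
        System.out.println(s);                        // Hello World! (s itself is unchanged)
    }
}
```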

String Searching and Replacement
Two other useful String methods search a String for the position of a particular character or substring that may occur within it, and replace every occurrence of one character within a String by another character. The first is called indexOf, and the second is called replace. Here are two examples (assuming i and j are integer variables, and s has the value "Hello Cruel World!").

i = s.indexOf("Cruel");
s = s.replace(s.charAt(i), 'c');

The resulting value of i in this example is 6 (the position of the first character of the leftmost occurrence of the substring "Cruel" within s). The resulting value of s is "Hello cruel World!" since s.charAt(i) returns the character 'C'.  Here are some simple exercises to check your understanding of the above ideas.

  1. Write a Java statement that locates the position of the first blank character in String u and assigns it to the integer variable i.
  2. Using this value of i, write another statement that extracts the first word from u and assigns it to t. That is, if u = "Hello Hello Cruel World", your statement should leave t = "Hello" and u = "Hello Cruel World".
  3. Design a loop that one-by-one extracts single characters from the String u and displays them in a vertical list on the screen.
  4. Design a method displayChars(String s) that solves the previous problem.
  5. Design another loop that extracts all blanks from a String s, leaving only its nonblank characters.  For instance, if s = "Hello Cruel World", then the result of this loop should be "HelloCruelWorld".
  6. Design a method removeBlanks(String s) that solves the previous problem by returning a string with all its blanks removed.
  7. Design a method reverse(String s) that returns a string that is the reverse of s.  For instance, the call reverse("Hello") should return the string "olleH".
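
Two details of indexOf and replace, described before the exercises, are worth checking by experiment: indexOf returns -1 when the substring does not occur, and replace changes every occurrence of the character, not just the first. The sketch below shows both; the names SearchDemo and occursIn are invented for the illustration.

```java
class SearchDemo {
    // Hypothetical helper: true when piece occurs somewhere inside s.
    static boolean occursIn(String s, String piece) {
        return s.indexOf(piece) >= 0;   // indexOf returns -1 when piece is absent
    }

    public static void main(String[] args) {
        String s = "Hello Cruel World!";
        System.out.println(s.indexOf("Cruel"));      // 6
        System.out.println(s.indexOf("Kind"));       // -1
        System.out.println(occursIn(s, "World"));    // true
        System.out.println(s.replace('l', 'L'));     // HeLLo CrueL WorLd!
    }
}
```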

Text Files

Taking input from a text file is similar to taking input from the keyboard, except that the user doesn't need to type the input every time the program is run.  The readability program uses two of the standard Java classes to facilitate this: the BufferedReader class and the StringTokenizer class.  The BufferedReader class is summarized on pages 395-398 and 629 of your text. It allows the program to input a text, line by line, from a file.  The StringTokenizer class, summarized on pages 206-210 and 743 of your text, allows an individual word in a line of text to be isolated and assigned to a separate String variable.

To access a file, the program should first declare a new variable for the file as follows:

BufferedReader in = new BufferedReader(new FileReader("<filename>"));

Here, <filename> denotes the name of the file which will supply the text input.  Then whenever it needs to read, say, the next line in the text file and save it in the String variable line, it says:

String line = in.readLine();

When there are no more lines of text to be read from the file, the readLine method will return the special value null.  This condition can therefore be tested directly, using the expression line==null.

To separate individual words from a line of text, a new StringTokenizer variable should be declared and initialized with the line of text.

StringTokenizer t = new StringTokenizer(line);

Now, a "token" is just a series of nonblank characters in a string that is separated from the next token by one or more blanks.  So a token is almost the same idea as a word.  The special methods hasMoreTokens and nextToken are used to test whether the tokenizer t has one or more tokens remaining, and to extract the leftmost remaining token.  For instance, the expression t.hasMoreTokens() tests the former situation and the statement w = t.nextToken() accomplishes the latter.
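
The line-by-line and token-by-token pattern can be tried without any file on disk. The sketch below reads from a String through a StringReader instead of a FileReader; that substitution, and the names TokenDemo and countTokens, are assumptions made only to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

class TokenDemo {
    // Hypothetical helper: reads every line from in, displays each token
    // on its own line, and returns the total number of tokens seen.
    static int countTokens(BufferedReader in) throws IOException {
        int tokens = 0;
        String line = in.readLine();
        while (line != null) {                  // readLine returns null at end of input
            StringTokenizer t = new StringTokenizer(line);
            while (t.hasMoreTokens()) {
                String w = t.nextToken();       // leftmost remaining token
                System.out.println(w);
                tokens = tokens + 1;
            }
            line = in.readLine();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Two lines of text with irregular spacing, standing in for a file
        String text = "said  the   monkey\n      to    the    chimp.";
        BufferedReader in = new BufferedReader(new StringReader(text));
        System.out.println(countTokens(in) + " tokens");   // 6 tokens
    }
}
```

Notice that the extra blanks between words and at the start of the second line produce no empty tokens, and that the final token is "chimp." with its period still attached.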

Returning to the readability program shown above, exercise this program to see how it works. That is, add it to your project and then run it once for each of the input text files poem, tiny, and literacy.

  1. What is the input for each of these runs?  What is the output?
  2. Of what classes is the readability program a client?
  3. What is the role of each of the method calls charAt, length, and substring in this program?  What class is supplying the definitions of these methods for use by the readability program?
  4. For each repetition of the inner while loop in this program, what does the String variable w contain (in general)?

Arrays

An array is a single variable name that is associated with a series of values, rather than just one. An array is identified in its declaration by the presence of square brackets [], as in the following:

int [] A = new int[5];
String [] words = new String [10];

The first declaration defines A as an array of 5 int values, while the second defines words as an array of 10 Strings.

Individual entries in an array can be assigned values by specifying their integer index.  By convention, the index of the first entry in an array is 0, the second is 1, and so on.  For example, the following statements assign values -1, -2, and -3 to the first three entries of A.

A[0] = -1;
A[1] = -2;
A[2] = -3;

A reference to a value in an array entry is also specified using an index inside square brackets.  For example, the following loop displays each of the values -1, -2, and -3 in a column.

for (int i=0; i<3; i++)
   System.out.println(A[i]);

Notice here that the reference A[i] identifies a different entry in A each time this loop is repeated.

Array references can also be used in expressions, such as 2*A[i]+1, and method calls, such as Math.pow(A[i], 2).  For example, the following loop leaves the variable sum with the sum of the squares of the first three entries in A, or 14.

int sum = 0;
for (int i=0; i<3; i++)
   sum = sum + (int) Math.pow(A[i], 2);   // Math.pow returns a double, so cast back to int
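
For reference, here is a complete program built around the same loop. It is a sketch: the names ArrayDemo and sumOfSquares are invented, and the squares are computed by multiplication rather than Math.pow so that the arithmetic stays in int.

```java
class ArrayDemo {
    // Hypothetical helper: returns the sum of the squares of the first n entries of a.
    static int sumOfSquares(int[] a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum = sum + a[i] * a[i];
        return sum;
    }

    public static void main(String[] args) {
        int[] A = new int[5];
        A[0] = -1;
        A[1] = -2;
        A[2] = -3;
        System.out.println(sumOfSquares(A, 3));          // 14
        // Entries of a new int array start out as 0, so A[3] and A[4] add nothing
        System.out.println(sumOfSquares(A, A.length));   // 14
    }
}
```

The expression A.length gives the number of entries in the array (5 here), which is a convenient upper bound for a loop that should visit every entry.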

Arrays are more fully discussed and illustrated on pages 268-286 of your text.  

Part 2.  Revise this Program to Develop and Display a Vocabulary List

A vocabulary list for a text is just a list of all the different words that appear in it.  For instance, the text tiny has the following contents:

abba   dabba     dabba dabba
dabba dabba     dabba
said  the   monkey
      to    the    chimp.

A vocabulary list for this text has one occurrence of every different word that appears there:

abba
dabba
said
the
monkey
to
chimp

Solving this problem requires a text file for input and an array of Strings to help build the word list.

Lab 6 Deliverables

By 4pm on the due date, turn in a printed listing of your completed program from Part 2, along with your answers to the questions in Part 1.  Also, submit an electronic copy of your completed program from Part 2 as lab5 yourname.java to the Drop Box in the CS105 (Tucker) folder.  You may work in teams of two on this project.  If so, both team members' names should appear on all work completed jointly.