
CSCI 2310
Operating Systems

Bowdoin College
Fall 2014
Instructor: Sean Barker

Project 1 - Inverted Index

This project will give you experience writing a simple C++ program using the STL.

For this assignment, you will write a program in C++ that generates an inverted index of all the words in a list of text files. The goal of this assignment is to ensure that you are sufficiently up to speed in C++ to handle the rest of the course.

You are welcome to use any Unix/Linux based machine to write your programs. Bowdoin has two public Linux systems with all necessary software preinstalled that you may use: dover.bowdoin.edu and foxcroft.bowdoin.edu. You can SSH into these machines from on campus in order to write your programs. More information on the Bowdoin Linux environment is available here.

Inverter Input

Your inverter will take exactly one argument: a file that contains a list of filenames. Each filename will appear on a separate line. Each of the files listed in that first file will contain the text that you will build your index from.

For example:

inputs.txt
-----
foo1.txt
foo2.txt

foo1.txt
-----
this is a test. cool.

foo2.txt
-----
this is also a test.
boring.

Inverter Output

Your inverter should print all of the words from all of the inputs, in "alphabetical" order, each followed by the document numbers in which it appears, in ascending order. For the example input above, your program must produce exactly this output:

a: 0 1
also: 1
boring: 1
cool: 0
is: 0 1
test: 0 1
this: 0 1

Alphabetical order is defined as ASCII order, so "The" and "the" are separate words, and "The" comes first. Only certain strings should be indexed: a word is a maximal run of alphabetic characters, so digits, spaces, punctuation, and so on act as separators. For example, "Th3e" is two words, "Th" and "e".
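
The word rule above can be sketched as a small splitting function. The name splitWords is hypothetical, and the sketch assumes the default "C" locale, in which isalpha accepts only the ASCII letters.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits text into maximal runs of alphabetic characters.
// Any non-alphabetic character (digit, space, punctuation) ends the current word.
std::vector<std::string> splitWords(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        // Cast to unsigned char first: passing a negative char to isalpha is undefined.
        if (std::isalpha(static_cast<unsigned char>(text[i]))) {
            current += text[i];
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    // Do not lose a word that runs to the end of the text.
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

With this sketch, splitWords("Th3e") yields "Th" and "e". The ASCII ordering needs no extra code, because the default comparison for std::string compares character codes, so "The" sorts before "the".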

Files are numbered incrementally, starting with 0. Only valid, openable files should be included in the count, so a file that cannot be opened does not use up a number (is_open comes in handy here).
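
One way to read the numbering rule is sketched below: try to open each listed file in order and keep only those that open, so that a file's position in the result is its document number. The name openableFiles is hypothetical, and this is one interpretation of the rule, not a reference solution.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the files that can be opened, in list order.
// The index of a name in the returned vector is its document number.
std::vector<std::string> openableFiles(const std::vector<std::string>& names) {
    std::vector<std::string> documents;
    for (std::vector<std::string>::size_type i = 0; i < names.size(); ++i) {
        std::ifstream in(names[i].c_str());
        if (in.is_open()) {
            // Only a successful open advances the document count.
            documents.push_back(names[i]);
        }
        // Unopenable files are skipped silently: no output, no number.
    }
    return documents;
}
```

Under this reading, if the first listed file is missing and the second exists, the second file becomes document 0.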

Your program must not produce any other output. Extraneous output, or incorrectly formatted output (extra spaces, etc.), will make the autograder mark your solution as incorrect. Please leave yourself extra days to work these problems out.

Implementation Hints

If you are rusty with C and/or C++, I suggest you first read the UMass Intro to C++ guide. It should be quite understandable if you are familiar with Java.

Implement the data structure using the C++ Standard Template Library (STL) as a map of sets, as in:

map<string, set<int> > invertedIndex;

Use C++ strings and file streams:

#include <string>
#include <fstream>

Make sure that your project uses an ifstream instead of an fstream. Both are declared in the <fstream> header.

Remember, your program needs to be robust to errors. Files may be empty, missing, or unreadable. Please handle these cases gracefully and with no extra output.

The noskipws manipulator may be useful in parsing the input, since by default the >> operator skips whitespace when reading a character:

input >> noskipws >> c;

Handing Project In

Your project will be handed in using the autograding system. Please read the howto on using the autograder system.

Project Writeup

Unlike future projects, no writeup is needed for this project.