
CSCI 2330
Introduction to Systems

Bowdoin College
Spring 2017
Instructor: Sean Barker

Project 2 - Get the Point(er)

This project may be completed either individually or in groups of 2. If you work in a group, you should be coding at the same time in front of one machine (this is sometimes called 'pair programming'). Take turns driving.

The main objective of this assignment is to make you familiar with the memory model used by the machine and C -- most importantly, the idea of pointers. To do this, you will implement a shell command parser in C. The shell is the program that reads and interprets the commands you type in a command-line environment. Your program will implement one part of the shell's functionality: splitting a command line into its meaningful parts.

The parser takes a command line (a single string, e.g., 'gcc -Wall file1.c file2.c') and converts it to a command array, which is an array of strings representing the command itself (e.g., 'gcc') and its space-separated arguments (e.g., '-Wall', 'file1.c', 'file2.c'). This process largely resembles the behavior of the split function available in many programming languages. As you can probably guess, you won't be allowed to just use a library function like split, but must instead implement this functionality yourself. Doing so will force you to confront how strings are implemented and manipulated in memory (and will also give you an appreciation for how much complexity higher-level programming languages are hiding from you)!

Start by reading through the entire project description!

Project Dates

Assigned: Monday, February 20.
Due: Wednesday, March 1.

Project Overview

Your project files consist of the following:

Since your command parser is essentially a software library (that is, it simply provides functionality for use by other programs), your command.c file does not have a main function and cannot be executed directly. Instead, the actual executable is derived from command_test.c. Running make will build the test executable, which can then be executed with ./command_test.

Your main job will be to complete the parsing functions in command.c. In addition, you will write additional tests within command_test.c.

Command Line Structure

A command line is defined to be a null-terminated string containing zero or more words separated by one or more spaces, with an optional ampersand ('&') after the final word. This ampersand is used to denote a background command. In particular:

Note that the rules above are slightly more restrictive than in a real shell program (just for simplicity).

Command Line Examples

Here are two typical command lines:

Since the definitions allow for any spacing between words, the following lines are all allowed variations on the first example above:

Similarly, the following are all valid command lines:

However, the following are all examples of invalid command lines:

Command Arrays

The parser converts a command line into a command array: an array of strings representing the words of the command line, in order, terminated by a NULL element. Recall that a string in C is not a special type; it is just an array of char terminated by a null character ('\0', the character with ASCII code zero). A command array is therefore an array of pointers to arrays of characters, and has type char**. All arrays in this structure must be null-terminated. Note that there are two different notions of null here: the null character '\0' used to terminate a string, and the special value NULL representing a null pointer and used to terminate the words array.

IMPORTANT: '\0' is not the same as NULL. The former is the null character, and is one byte in size. The latter is the null address (aka the null pointer), and is one machine word in size.
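To see the size difference concretely, the sketch below applies sizeof to a char variable holding '\0' and to a pointer variable holding NULL. It uses only standard C; the helper names are hypothetical.

```c
#include <stddef.h>

/* Size of the null character when stored as a char: always 1 byte.
   We store '\0' in a char variable first because a bare character
   constant has type int in C, so sizeof '\0' would not be 1. */
size_t null_char_size(void) {
    char terminator = '\0';
    return sizeof terminator;
}

/* Size of a null pointer: one machine word (8 bytes on a 64-bit machine). */
size_t null_pointer_size(void) {
    char* nowhere = NULL;
    return sizeof nowhere;
}
```

On a 64-bit machine like the one assumed in the diagrams below, null_pointer_size() returns 8.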

Also remember the difference between literal characters and literal strings. Characters in C are specified in single quotes, e.g., 'a', and are used as char values (strictly speaking, a character constant has type int in C, but its value fits in a char). Literal strings are specified in double quotes, e.g., "a", and have type char*. As such, the expression 'a' == "a" does not compare what it appears to: the left side is a single character value, while the right side is a pointer to the start of a one-character, null-terminated char array in memory. The comparison is false in practice, and compilers will typically warn about comparing a pointer with an integer.
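To compare a character against the contents of a string, dereference the pointer rather than comparing the pointer itself. The hypothetical helpers below inspect the first and second bytes of the literal "a" and report its total size; note that sizeof a string literal counts the terminating '\0'.

```c
#include <stdbool.h>
#include <stddef.h>

/* "a" evaluates to a pointer to the first byte of a 2-byte array:
   'a' followed by the terminating '\0'. */
bool literal_starts_with_char(void) {
    const char* str = "a";
    return *str == 'a';          /* dereference to get the first char */
}

bool literal_is_terminated(void) {
    const char* str = "a";
    return *(str + 1) == '\0';   /* the byte after 'a' is the null char */
}

/* 2 bytes: the character 'a' plus the terminating '\0'. */
size_t literal_bytes(void) {
    return sizeof "a";
}
```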

Here is an example command array for the command line string "ls -l its-projects":

Command Array:             Null-Terminated Strings
Index   Contents           (stored elsewhere in memory)
      +----------+
    0 |  ptr  *----------> "ls"
      +----------+
    1 |  ptr  *----------> "-l"
      +----------+
    2 |  ptr  *----------> "its-projects"
      +----------+
    3 |   NULL   |
      +----------+
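A command array like this one can be written directly in C with a brace initializer, in the spirit of the hard-coded command arrays the workflow suggests adding to command_test.c. The sketch below (the names ls_command and count_command_words are hypothetical) builds the array for "ls -l its-projects" and walks it with a char** cursor until it reaches the NULL element. The initializer creates the array without any indexing; because the strings are literals, they must not be modified.

```c
#include <assert.h>
#include <stddef.h>

/* Command array for "ls -l its-projects", terminated by NULL. */
char* ls_command[] = {"ls", "-l", "its-projects", NULL};

/* Counts the words in a NULL-terminated command array using a cursor. */
size_t count_command_words(char** command) {
    assert(command != NULL);
    size_t count = 0;
    for (char** cursor = command; *cursor != NULL; cursor++) {
        count++;   /* each non-NULL element points to one word */
    }
    return count;
}
```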

Here is the same array drawn another way, showing how it is arranged in memory, with each element's offset from the base address of the array. Addresses increase from left to right, and pointers are assumed to be 64 bits (8 bytes), so each element's offset is 8 times its index.

Index:        0       1       2       3 
Offset:   +0      +8      +16     +24     +32
          +-------+-------+-------+-------+
Contents: |   *   |   *   |   *   | NULL  |
          +---|---+---|---+---|---+-------+
              |       |       |
              V       V       V
              "ls"    "-l"    "its-projects"

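Pointer arithmetic performs this scaling automatically: adding 1 to a char** advances it by sizeof(char*) bytes, not by 1 byte. The hypothetical helper below measures the distance in bytes by converting both pointers to char*, whose arithmetic works in single bytes. On the 64-bit machine assumed here, element i lies at offset 8 * i.

```c
#include <stddef.h>

/* Returns the byte distance from the start of a command array to
   element i. command + i advances by i pointers, i.e. by
   i * sizeof(char*) bytes. */
ptrdiff_t byte_offset_of(char** command, size_t i) {
    char** element = command + i;
    return (char*)element - (char*)command;
}
```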
Note that although we draw "strings" in the above pictures, this is an abstraction. Each string is actually represented by a '\0'-terminated array of 1-byte characters in memory, as shown below for the first word. Since each element is one byte, the addresses of adjacent characters differ by 1 and the offset is identical to the index.

Index:      0     1     2
Offset:  +0    +1    +2    +3
         +-----+-----+-----+
         | 'l' | 's' |'\0' |
         +-----+-----+-----+
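Because adjacent characters are exactly one byte apart, subtracting a pointer to the start of a string from a pointer to its '\0' gives the string's length. The sketch below walks a cursor to the terminator; terminator_offset is a hypothetical name, and the standard library's strlen computes the same value.

```c
#include <stddef.h>

/* Returns the offset of the terminating '\0' in str, which equals the
   string's length since each char occupies one byte. */
size_t terminator_offset(const char* str) {
    const char* cursor = str;
    while (*cursor != '\0') {
        cursor++;
    }
    return (size_t)(cursor - str);
}
```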

Function Specifications

You must write four functions (plus any necessary helper functions) supporting command parsing in command.c according to the headers in command.h:

Coding Rules

You are not allowed to use array notation in this project. Instead, you should use idiomatic pointer style for working with your strings (see below). Practically speaking, this is just a notational difference, but it will force you to think about your structures explicitly in terms of pointers.

Your code should follow standard memory safety rules:

Note that violating one of these rules does not necessarily mean that your program will crash (e.g., if you access uninitialized memory), so the absence of crashes does not necessarily mean that you are following all the rules. Use the valgrind tool to check for many kinds of memory errors.

In addition, you should abide by these usage conventions within your library:

Lastly, treat compiler warnings like errors. Your code should not emit any warnings (or errors, of course) when compiled, even if it still works.

Idiomatic Pointer Style

Your submitted code must use only pointers and pointer arithmetic, with no array notation. Choosing pointer arithmetic over array indexing is not necessarily the best choice for clear code in all cases, but will teach you about how arrays work at a lower level.

A simple way to think with arrays but write with pointers is to use *(a + i) wherever you think a[i]. However, this simple transformation rarely matches an idiomatic pointer style.

An idiomatic loop over an array with array indexing typically uses an integer index variable incremented on each iteration. Keep the scope of the index variable (or "loop variable") as small as possible for the task:

    // replaces all characters in string a by 'Z'
    for (int i = 0; a[i] != '\0'; i++) {
        a[i] = 'Z';
    }

An idiomatic loop over an array with pointer arithmetic typically uses a cursor pointer that is incremented to point to the next element on each iteration. Keep the scope of the cursor pointer variable as small as possible for the task:

    // replaces all characters in a by 'Z'
    for (char* p = a; *p != '\0'; p++) {
        *p = 'Z';
    }

Your final code should contain zero array[index] operations.
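The difference between the mechanical *(a + i) translation and idiomatic pointer style is easiest to see side by side. Both hypothetical functions below count the occurrences of a character in a string and return the same result; only the second uses a cursor pointer instead of an index.

```c
#include <stddef.h>

/* Mechanical translation of s[i] into *(s + i): no array notation,
   but still thinks in terms of an index. */
size_t count_char_offset_style(const char* s, char target) {
    size_t count = 0;
    for (size_t i = 0; *(s + i) != '\0'; i++) {
        if (*(s + i) == target) {
            count++;
        }
    }
    return count;
}

/* Idiomatic pointer style: a cursor walks the string directly. */
size_t count_char_cursor_style(const char* s, char target) {
    size_t count = 0;
    for (const char* p = s; *p != '\0'; p++) {
        if (*p == target) {
            count++;
        }
    }
    return count;
}
```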

Suggested Workflow

Since there are several components of the library, here is a suggested plan of action for tackling them:

  1. Add several more hard-coded command array test cases to command_test.c.
  2. Implement and test command_show and command_print. These functions should only be a few lines of code each. Test them on the constant, statically allocated command arrays in command_test.c.
  3. Add several more hard-coded valid and invalid command lines to command_test.c to test many aspects of the specification.
  4. Implement and test command_parse in stages, testing each stage on several inputs and committing a working version before continuing:
    1. Count the number of words in line and detect use of &, returning NULL for invalid commands and marking the foreground/background status for valid commands.
    2. Allocate the top-level command array.
    3. For each word in line, allocate properly sized space to hold the word as a null-terminated string, copy the word into this space, and save it in the command array.
  5. Implement and test command_free.
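As a rough illustration of steps 4.1 and 4.3, here is a sketch of two hypothetical helpers; count_words and copy_word are not part of command.h. count_words only counts space-separated words and deliberately ignores the '&' rules, which command_parse must still detect and validate. copy_word allocates exactly len + 1 bytes so that the copy has room for its '\0' terminator. Both use pointer style only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper for step 4.1 (word counting only; does NOT
   handle or validate '&'). Counts maximal runs of non-space chars. */
size_t count_words(const char* line) {
    assert(line != NULL);
    size_t count = 0;
    const char* p = line;
    while (*p != '\0') {
        while (*p == ' ') {               /* skip a run of spaces */
            p++;
        }
        if (*p == '\0') {
            break;                        /* only trailing spaces remained */
        }
        count++;                          /* p is at the start of a word */
        while (*p != ' ' && *p != '\0') {
            p++;                          /* move past the word */
        }
    }
    return count;
}

/* Hypothetical helper for step 4.3: copies the len characters starting
   at start into a new heap-allocated, null-terminated string.
   The caller owns the result and must free it. */
char* copy_word(const char* start, size_t len) {
    assert(start != NULL);
    char* word = malloc(len + 1);         /* +1 for the terminating '\0' */
    if (word == NULL) {
        return NULL;
    }
    char* dest = word;
    for (const char* src = start; src < start + len; src++) {
        *dest = *src;
        dest++;
    }
    *dest = '\0';
    return word;
}
```

In a parser, one would typically remember a pointer to the start of each word while scanning and pass it, along with the word's length, to something like copy_word. Every string allocated this way must eventually be freed, presumably by command_free.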

Tools and Debugging Tips

Programming in C can be finicky and error-prone, even for experts. Make use of the tools available to aid in debugging whenever possible:

Make liberal use of assertions. Assertions are "executable documentation": they document rules about how code should be used or how data should be structured, and they also make it easier to detect violations of these rules (a.k.a. bugs!). Use the assert(...) statement in C by including assert.h and asserting expected properties. For example, the provided code already asserts that the arguments to the command_ functions are not NULL. Thus, if a NULL argument is ever passed to these functions, an error message will be printed and execution will halt immediately. Detecting errors early like this (vs. crashing or corrupting data later wherever the code depends on this assumption) saves a lot of time. Add assertions to make the "rules" of your code clear wherever you make assumptions.
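For example, a hypothetical helper that requires a non-empty command array can state those requirements as assertions. If either assertion fails, the program prints the failed expression along with its file and line number and then aborts. Assertions are compiled out when NDEBUG is defined, so they should only check assumptions, never perform work the program needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the first word of a command array.
   The assertions document (and check) what callers must guarantee. */
const char* first_word(char** command) {
    assert(command != NULL);   /* the array itself must exist */
    assert(*command != NULL);  /* and must contain at least one word */
    return *command;
}
```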

Use Valgrind, which is an extended memory error checker. It helps catch pointer errors, misuse of malloc and free, and more. Run valgrind on your compiled program like this: valgrind ./command_test. Valgrind will run your program and observe its execution to detect errors at run time. Running under Valgrind when developing is always a good idea to catch memory errors as early as possible.

Use GDB (the GNU DeBugger) to help debug your programs when you need more information than Valgrind provides. When debugging programs with pointers, pay special attention to the pointer values your program generates. Inspect them like other variables or use the address-of (&) and dereference (*) operators at the gdb prompt to help explore program state.

Refer to the GDB Reference Sheet when debugging in GDB.

Avoid using print-based debugging (e.g., printf) if at all possible. If you do use printf, remember that you need to explicitly include ending newline characters (unlike, for example, System.out.println in Java or print in Python). Be sure to disable all extraneous print commands in command.c in your final submitted version.

C Function Declarations and Header Files

In C, a function is allowed to be used only after (that is, later in the file than) its declaration. This differs from Java, which allows you to refer to later methods. When declaring helper functions, you can do one of a few things to deal with this:

  1. Just declare your helper function before the functions that use it.
  2. Write a function header earlier in the file, then the actual definition later in the file. The function header just describes the name and type of the function, much like an interface method in Java. For example:
            // A function header declares that such a function exists,
            // and will be implemented elsewhere.
            int helper(int x, int y);
    
            // Parameter names are optional in headers.
            int helper2(char*);
    
            void needsHelp(void) {
                // OK, because header precedes this point in file.
                helper(7, 8);
                helper2("hello");
            }
    
            int helper(int x, int y) {
                return x + y;
            }
    
            int helper2(char* str) {
                return 7;
            }
  3. If the functions would likely get used elsewhere, then put the header in a header file, which is a file ending in '.h' that contains only function headers (for related functions) and data type declarations. For example, if you added another general function (not just a helper function) for manipulating commands, it would be best to place a function header for it with the other function headers in command.h so that users of your command library can call it.

Header files are included (essentially programmatically copy-pasted) by the #include directive you often see at the tops of C source files.

Logistics

The project files have been added to your SVN directory for you -- to download them, simply do an update in your checked-out SVN directory.

Remember that you should be working on turing, not on your local machine. While it is possible that things work correctly on your local machine, if there are problems, I will not be able to effectively help you!

Before submitting, disable any diagnostic printing in command.c. Only command_print and command_show should print, as specified.

If working in a group, make sure both your names are in your command.c file. No need to submit two copies -- as long as one of you has submitted the final version to your SVN repository (and both names are on it), you will get credit for it.

Evaluation

Your project will be evaluated as follows:

Your library will be tested using a private suite of test inputs in addition to the test inputs provided with the starter code. You should extend command_test.c with your own large suite of test inputs and run under valgrind to help check that your code meets the specification and is free of memory safety violations. Your tests themselves will not be graded, but preparing and using them will help you ensure that your code is fully correct and efficient.

Acknowledgement

Thanks to Ben Wood for sharing the original version of this project!