Lab 6 - The Bowdoin Shell

Assigned:	Tuesday, November 26.
Due Date:	Wednesday, December 11.
Collaboration Policy:	Level 1
Group Policy:	Pair-optional (you may work in a group of 2 if you wish)

In this lab, you will write a simple shell program that supports Unix-style job control. Doing so will help you understand the core concepts of process control and will also touch on the challenges of concurrent programming. Writing your shell will also give you an introduction to low-level systems programming and communicating with the operating system.

Lab Overview

A shell is an interactive command-line interpreter that runs programs on behalf of the user. At a high level, a shell repeatedly prints a prompt, waits for a program name and command-line arguments on stdin (i.e., the terminal window), then carries out some action as directed by the input.
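
As a rough illustration of the prompt/read/act loop described above, here is a minimal sketch. The names shell_loop and handle_command, the "> " prompt, and the MAXLINE limit are hypothetical choices for this example only; they are not taken from the starter code.

```c
#include <stdio.h>

#define MAXLINE 1024   /* assumed maximum line length for this sketch */

/* Hypothetical placeholder for the real work of interpreting a command. */
static void handle_command(const char *cmdline, FILE *out)
{
    fprintf(out, "would run: %s", cmdline);
}

/* Prompt, read one line, act on it, and repeat until end of input.
 * Returns the number of non-empty command lines processed. */
int shell_loop(FILE *in, FILE *out)
{
    char line[MAXLINE];
    int count = 0;

    for (;;) {
        fputs("> ", out);
        fflush(out);                 /* make sure the prompt is visible */
        if (fgets(line, sizeof line, in) == NULL)
            break;                   /* end of input (e.g., Ctrl-D) */
        if (line[0] == '\n')
            continue;                /* ignore empty lines */
        handle_command(line, out);
        count++;
    }
    return count;
}
```

Calling shell_loop(stdin, stdout) would run the loop interactively; a real shell replaces handle_command with code that parses the line and either runs a built-in command or launches a program.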

The shell program that you have been using all semester is bash (the Bourne Again Shell). Bash is only one of many shell programs, however - others include sh, tcsh, and csh. In this lab, you will implement your own shell: bsh (the Bowdoin Shell).

Unix Shell Basics

As we saw in Lab 2, a command-line string is a sequence of text words delimited by whitespace. The first word of the string is either the pathname of an executable file (i.e., a program) or a built-in command. The remaining words are command-line arguments. If the first word is a built-in command, the shell immediately executes the command within the current shell process. Otherwise, the shell forks a child process, then executes the specified program in the context of the child. The set of all child processes created as a result of interpreting a single command (there may be multiple, if the program itself forks) is known as a job. A job can also contain multiple child processes connected by Unix pipes (denoted in a command by vertical bars, |), which allow for passing output from one program as input into another program (although your shell will not need to support pipes).

By default, a job runs in the foreground, which means that the shell waits for the job to terminate before prompting for the next command string. Thus, at any point in time, at most one job can be running in the foreground. However, if the command string ends with an ampersand (&), then the job runs in the background. A background job means that the shell does not wait for the job to terminate, but instead immediately prints another prompt and allows for another command string. As a result, an arbitrary number of background jobs can be running at a given time, in addition to at most one foreground job.

Typing the following command runs the program ls (located in the directory /bin) in the foreground with command line arguments -l -d:

bsh> /bin/ls -l -d

More specifically, running the above command executes the main function of /bin/ls with the following values of argc and argv:

argc is 3
argv[0] is '/bin/ls'
argv[1] is '-l'
argv[2] is '-d'

Alternately, typing the same command with an ampersand will run ls in the background:

bsh> /bin/ls -l -d &
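
The foreground case described above (fork a child, execute the program in the child, wait in the parent) can be sketched as follows. This is a simplified illustration only: run_foreground is a hypothetical helper, it waits directly with waitpid rather than using any job list or signal handlers, and it reports a failed execve with perror rather than the exact message your shell must print.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper (not part of bsh.c): run argv[0] as a foreground
 * child. Returns the child's exit status, or -1 if something failed or
 * the child did not exit normally. */
int run_foreground(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execve(argv[0], argv, environ);
        /* execve returns only on failure. */
        perror(argv[0]);
        _exit(127);
    }

    /* Parent: a foreground job means waiting before the next prompt. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the command line shown earlier, argv would be {"/bin/ls", "-l", "-d", NULL}. A background job would differ only in that the parent skips the wait and returns to the prompt immediately.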

Normally, the shell allows you to just specify the command name without the enclosing directory (e.g., ls instead of /bin/ls) by automatically searching for the specified program within a list of known directories. This list of directories, called the PATH, normally includes /bin and several other system directories. However, since your shell will not support a PATH, you will need to specify the complete directory containing any program you wish to run. From a regular shell, you can locate any given program using the which program, e.g., which pwd, which will tell you how to specify the program from within bsh.

Job Control

Unix shells support the notion of job control, which allows users to move jobs back and forth between background and foreground, and to change the process state (running, stopped, or terminated) of all the processes in a job. Job states can be changed via signals: typing Ctrl-Z causes a SIGTSTP signal to be delivered to every process in the foreground job. The default action for SIGTSTP is to place the process in the stopped state, where it remains until it is awakened by the receipt of a SIGCONT signal. Typing Ctrl-C causes a SIGINT signal to be delivered to each process in the foreground job. The default action for SIGINT is to terminate the process.

Unix shells also provide various built-in commands that support job control. Key commands are listed below:

jobs: List the running and stopped background jobs.
bg <job>: Change a stopped background job to a running background job.
fg <job>: Change a stopped or running background job to a running job in the foreground.
kill <job>: Terminate a job (more specifically, sends a SIGTERM signal to the job, for which the default behavior is to terminate the process).

The `bsh` Specification

The bsh shell should have the following features:

The prompt should be the string "bsh> ".
As described previously, the command string typed by the user should be either a built-in command or a program name, possibly followed by command-line arguments. Programs should be executed in the context of a child process forked by the shell.
Typing Ctrl-C should cause a SIGINT signal to be sent to the current foreground job (i.e., the initial child that was forked for that job as well as any descendant processes of that child). Typing Ctrl-Z should work the same except that the signal sent is SIGTSTP.
If the command string ends with &, the job should be run in the background. Otherwise, it should run in the foreground.
Each job can be identified either by a process ID (PID) or by a job ID (JID). PIDs are positive integers assigned by the operating system when processes are created. JIDs are positive integers assigned by bsh to each job. On the command line, a JID is denoted by the prefix '%'. For instance, %5 denotes JID 5, while 5 denotes PID 5.
bsh should support the following built-in commands:
- quit: Terminate the shell.
- jobs: List all background jobs.
- bg <job>: Restarts <job> by sending it a SIGCONT signal, then runs it in the background. The job argument can be either a PID or a JID.
- fg <job>: Restarts <job> by sending it a SIGCONT signal, then runs it in the foreground. The job argument can be either a PID or a JID.
You do not need to support pipes (|) or I/O redirection (< and >) in your shell.
Your shell should not use the sleep system call (one particular case which may tempt you to use sleep is described in the advice section).
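
The PID/JID convention in the specification above (%5 is JID 5, while 5 is PID 5) could be handled along these lines. The names parse_job_arg and job_arg_kind are hypothetical, and this sketch assumes that arguments with trailing non-digit characters are rejected; check the reference shell for the behavior you actually need to match.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result type for illustration only. */
enum job_arg_kind { ARG_INVALID, ARG_PID, ARG_JID };

/* Classify a bg/fg argument: "%5" denotes JID 5, "5" denotes PID 5.
 * On success, stores the number in *value and returns its kind. */
enum job_arg_kind parse_job_arg(const char *arg, int *value)
{
    enum job_arg_kind kind = ARG_PID;
    char *end;
    long n;

    if (arg == NULL)
        return ARG_INVALID;
    if (*arg == '%') {               /* the '%' prefix marks a job ID */
        kind = ARG_JID;
        arg++;
    }
    if (!isdigit((unsigned char)*arg))
        return ARG_INVALID;

    n = strtol(arg, &end, 10);
    /* Assumption: trailing junk and non-positive values are invalid. */
    if (*end != '\0' || n <= 0 || n > INT_MAX)
        return ARG_INVALID;

    *value = (int)n;
    return kind;
}
```

The caller can then look up the job by PID or by JID depending on the returned kind, and print the appropriate error message when the result is ARG_INVALID.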

Code Structure

To start, you have been provided with a functional skeleton of the shell. The starting code implements a number of less interesting functions (such as command line parsing and error reporting) that you should use while implementing the complete shell, allowing you to focus on the more interesting components.

The only file you should modify is bsh.c, which contains several complete supporting functions as well as various function skeletons that you will need to complete. You do not need to define any functions beyond those already specified in bsh.c, but you are welcome to do so if you wish.

A summary of the functions that you must implement is given below:

eval: Main routine that parses and interprets the command line.
builtin_cmd: Recognizes and interprets the built-in commands listed above (bg and fg commands should result in calling do_bgfg as below).
do_bgfg: Implements the bg and fg built-in commands.
waitfg: Waits for a foreground job to complete.
sigchld_handler: Handler for SIGCHLD signals.
sigint_handler: Handler for SIGINT (Ctrl-C) signals.
sigtstp_handler: Handler for SIGTSTP (Ctrl-Z) signals.

The included Makefile will compile the shell for you. To run your shell, simply execute it:

unix$ ./bsh
bsh> [type commands to your shell here]

Included Files

You have also been provided with a number of tools to help you check your work. All included files are described below:

bsh.c: The code of your shell.
bshref: The reference shell. Run this program if you have any questions about how your shell should behave. Your shell should emit identical output to the reference solution (with a few caveats, noted later).
sdriver.pl: A shell driver program that executes the shell and feeds it commands and signals as directed by a trace file, then captures and displays the output from the shell.
trace{01-16}.txt: 16 trace files that you will use in conjunction with the shell driver to test the correctness of your shell. The lower-numbered trace files do very simple tests, while the higher-numbered tests do more complicated tests.
bshref.out: The output of the reference solution on all traces, for your reference. This might be more convenient than manually running the shell driver on all trace files.
myspin.c: A test program that sleeps for a specified number of seconds.
mysplit.c: A test program that forks a child, which then sleeps for a specified number of seconds.
mystop.c: A test program that sleeps for a specified number of seconds, then sends a SIGTSTP signal to itself.
myint.c: A test program that sleeps for a specified number of seconds, then sends a SIGINT signal to itself.
Makefile: Builds the shell and all test programs. Also provides useful targets for testing the shell (see below).

Use the -h flag to see the usage string for sdriver.pl:

unix$ ./sdriver.pl -h
Usage: ./sdriver.pl [-hv] -t <trace> -s <shellprog> -a <args>
Options:
  -h            Print this message
  -v            Be more verbose
  -t <trace>    Trace file
  -s <shell>    Shell program to test
  -a <args>     Shell arguments

For example, you could run the shell driver on trace01.txt by typing the following:

unix$ ./sdriver.pl -t trace01.txt -s ./bsh

Similarly, you could run the trace driver on the reference shell by simply substituting bsh with bshref in the command above.

More simply, you can use the included Makefile to run the driver on the trace files. To pass trace01.txt through your shell, you can just run:

unix$ make test01

Similarly, to pass trace01.txt through the reference shell, you can run:

unix$ make rtest01

The output of your shell from the trace files is exactly the same as the output you would have gotten from running your shell interactively, except for an initial comment that identifies the trace.

Output Formatting and Logging

Since your shell output should exactly match that of the reference shell, your output messages should contain the same information in the same format as the reference shell. Particular messages that you should look out for include the following:

Running an invalid program ./prog:
```
./prog: Command not found
```
Running a new background job with jid 20 and pid 500 executing /bin/ls -l:
```
[20] (500) /bin/ls -l
```
Switching an existing job with jid 20 and pid 500 executing /bin/ls -l to the background:
```
[20] (500) /bin/ls -l
```

Running bg without specifying a job:

bg command requires PID or %jobid argument

Running fg without specifying a job:

fg command requires PID or %jobid argument

Specifying an invalid pid 500 to bg or fg:
```
(500): No such process
```
Specifying an invalid jid 20 to bg or fg:
```
%20: No such job
```
Specifying something other than a pid or jid for bg:
```
bg: argument must be a PID or %jobid
```
Specifying something other than a pid or jid for fg:
```
fg: argument must be a PID or %jobid
```
Job with jid 20 and pid 500 terminated by signal 15 (use WTERMSIG to find the signal number from the child status):
```
Job [20] (500) terminated by signal 15
```
Job with jid 20 and pid 500 stopped by signal 20 (use WSTOPSIG to find the signal number from the child status):
```
Job [20] (500) stopped by signal 20
```

Note that the above messages should always be printed. You are welcome to add additional output when running in verbose mode, but your verbose output does not need to match that of the reference shell. Refer to the reference shell if you are unsure about any of the exact formatting of these messages. In particular, trace 14 exercises many of the error messages.
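
To illustrate how WTERMSIG and WSTOPSIG fit with the messages above, here is a small sketch that turns a waitpid status into the corresponding text. The helper format_job_status is hypothetical, and the sketch writes into a buffer rather than printing; how and where you print (and any trailing newline) should follow the reference shell.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: describe a status value returned by waitpid.
 * Returns 1 if a message was written to buf, or 0 if the child exited
 * normally (in which case no message is needed). */
int format_job_status(char *buf, size_t len, int jid, pid_t pid, int status)
{
    if (WIFSIGNALED(status)) {
        /* Child was killed by a signal it did not catch. */
        snprintf(buf, len, "Job [%d] (%d) terminated by signal %d",
                 jid, (int)pid, WTERMSIG(status));
        return 1;
    }
    if (WIFSTOPPED(status)) {
        /* Only reported if waitpid was called with WUNTRACED. */
        snprintf(buf, len, "Job [%d] (%d) stopped by signal %d",
                 jid, (int)pid, WSTOPSIG(status));
        return 1;
    }
    return 0;
}
```

Note that stopped children are only reported when waitpid is given the WUNTRACED option, so a shell that needs the "stopped by signal" message must pass that option.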

Trace File Format

Each trace file consists of a series of commands to test the functionality of your shell. The trace files are understood by the sdriver.pl driver program, which launches your shell, then executes a given trace file against the running shell process, capturing its output. In order to understand what the tests are doing, you should also be sure to understand the trace files. The format of each trace file is summarized below:

Each non-empty line of a trace file consists of either a comment, a driver command, or a regular command string.
Comment lines start with # and are ignored by the driver.
Driver commands are specified in ALL_CAPS and cause some external event to occur that interacts with the shell. The driver commands used in the trace files are summarized below:
- INT: Send a SIGINT signal to the shell, which is equivalent to typing Ctrl-C at the prompt.
- TSTP: Send a SIGTSTP signal to the shell, which is equivalent to typing Ctrl-Z at the prompt.
- CLOSE: Notify the shell that there is no more input to read, which is equivalent to typing Ctrl-D at the prompt. The provided starter code responds to the end of input by calling exit, so this command allows a trace to gracefully terminate the shell even if you haven't yet implemented the built-in quit command.
- SLEEP n: Pause for n seconds.
- WAIT: Wait for the shell process to terminate.
All other lines are regular command strings, which specify programs to be run via the shell process. These command strings are passed verbatim to the bsh process. Note that in many of the traces, programs are prefaced by a separate line that runs the echo program, e.g.:
```
/bin/echo -e ./myprog 10
./myprog 10
```
This pattern is simply a way to print out the command that the shell is about to execute before executing it (since all the echo program does is print a message). The first line above will just run the echo program to print out the real command, while the second line will actually run myprog (and presumably do something more interesting than echo).
Once the driver has issued every command specified in the trace, the shell is notified that there is no more input to read, which should normally cause the shell process to terminate. This is basically equivalent to an implicit CLOSE command at the end of every trace.

Traces 11-13 and `ps`

Traces 11, 12, and 13 use the ps program, which lists active process information (from the OS, not from your shell). The ps program will show a variety of process info (one process per line) but the column of particular interest is the STAT (state) column. A number of different state codes may be displayed in this column, but the only ones you really need to pay attention to are the following three:

R: the process is currently running
T: the process is currently stopped
Z: the process is an unreaped zombie

Make sure that these process state codes match between your shell and the reference shell. Note that the output of ps will include both processes created by your shell as well as other (potentially unrelated) processes created by you or other users. The only processes you should concern yourself with are the shell-created processes.

General and Function-Specific Advice

Here are some useful tips for working on your shell:

Chapter 8 of the textbook (Exceptional Control Flow) is a very useful reference for this lab, particularly sections 8.4 and 8.5.
Use the trace files to guide the development of your shell. Start with trace01.txt and make sure that your shell produces output that is identical to that of the reference shell. Then move onto trace02.txt, and so forth.
Especially at first, don't exclusively rely on the trace files when testing your shell. Remember that shells are interactive programs by their nature, and you will likely be able to spot many errors more easily by running the shell directly and testing it manually (as opposed to just executing the trace files against the shell).
Full-window programs like more, less, nano, vi, and emacs do strange things with the terminal settings. Don't run these programs from your shell; instead, stick with simple text-based programs like /bin/ls, /bin/ps, /bin/pwd, and /bin/echo (as well as the various test programs provided to you).
Make note of predefined helper functions in bsh.c that you may wish to use – e.g., unix_error and safe_printf.
Useful system calls that you might want to use include fork, execve, getpid, waitpid, kill, setpgid, sigprocmask, and sigsuspend. In addition to full details of these calls available in the manpages, you can refer to the waitpid options and status macros detailed in the class slides.
Make friends with Linux manpages when working with system calls. You can get a detailed reference on any system call right from the terminal using man, e.g., by running man fork.

Tips for specific parts of the shell are given below.

Eval

You should use the provided parseline function to perform the heavy lifting of parsing the words of the command line. Note that parseline takes the command line (a regular string) as well as an array of pointers (argv) which will be filled in with pointers to the individual words of the command line. The pointer array isn't expected to be pre-initialized; it's just a bunch of storage which parseline will fill in with pointers to the actual words (located within the command line). Also note that since the pointers in argv will point to existing memory within the command line, you don't need to malloc any additional memory here. In fact, you don't need to use malloc anywhere in the entire lab, as all necessary data structures are pre-allocated as global variables.
Following parsing, you should pass the constructed argv array to execve in order to launch the target program. For the third parameter to execve, pass the predefined global variable environ (which is the set of "environment variables" defined in the current shell session).
When adding a new job to the job list, you must be careful to ensure that the job list is not corrupted. In particular, a nasty bug can occur when the shell forks a child, but before the parent actually adds the child to the job list by calling addjob, the child exits and is reaped by sigchld_handler. Think carefully about what would happen to the job list in this situation. This type of bug is called a race condition, as it depends on two processes "racing" during concurrent execution, and is difficult to debug as it occurs nondeterministically! To protect the job list, you'll want to prevent your signal handlers from running until the new child job is actually added to the job list.
Also related to the above issue, note that children inherit a copy of the signal mask of their parents, and therefore the child must be sure to unblock its signals before it executes the new program.
When you launch bsh from the regular Unix shell (i.e., the bash process), your bsh shell is running in the foreground process group of Bash. If your shell then creates a child process, that child will also (by default) be a member of Bash's foreground process group. Since typing Ctrl-C sends a SIGINT to every process in Bash's foreground group, doing so will send a SIGINT to your shell (which is good), but also to every process that your shell created (which is bad). You only want to send the SIGINT to your shell, which will then pass it along to your own foreground process group (if one exists). To handle this issue, after calling fork but before the child calls execve, the child should call setpgid(0, 0), which puts the child in a new process group whose group ID is equal to the child's PID. This ensures that there will be only one process (your shell) in Bash's foreground process group.

Signal Handlers

When you implement your signal handlers, be sure to send SIGINT and SIGTSTP signals to the entire foreground process group, using "-pid" instead of "pid" in the argument to the kill function (a negative PID other than -1 sends the signal to an entire process group). Note that sdriver.pl tests for this error.
Due to the effective concurrency of signal handlers with the rest of the program code, be careful about what code you put in signal handler functions. For example, you should not use printf in signal handlers (since weird things may happen if the signal handler is called when the program is already in the middle of executing printf). You can use the provided safe_printf function as a drop-in "safe" printf within handlers.
One of the tricky parts of the shell design is deciding on the allocation of work between waitfg and sigchld_handler - in particular, deciding where to reap child processes. A recommended approach is to perform reaping (via waitpid) entirely within sigchld_handler, and have waitfg simply pause until the specified pid is no longer in the foreground before returning. While other approaches are possible, it is simpler to do all reaping in the handler.
If you follow this suggested approach, you will need to think carefully about how to write waitfg. The easiest option is to use sleep inside a loop to periodically check that the process is still in the foreground, but this pattern is called busy-waiting and should be avoided (as it wastes CPU time and will likely wait longer than needed). Instead, you should use the sigsuspend function as a mechanism to block until a signal is received and processed (at which point you can check if the process is still in the foreground).
Most system calls return -1 and set the global errno variable when an error occurs (which you can report using the unix_error function). However, there are a few special cases to be aware of. One is that if waitpid has no remaining children to wait on, then it will return -1 and set errno to the value ECHILD. Importantly, this is *not* an actual error, despite waitpid returning -1. Assuming you are checking your return values for errors (which you should!), you may need to filter out this condition. A second similar case is sigsuspend, which always returns -1 (normally with errno set to EINTR) once a signal handler has run and returned.

Using GDB in a Multi-Process Program

To debug a multi-process program such as your shell using gdb, you will need a few extra commands:

(gdb) set detach-on-fork on/off

The above command sets whether child processes will be detached when fork is called. The default is on (i.e., the child runs without any interruption). If you turn this option off, the child is suspended as soon as it is forked. Then, you can use the inferior command to switch between the various processes started by the shell:

(gdb) info inferiors
... listing of processes ...
(gdb) inferior 1
[Switching to inferior 1 [process 0] (<noexec>)]

Another useful option is the following:

(gdb) set follow-fork-mode parent/child

The above sets which process gdb will automatically follow (either the parent or the child -- parent is the default) after fork is called.

Logistics

You are responsible for completing the contents of the bsh.c file. You should not modify any other file. You are responsible for ensuring that your program runs on the class server, regardless of where else you may be writing code. Since other systems may have the same system calls but with slightly different behavior, you are strongly urged to develop your code entirely on the class server.

As usual, your final submission will consist of your committed bsh.c file at the time of the due date. Each group need only make one submission.

Evaluation

Your shell will be graded on program correctness (as determined by the 16 trace files), design, and style. Your shell will be tested on the class server, where it should produce identical output to the reference shell on the trace files, with two exceptions:

The PIDs can (and will) be different.
The output of the /bin/ps commands in traces 11, 12, and 13 will be different from run to run (since the ps program displays PIDs, among other things). However, the running states of any mysplit processes in the ps output should be identical.

You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions about what constitutes good program design and/or style that are not covered by the guide.

Other specific things to watch out for:

Your program should compile without any warnings on the class server.
You should check the return value of every system call.
You should not use the sleep system call.