Lab 6 - The Bowdoin Shell

Release Date:	Monday, April 29.
Acceptance Deadline:	Wednesday, May 1, 11:59 pm.
Due Date:	Wednesday, May 8, 11:59 pm.
Collaboration Policy:	Level 1
Group Policy:	Pair-optional (you may work in a group of 2 if you wish)

In this lab, you will write a shell program that supports Unix-style job control. Doing so will help you understand the core concepts of process control and will also touch on the challenges of concurrent programming. You will also gain experience working with system calls and signals to interact with the operating system.

Lab Overview

A shell is an interactive command-line interpreter that runs programs on behalf of the user. At a high level, a shell repeatedly prints a prompt, waits for a line of input to be entered, then carries out some action as directed by the input (e.g., launching a program with the specified command-line arguments).

The shell program that you have been using all semester is bash (the Bourne Again Shell). Bash is only one of many shell programs, however; others include sh, tcsh, zsh, and csh. In this lab, you will implement your own shell: bsh (the Bowdoin Shell). Your shell will support launching and managing multiple jobs in many of the same ways that real shells do.

Unix Shell Basics

As we saw in Lab 2, a command-line string is a sequence of text words delimited by whitespace. The first word of the string is either the filename of a program (e.g., ls or ./myprogram or /full/path/to/myprog) or a built-in shell command (e.g., jobs). The remaining words are command-line arguments. If the first word is a built-in command, the shell immediately executes the command within the current shell process. Otherwise, the shell forks a child process, then executes the specified program in the context of the child. When a new program is executed, we refer to the new process as a job. Note that if the new process itself forks, then any new processes it creates will be considered part of the same job (so a job may ultimately contain more than one process).

By default, a job runs in the foreground, which means that the shell waits for the job to terminate before prompting for the next command string. Thus, at any point in time, at most one job can be running in the foreground. However, if the command string ends with an ampersand (&), then the job runs in the background. For a background job, the shell does not wait for the job to terminate, but instead immediately prints another prompt and accepts another command string. As a result, an arbitrary number of background jobs can be running at a given time, in addition to at most one foreground job.

For example, suppose we want to run the ls program with the two command-line arguments -l and -d. To specify the program name, we can either specify the full path to the program (which is /bin/ls here) or we can just type ls and let the shell locate the program in known program directories (which typically include the /bin directory). The set of directories to search is called the PATH; for the purpose of this assignment, we will assume that the PATH is just the /bin directory. We can run the desired job in the foreground by typing the following command, assuming the shell prompt is bsh>:

bsh> ls -l -d

Specifically, entering the above will execute the main function of the /bin/ls program with the following values of argc and argv:

argc is 3
argv[0] is '/bin/ls'
argv[1] is '-l'
argv[2] is '-d'

Alternately, typing the same command with an ampersand will run ls in the background:

bsh> ls -l -d &

Job Control

Unix shells support the notion of job control, which allows users to manage the job state of each active job. In particular, active jobs can be in one of three states:

Foreground: The job is running in the foreground (i.e., as an interactive job). At most one job may be in this state at any given time.
Background: The job is running in the background. Any number of jobs may be in this state.
Stopped: The job is stopped (aka suspended). Stopped jobs do not continue executing until they are moved back into the foreground or background state.

Job states can be changed using signals, which can be triggered either by keyboard commands or by invoking various built-in shell commands that support job control. In particular, we are interested in two specific keyboard commands:

Typing Ctrl-C causes a SIGINT signal to be delivered to each process in the foreground job. The default action for SIGINT is to terminate each receiving process (though this behavior can be overridden through a signal handler).
Typing Ctrl-Z causes a SIGTSTP signal to be delivered to every process in the foreground job. The default action for SIGTSTP is to place the receiving process in the stopped state, where it remains until awakened by the receipt of a SIGCONT signal.
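
As a rough illustration of overriding a default signal action, the sketch below installs a handler for SIGINT so that the receiving process records the signal instead of terminating. The names on_sigint, sigint_count, and install_sigint_handler are hypothetical and are not part of the starter code; the handlers in bsh.c will need to do real work rather than just count.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Counts delivered SIGINTs; sig_atomic_t is safe to update in a handler. */
static volatile sig_atomic_t sigint_count = 0;

/* Hypothetical handler: records the signal instead of terminating. */
static void on_sigint(int sig) {
    (void)sig;
    sigint_count++;
}

/* Replaces the default SIGINT action (terminate) with on_sigint.
 * Returns 0 on success and -1 on failure, like sigaction itself. */
int install_sigint_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* restart interrupted system calls if possible */
    return sigaction(SIGINT, &sa, NULL);
}
```

Once the handler is installed, a SIGINT (from Ctrl-C or from raise(SIGINT)) increments the counter and the process keeps running.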

We are further interested in the following three built-in shell commands involved in viewing and managing jobs:

jobs: Lists all running and stopped background jobs. Note that the jobs command (or any other command) cannot be invoked if there is an active foreground job; hence, jobs will never list a foreground job.
bg <job>: Sends a stopped background job a SIGCONT signal to resume it, then continues running the job in the background.
fg <job>: Sends a stopped or running background job a SIGCONT signal to resume it (if a stopped job), then continues running the job in the foreground.

Jobs can be specified to fg and bg either using a process ID (PID) or a job ID (JID). PIDs are positive integers assigned by the operating system when processes are created. JIDs are positive integers assigned by the shell itself to each job. By convention, a JID is denoted by the prefix % to distinguish it from a PID. For example, fg %5 says to run the job with JID 5 in the foreground, while fg 5 says to run the job with PID 5 in the foreground.
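
One possible way to tell the two forms apart is sketched below. The helper name parse_job_arg is hypothetical (it is not one of the provided functions), and the strictness chosen here (rejecting trailing characters such as "5x") is an assumption; the reference shell remains the authority on which arguments count as valid.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses "%5" (a JID) or "5" (a PID).
 * On success, returns the positive number and sets *is_jid to 1 or 0.
 * Returns -1 if arg is not a positive integer with an optional % prefix. */
int parse_job_arg(const char *arg, int *is_jid) {
    char *end;
    long value;

    if (arg == NULL) {
        return -1;
    }
    *is_jid = (arg[0] == '%');
    if (*is_jid) {
        arg++;  /* skip the % prefix */
    }
    if (*arg < '0' || *arg > '9') {
        return -1;  /* rejects "", "%", "abc", "+5", and " 5" */
    }
    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX) {
        return -1;
    }
    return (int)value;
}
```

With this sketch, "%5" yields 5 with *is_jid set to 1, while "5" yields 5 with *is_jid set to 0.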

To summarize, the following active job state transitions are possible through the actions indicated:

Foreground to Stopped: typing Ctrl-Z
Stopped to Foreground: the fg command
Stopped to Background: the bg command
Background to Foreground: the fg command

Code Overview

To start, you have been provided with a functional skeleton of the shell. The starting code implements a number of less interesting functions (such as command line parsing and utility methods for manipulating the job list) that you should use while implementing the complete shell, allowing you to focus on the more interesting components.

The only file you should modify is bsh.c, which contains skeletons of every function that you need to complete. You do not need to define any functions beyond those already specified in bsh.c, but you are welcome to do so if you wish.

A summary of the functions that you must implement is given below.

eval: Primary routine that parses and interprets the command line.
do_bgfg: Implements the bg and fg built-in commands.
waitfg: Waits for a foreground job to complete (should be called from eval when appropriate).
sigchld_handler: Handler for SIGCHLD signals.
sigint_handler: Handler for SIGINT signals.
sigtstp_handler: Handler for SIGTSTP signals.

You should not modify any included functions other than those listed above.

Compile your shell using the included Makefile by running make. Then, to run your shell, simply execute it:

$ ./bsh
bsh> [type commands to your shell here]

Note that the bsh> command prompt indicates that you are within bsh rather than the standard system shell bash. You can exit out of bsh by typing Ctrl-D, which indicates that there is no more input, and will cause the shell to exit. You can alternately use the built-in quit command once you get that working (which will be one of your first tasks).

Included Files

You have also been provided with a number of test programs and tools to help you test your shell. All included files are described below:

bsh.c: The code of your shell.
bshref: An executable of a complete shell that follows the bsh specification. Run this program if you have any questions about how your shell should behave. Your shell should emit identical output to the reference shell (with a few caveats, noted later).
sdriver.pl: A driver program that executes your shell and feeds it commands and signals as directed by a trace file, then captures and displays the output from the shell. The format of the trace files is detailed below.
trace{01-14}.txt: 14 trace files that you will use in conjunction with the shell driver to test the correctness of your shell. The lower-numbered trace files are simpler tests, while the higher-numbered traces are more complicated tests.
bshref.out: The output of the reference shell on all traces, for your reference.
myspin.c: A test program that sleeps for a specified number of seconds.
mysplit.c: A test program that forks a child, which then sleeps for a specified number of seconds.
mystop.c: A test program that sleeps for a specified number of seconds, then sends a SIGTSTP signal to itself (not to the shell!).
myint.c: A test program that sleeps for a specified number of seconds, then sends a SIGINT signal to itself (not to the shell!).
Makefile: Builds the shell and all test programs. Also provides useful targets for testing the shell (see below).

Use the -h flag to see the usage string for sdriver.pl:

$ ./sdriver.pl -h
Usage: ./sdriver.pl [-hv] -t <trace> -s <shellprog> -a <args>
Options:
  -h            Print this message
  -v            Be more verbose
  -t <trace>    Trace file
  -s <shell>    Shell program to test
  -a <args>     Shell arguments

For example, you could run the shell driver on trace01.txt by typing the following:

$ ./sdriver.pl -t trace01.txt -s ./bsh

Similarly, you could run the trace driver on the reference shell by simply substituting bsh with bshref in the command above.

More simply, you can use the included Makefile to run the driver on the trace files. To run trace01.txt using your own shell, you can just run:

$ make test01

To run trace01.txt through the reference shell, you can similarly run:

$ make rtest01

The other traces can be run in the same way (e.g., make test02 and make rtest02). The output of your shell from the trace files is exactly the same as the output you would get from running your shell interactively, except for an initial comment that identifies each trace and a few values that might vary from run to run (e.g., pids).

Output Formatting

Your shell's output formatting should exactly match that of the reference shell. For example, the command-line prompt of your shell should be the precise string "bsh> ", and your output messages should contain the same information in the same format as the reference shell. Particular messages that you should look out for (and their correct formatting) include the following:

Running an invalid program ./prog:
```
./prog: Command not found
```
Running a new background job with jid 20 and pid 500 executing /bin/ls -l:
```
[20] (500) /bin/ls -l
```
Switching an existing job with jid 20 and pid 500 executing /bin/ls -l to the background:
```
[20] (500) /bin/ls -l
```

Running bg without specifying a job:

bg command requires PID or %jobid argument

Running fg without specifying a job:

fg command requires PID or %jobid argument

Specifying an invalid pid 500 to bg or fg:
```
(500): No such process
```
Specifying an invalid jid 20 to bg or fg:
```
%20: No such job
```
Specifying something other than a pid or jid for bg:
```
bg: argument must be a PID or %jobid
```
Specifying something other than a pid or jid for fg:
```
fg: argument must be a PID or %jobid
```
Job with jid 20 and pid 500 terminated by signal 15 (use WTERMSIG to find the signal number from the child status):
```
Job [20] (500) terminated by signal 15
```
Job with jid 20 and pid 500 stopped by signal 20 (use WSTOPSIG to find the signal number from the child status):
```
Job [20] (500) stopped by signal 20
```

Note that the above messages should always be printed. You are welcome to add additional output when running in verbose mode, but your verbose output does not need to match that of the reference shell. Refer to the reference shell if you are unsure about any of the exact formatting of these messages. In particular, trace 12 exercises many of the error messages.
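
The last two messages above depend on decoding the status value filled in by waitpid. The sketch below shows one way to do that decoding; format_job_status is a hypothetical helper, and it writes into a caller-supplied buffer purely to keep the example self-contained. Inside an actual signal handler, the provided safe_printf would be the appropriate way to produce the output.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: formats the message for a child status reported by
 * waitpid. Writes an empty string if the child exited normally.
 * Returns the number of characters that the message requires. */
int format_job_status(char *buf, size_t len, int jid, pid_t pid, int status) {
    if (WIFSIGNALED(status)) {
        /* Child was killed by a signal it did not catch. */
        return snprintf(buf, len, "Job [%d] (%d) terminated by signal %d\n",
                        jid, (int)pid, WTERMSIG(status));
    }
    if (WIFSTOPPED(status)) {
        /* Only reported if waitpid was called with WUNTRACED. */
        return snprintf(buf, len, "Job [%d] (%d) stopped by signal %d\n",
                        jid, (int)pid, WSTOPSIG(status));
    }
    if (len > 0) {
        buf[0] = '\0';
    }
    return 0;
}
```

Note that WTERMSIG is only meaningful when WIFSIGNALED is true, and WSTOPSIG only when WIFSTOPPED is true, so the checks must come first.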

Shell Trace Files

Each trace file consists of a series of commands to test the functionality of your shell. The trace files are understood by the sdriver.pl driver program, which launches your shell, executes each line of the trace file via your running shell process, and captures the shell's output. The format of the trace files is described below:

Each non-empty line of a trace file consists of either (1) a comment, (2) a driver command, or (3) a command string.
Comment lines start with # and are ignored by the driver.
Driver commands are specified in ALL_CAPS and trigger some external event that interacts with the shell. An intuitive way to think about driver commands is that they are simulating a human user interacting with the shell in ways other than typing regular commands. The driver commands used in the trace files are summarized below:
- INT: The simulated user types Ctrl-C, which sends a SIGINT signal to the shell.
- TSTP: The simulated user types Ctrl-Z, which sends a SIGTSTP signal to the shell.
- SLEEP n: The simulated user does nothing for n seconds.
All other non-empty lines are command strings, which specify programs to be launched via the shell process. These command strings are passed verbatim to the bsh process (i.e., the simulated user types the command string at the prompt and then presses Enter).
Once the driver has issued every command specified in the trace, the shell is notified that there is no more input to read, which normally causes the shell process to terminate. In other words, there is basically an implicit CLOSE command at the end of every trace.

The external programs launched by the trace files include the provided programs myspin, mysplit, mystop, and myint, as well as the external system programs echo (i.e., /bin/echo) and ps (i.e., /bin/ps). The ps program prints process information and is discussed later. The echo program simply prints out a message specified on the command line (i.e., it "echoes" a message back to you). Starting in trace03.txt, the echo program is used simply as a way to print out the non-echo commands that the shell is about to execute prior to actually doing so. For example, consider trace03.txt, which is reproduced below:

#
# trace03.txt - Process jobs builtin command.
#
echo -e bsh> ./myspin 2 \046
./myspin 2 &

echo -e bsh> ./myspin 3 \046
./myspin 3 &

echo bsh> jobs
jobs

In this trace file, the echo program will just print out the two myspin commands and the jobs command before they actually execute. Thus, these echo commands serve to give a visual indication of what the traces are doing as they run. These echo commands launch foreground jobs like any other, but assuming you have passed trace02.txt already, foreground jobs should already be working.

Note that some of these echoes include the character sequence \046. This sequence is just a way to specify the ampersand character & as a character that echo should print (the value \046 is the ASCII code of an ampersand in base 8). Using this character sequence is necessary because if an actual ampersand were used, then the echo job would be run in the background. Since these echoes should execute in the foreground (before the following command actually runs), this character sequence is needed to actually output an ampersand.
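
To check the arithmetic behind \046, the short sketch below converts the octal digits to a character code. The function name octal_to_char_code is made up for illustration and has nothing to do with how echo or the driver is implemented.

```c
#include <stdlib.h>

/* Converts octal digits such as "046" to the character code they denote.
 * For "046": 0*64 + 4*8 + 6 = 38, which is the ASCII code of '&'. */
int octal_to_char_code(const char *digits) {
    return (int)strtol(digits, NULL, 8);
}
```

C character literals accept the same notation, so '\046' and '&' are the same value.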

Listing Process Info with `ps`

Traces 9, 10, and 11 use the ps program, which lists active process information (from the OS, not from your shell). The ps program will show a variety of process info (one process per line) but the column of particular interest is the STAT (state) column, which is shown when executing ps w (as in the trace files). Note that ps shows different output in a variety of formats depending on the options passed, so to avoid confusion, stick to executing ps w only. A number of different state codes may be displayed in the state column, but the only ones you really need to pay attention to are the following three:

R: the process is currently running
T: the process is currently stopped (suspended)
Z: the process is an unreaped zombie

Make sure that these process state codes match as shown in your shell and the reference shell. Note that the output of ps will show all of your processes, including processes unrelated to your shell. The only processes you should concern yourself with in the output of ps are the processes created by your shell process (in particular, the mysplit processes).

Implementation Advice

Here are some useful tips for working on your shell:

Read the starter code: Before you start coding, carefully read through the starter code and helper functions in bsh.c. While you do not need to go through every line of every provided function, you should be clear on how to use every helper function that you are given.
Tackle the traces in order: The trace files are progressive and serve as a useful guide for how to develop your shell. That is, start with trace01.txt and get that working before moving on to trace02.txt. As you consider each new trace, make sure you are absolutely clear on how the trace is supposed to behave (refer to the trace file section above and ask if you still aren't sure). Getting a trace working does not simply mean that your program doesn't crash or get stuck. Most likely bugs in this lab are not of the obvious variety, so noticing them will require that you are clear on what is supposed to be happening in each trace file.
Test interactively: Especially at first, don't exclusively rely on the trace files when testing your shell. Remember that shells are intended to be interactive programs! In many cases, errors will be more apparent when running the shell interactively rather than running it in an automated fashion via a trace file. You can always "manually" run a trace just by performing the specified commands yourself when directly executing the shell. You may need to increase the time durations specified to the helper programs (e.g., ./mysplit 10 instead of ./mysplit 1) to give yourself sufficient time to run the trace commands manually.
Use non-graphical test programs: Full-window programs like more, less, nano, vim, and emacs do strange things with the terminal settings. Don't run these programs from your shell; instead, stick with simple text-based programs like ls, ps, pwd, and echo (as well as the various test programs provided to you in the lab repository).
Read the manpages: There are lots of different system calls that you will be using in your shell. Specific system calls that you might want to use include fork, execve, getpid, waitpid, kill, setpgid, sigprocmask, and sigsuspend. Refer to the Linux manual pages when working with system calls. You can get a detailed reference on any system call right from the terminal using man, e.g., by running man fork. You may also wish to refer to the waitpid options and status macros detailed in the class process slides (though these are also detailed in the waitpid manpage).
Check syscall return values: You should check the return value of every system call to be sure that you catch errors. Most system calls will silently return a negative value on failure, so if you are not checking your return values, you may not even realize that you are trying to use a system call improperly (which is the most likely reason that a system call may fail here). In most cases, the appropriate thing to do when a system call fails is to call the provided error function, which allows you to specify your own error message, then prints out a system-generated message as well and exits the shell.
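
A common way to make return-value checking less tedious is to wrap each system call in a small checking function. The sketch below does this for fork. Because the exact signature of the provided error function is not shown here, the example uses a stand-in named die that prints a custom message plus the system-generated one and exits; checked_fork is likewise a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the provided error function (its real signature may differ):
 * prints msg followed by the system's description of errno, then exits. */
static void die(const char *msg) {
    perror(msg);
    exit(1);
}

/* fork with the error check folded in, so callers only see success. */
pid_t checked_fork(void) {
    pid_t pid = fork();
    if (pid < 0) {
        die("fork error");
    }
    return pid;  /* 0 in the child, the child's PID in the parent */
}
```

The same pattern applies to the other system calls used in the lab, as long as each wrapper accounts for that call's special cases.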

Tips for specific parts of the shell are given below.

Eval

You should use the provided parseline function to perform the heavy lifting of parsing the words of the command line. Note that parseline takes the command line as well as an array argv which will be filled in with pointers to the individual words of the command line. The pointer array isn't expected to be pre-initialized; it's just a bunch of storage which parseline will fill in with pointers to the actual words (located within the command line). Note that since the pointers in argv will point to existing memory within the command line, you don't need to malloc any additional memory here. Instead, just declare a local array to hold the argument pointers, such as the following:
```
char* argv[MAXARGS]; // a local array that holds MAXARGS pointers
```
You can then pass this (uninitialized) array to parseline, which will fill it in appropriately. In fact, you shouldn't need to call malloc anywhere in the entire lab, as all necessary persistent data structures are pre-allocated as global variables.
Make sure that your program doesn't crash if the user enters an empty line. Your code can check for this after calling parseline very easily, but if you do not specifically test for this case, your complete shell is likely to segfault on empty lines (which is not very user friendly!).
Following parsing, you should call builtin_cmd (which is already provided to you) to execute a built-in command if one was specified. If not, pass the constructed argv array to execve in order to launch the new job running the target program. For the third parameter to execve, pass the predefined global variable environ (which is the set of "environment variables" defined in the current shell session).
One aspect of argv that you may need to modify prior to calling execve is the program name in argv[0] to account for the shell PATH. Recall that the PATH is the set of directories that the shell will use to look for program names. For simplicity, the PATH here is the single directory /bin. Thus, you need to prepend the program name with /bin/ (e.g., to translate ls to /bin/ls). However, you should not prepend the program name if it starts with either / or . (as in the program names /full/path/to/myprog or ./myprog). Since you are working in C, you will need to create a separate buffer to hold the (larger, prepended) program name. Declare the buffer, copy the "/bin/" string into it using strcpy, then append the original program name (e.g., "ls") using strcat. You can then replace the original program name in argv[0] with your full program name. Run valgrind to make sure you aren't doing anything unsafe with memory during this string processing.
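
The prepending step described above might look like the following sketch. The function name build_program_path and the explicit length check are illustrative choices rather than part of the starter code; the caller is assumed to supply a buffer that stays valid until execve is called.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the full program name for `name` into buf.
 * Names starting with '/' or '.' are copied unchanged; all others get
 * the "/bin/" prefix. Returns 0 on success, -1 if buf is too small. */
int build_program_path(const char *name, char *buf, size_t buflen) {
    const char *prefix = "/bin/";

    if (name[0] == '/' || name[0] == '.') {
        prefix = "";  /* already a path, so leave it alone */
    }
    /* +1 for the terminating null byte */
    if (strlen(prefix) + strlen(name) + 1 > buflen) {
        return -1;
    }
    strcpy(buf, prefix);
    strcat(buf, name);
    return 0;
}
```

Checking the combined length before calling strcpy and strcat is what keeps the copy from writing past the end of the buffer.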
When adding a new job to the job list, you must be careful to ensure that the job list is not corrupted. In particular, a concurrency bug can occur if the shell forks a child, but before the parent actually adds the child to the job list by calling addjob, the child exits and is reaped by sigchld_handler. Then, the job will be subsequently added and then never removed. This type of bug is called a race condition, as it depends on two processes "racing" during concurrent execution, and is difficult to debug as it occurs nondeterministically! To protect the job list, you'll want to prevent your signal handlers from running until the new child job is actually added to the job list.
Also related to the above issue, remember that children inherit a copy of the signal mask of their parents, and therefore the child must be sure to unblock its signals before it executes the new program using execve.
When you launch bsh from the regular Unix shell (i.e., the bash process), your bsh shell is running in the foreground process group of Bash. If your shell then creates a child process, that child will also (by default) be a member of Bash's foreground process group. Since typing Ctrl-C sends a SIGINT to every process in Bash's foreground group, doing so will send a SIGINT to your shell (which is good), but also to every process that your shell created (which is bad). You only want to send the SIGINT to your shell, which will then pass it along to your own foreground job (if one exists), but not to any background jobs. To handle this issue, after calling fork but before the new child calls execve, the child should call setpgid(0, 0), which puts the child in a new process group whose group ID is equal to the child's PID. Since the child's PID is guaranteed to be unique, setting the child's group ID equal to its PID effectively removes it from Bash's foreground process group, ensuring that your shell will be the sole process in Bash's foreground group.
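
The last three points (blocking signals around fork, unblocking in the child, and moving the child into its own process group) can be combined roughly as follows. This is a sketch under several assumptions: spawn_job is a hypothetical name, only SIGCHLD is blocked here (you may decide to block SIGINT and SIGTSTP as well), and the call to addjob is shown only as a comment because its signature is defined in the starter code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: launches argv[0] as a new job in its own process
 * group. Returns the child's PID, or -1 on failure. */
pid_t spawn_job(char *const argv[]) {
    sigset_t block, prev;
    pid_t pid;

    /* Block SIGCHLD so the handler cannot run before the job is recorded. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &prev) < 0) {
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: the blocked mask was inherited, so restore it first. */
        sigprocmask(SIG_SETMASK, &prev, NULL);
        if (setpgid(0, 0) < 0) {  /* new group whose ID is the child's PID */
            _exit(1);
        }
        execve(argv[0], argv, environ);
        /* execve only returns if it failed. */
        dprintf(STDOUT_FILENO, "%s: Command not found\n", argv[0]);
        _exit(1);
    }

    /* Parent: the real shell would call addjob here, while SIGCHLD is
     * still blocked, and only then restore the previous mask. */
    sigprocmask(SIG_SETMASK, &prev, NULL);
    return pid;  /* -1 if fork failed */
}
```

The ordering is the important part: the mask is changed before fork, and the parent restores it only after the point where the job would be added to the job list.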

Signal Handlers

When you implement your signal handlers, be sure to send SIGINT and SIGTSTP signals to the entire foreground process group, using "-pid" instead of "pid" in the argument to the kill function (a negative PID other than -1 sends the signal to an entire process group).
Due to the effective concurrency of signal handlers with the rest of the program code, be careful about what code you put in signal handler functions. For example, you should not use printf in signal handlers (since weird things may happen if the signal handler is called when the program is already in the middle of executing printf). You can use the provided safe_printf function as a drop-in "safe" printf within handlers.
One of the tricky parts of the shell design is deciding on the allocation of work between waitfg and sigchld_handler, particularly with respect to performing reaping of child processes. A recommended approach to minimize overall complexity is to perform all reaping (via waitpid) within sigchld_handler. With this approach, waitfg will still pause until the specified pid is no longer in the foreground, but it will not do so using waitpid. A tempting alternative is for waitfg to use sleep inside a loop to periodically check that the process is still in the foreground, but this pattern is called busy-waiting and should be avoided (as it wastes CPU time and will likely wait longer than needed). Instead, waitfg can use the sigsuspend function as a mechanism to block until a signal is received and processed (at which point you can check if the process is still in the foreground).
In general, system calls return -1 and set the global errno variable if an error occurs (which you can easily access using the provided error function). However, there are a few special cases to be aware of. One is that if waitpid has no remaining children to wait on, then it will return -1 and set errno to the value ECHILD. Importantly, this is not an actual error, despite waitpid returning -1. Assuming that you are checking your return values for errors (which you should!), you will need to specifically check for this condition. Another special case is sigsuspend, which is defined to always return -1 (with errno set to EINTR), so its return value does not indicate a failure.
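
The division of labor between the handler and waitfg described above can be sketched as a small standalone program fragment. Everything named here is hypothetical: fg_pid stands in for the foreground entry of the job list, reap_children for sigchld_handler, and run_and_wait_demo for the combination of eval and waitfg. The use of WNOHANG and WUNTRACED is one reasonable choice of waitpid options, not a requirement stated by the lab.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile pid_t fg_pid = 0;               /* stand-in for the job list */
static volatile sig_atomic_t handler_error = 0;

/* Reaps every child that has changed state, without blocking. */
static void reap_children(int sig) {
    int saved_errno = errno;  /* handlers should not disturb errno */
    int status;
    pid_t pid;

    (void)sig;
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        if (pid == fg_pid) {
            fg_pid = 0;  /* the real handler would update the job list */
        }
    }
    /* -1 with ECHILD just means there are no children left. */
    if (pid < 0 && errno != ECHILD) {
        handler_error = 1;
    }
    errno = saved_errno;
}

/* Forks a child that exits at once, then waits for it with sigsuspend.
 * Returns 0 on success, -1 on failure. */
int run_and_wait_demo(void) {
    struct sigaction sa;
    sigset_t block, prev, wait_mask;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) < 0) return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &prev) < 0) return -1;

    pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(0);
    fg_pid = pid;  /* safe: SIGCHLD is blocked, so the handler cannot run */

    /* Wait with SIGCHLD unblocked; sigsuspend returns -1 every time. */
    wait_mask = prev;
    sigdelset(&wait_mask, SIGCHLD);
    while (fg_pid != 0) {
        sigsuspend(&wait_mask);
    }

    sigprocmask(SIG_SETMASK, &prev, NULL);
    return handler_error ? -1 : 0;
}
```

Because SIGCHLD stays blocked except while sigsuspend is waiting, the handler cannot run between the loop's check of fg_pid and the call to sigsuspend, which avoids a missed wakeup.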

Logistics

As usual, initialize your lab repository on GitHub via the invitation link posted to the Slack, then clone to the class server to begin working. You are responsible for completing bsh.c, but should not create or modify any other file.

If you are working in a group and have not previously done so, it is a good idea to go through Part 3 of the Git tutorial, which covers some specific topics applicable to collaboration (most significant of which is handling merge conflicts). You should also review the course policies on group work.

Your final submission will consist of your committed and pushed bsh.c file at the time of the due date. Remember to submit your individual group reports to me if you worked in a group.

Evaluation

Your shell will be graded on program correctness (as determined by the 14 trace files), design, and style. The output of your shell on the trace files should be identical to that of the reference shell, with two exceptions:

The PIDs on any given run can (and will) be different.
The output of the ps program in traces 9, 10, and 11 will be different from run to run (since the ps program displays PIDs, among other things). However, as discussed previously, the running states of any mysplit processes in the ps output should be identical.

You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions about what constitutes good program design and/or style that are not covered by the guide.