Lab 4 - Stack Attack

Assigned:	Friday, April 9
Checkpoint:	Friday, April 16, 11:59 pm. Exploits 1 and 2 should be completed by the checkpoint.
Due Date:	Tuesday, April 20, 11:59 pm. All exploits are due.
Collaboration Policy:	Level 1
Group Policy:	Individual

This lab will teach you about buffer overflows, a common class of bugs that can lead to real-world security vulnerabilities. You will explore some of the techniques used to exploit these bugs as well as some of the features provided by compilers and operating systems to make programs more secure against these attacks.

Note: We do not condone the use of any form of attack to gain unauthorized access to any system (ours or otherwise)!

Start by reading through the entire lab description.

Lab Overview

Most of the bitbombs on our server have been safely defused, but a few unusual ones remain. We believe that these programs are actually prototypes of more powerful, implosive bytebombs (8x the explosive power)! The bytebombs appear to be detonated via activation strings that exploit security vulnerabilities hidden in the code. Luckily, we've managed to obtain partial source code to the bytebombs that should prove useful in figuring out how they work.

We've placed the bytebombs in a secure, binary-shielded part of our server. We need you to find working activation strings so that we can safely dispose of the bytebombs!

ByteBomb Files

Once you accept the lab repository on GitHub, I will place your personal bytebomb (and associated files) in your repository. Run git pull to fetch them. You will then find several key files in your lab directory:

bytebomb: A bytebomb executable program. We believe this bomb can be activated through the use of code injection attacks!
bigbytebomb: A second-generation, more powerful bytebomb executable. We believe this bomb has been hardened against code injection attacks and cannot be triggered so easily. However, we think the bigbytebomb may still be activated using return-oriented programming attacks!
id.txt: Contains two values: (1) your numeric bytebomb ID, and (2) an 8-digit hex code left alongside your bomb. We think this code is important in activating the bomb.
farm.c: Partial source code of the bombs which you can use as a "gadget farm" when generating return-oriented programming attacks.
hex2raw: A utility to convert textual hex data into raw binary data; useful in generating exploit strings.
1-hiss.txt through 5-implodebig.txt: Plain text files in which you should write and comment your (human-readable) exploit strings.
notebook.txt: A plain text file in which you should document your process of figuring out how each stage of the bytebombs is activated.

ByteBomb Input

Each bytebomb reads an activation string when run using the following getbuf function:

unsigned getbuf() {
    char buf[BUFFER_SIZE];
    Gets(buf);
    return 1;
}

The function Gets (called from within getbuf) is roughly equivalent to the standard library function gets. In particular, Gets reads an input string (terminated by '\n' or end-of-file) and stores the inputted bytes (along with a null terminator) at the specified destination buffer. The constant BUFFER_SIZE determines the size of the buffer and is specific to your bytebomb.

The getbuf function is called from the test function, which is as follows:

void test() {
  int val;
  val = getbuf(); 
  printf("Nothing happens. (0x%x)\n", val);
}

You hopefully notice that this function presents an entry point through which you can smash the stack! Experiment with your bytebomb to confirm that this is the case (you can defer examining your bigbytebomb until later, though that works similarly). Similar to the demo we ran in class, it should be very easy to make the program segfault. However, while crashing the bomb is quite easily done, your goal is to write more clever input strings that will cause the bomb to activate in a variety of ways. These input strings are called exploit strings.

Note that unlike in the previous lab, there is no record kept of unsuccessful input attempts. As such, feel free to experiment with any input strings as many times as you wish.

Also note that each bomb prints its unique ID when run. As with the BUFFER_SIZE constant, the value of this ID is unique to your bomb.

Formatting Exploit Strings

Exploit strings will typically contain byte values that do not correspond to ASCII values used for printed characters. As a result, such exploit strings cannot be typed directly. Instead, you'll need to generate raw (i.e., binary) strings consisting of arbitrary byte values. Although we did so in class by writing a dedicated program to output a raw string, the included program hex2raw will allow you to more easily generate raw strings by writing a textual file specifying the numeric values of each byte in your exploit string.

The hex2raw program takes a hex-formatted string as input. In a hex-formatted string, each byte value is represented as two hex digits and separated by spaces. For example, suppose you wanted to specify an exploit string consisting of four bytes with the byte values 0, 10, 255, and 5. You would specify this exploit string in hex format as follows:

00 0A FF 05

If you wanted to include textual characters in the exploit string, you would need to specify their ASCII values. For example, if you wanted to include 'G' in your exploit string, the corresponding byte should be specified as 47 (since 0x47 is the ASCII code for G). See man ascii for a full table of ASCII values. Any non-hex digits in the exploit string (i.e., anything outside the range 0-F) are ignored by hex2raw when outputting the raw string.
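
To make the hex format concrete, here is a minimal C sketch of the kind of conversion hex2raw performs on the example above. It is an illustration only, not the actual hex2raw source: the function name hex_to_raw is hypothetical, and unlike the real tool it does not handle /* */ comments or print warnings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Simplified stand-in for hex2raw (NOT its actual implementation):
 * converts whitespace-separated two-digit hex tokens into raw bytes.
 * Tokens that are not exactly two hex digits are skipped.
 * Returns the number of bytes written to out (at most out_cap).
 */
size_t hex_to_raw(const char *text, unsigned char *out, size_t out_cap) {
    assert(text != NULL && out != NULL);
    size_t n = 0;
    const char *p = text;
    while (*p != '\0' && n < out_cap) {
        /* Skip whitespace between tokens. */
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;
        /* Measure the current token. */
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
        size_t len = (size_t)(p - start);
        if (len == 2 &&
            isxdigit((unsigned char)start[0]) &&
            isxdigit((unsigned char)start[1])) {
            char pair[3] = { start[0], start[1], '\0' };
            out[n++] = (unsigned char)strtoul(pair, NULL, 16);
        }
    }
    return n;
}
```

Given the input "00 0A FF 05", this sketch writes the four bytes 0, 10, 255, and 5 into out and returns 4.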

You should use the included text files 1-hiss.txt through 5-implodebig.txt to write your textual, hex-formatted exploit strings for each phase. To actually pass each exploit string to your bytebomb, you will need to use hex2raw to generate the raw string from the hex-formatted string in the text file. There are two ways to do this:

First, you can use output redirection to save the raw string to a file, and then redirect that new file back to the bytebomb, as follows (assuming the hex-formatted string is in 1-hiss.txt):
```
$ ./hex2raw -i 1-hiss.txt > 1-hiss.bytes
$ ./bytebomb < 1-hiss.bytes
```
The above will create a new file called 1-hiss.bytes containing the raw (not human-readable) version of the exploit string in 1-hiss.txt, and then give that as input to the bytebomb. Note that you must use the -i flag as indicated above when calling hex2raw on a file input. Once you have the raw exploit file, you can also run the exploit from within gdb, as follows:
```
$ gdb ./bytebomb
(gdb) run < 1-hiss.bytes
```
Alternately, you can use a Unix pipe, which is a standard way of passing output from one program directly as input to another program, like so (again assuming a hex-formatted string in 1-hiss.txt):
```
$ cat 1-hiss.txt | ./hex2raw | ./bytebomb
```
The above command uses the cat program to output the contents of 1-hiss.txt, which are passed (or 'piped') as input to the program hex2raw. The hex2raw program, in turn, outputs the raw exploit string and pipes it to bytebomb. Using pipes has the advantage of not requiring the extra step of creating the raw file, but isn't as straightforward to use if you want to run the exploit within GDB.

One important note is that your exploit string must not contain byte value 0x0A at any intermediate position, since this is the ASCII value for a newline '\n'. When Gets() encounters this byte, it will assume that you intended to terminate the string input. Fortunately, hex2raw will warn you if it encounters this byte value.

Exploits

There are five exploits to tackle to fully unlock the capabilities of your bytebombs. The first three exploits involve attacking the bytebomb program, while the last two exploits involve attacking the bigbytebomb program. The goal of each exploit is to 'repurpose' the bomb to execute something that it should not normally execute.

However, the methods you will use to accomplish each exploit will vary. The first exploit is a basic buffer overflow attack, the next two exploits require you to use code injection attacks, and the last two exploits require you to use return-oriented programming. Each of the five exploits is described in more detail below.

For all of these exploits, one of the most important things you should do is draw a picture of the stack and understand where things are located during the attack. If you don't have a very clear and specific understanding of how the stack is laid out, you will have a hard time completing the exploits.

Part A: Stack Overflow

The first exploit will have you exploit the bytebomb executable using a basic buffer overflow attack. Recall that the idea of a buffer overflow attack is to smash the stack by overflowing a buffer, then overwrite the function's return address with some other desired address. Subsequently, when the function executes the ret instruction, rather than returning to the calling function, the program will jump to the address you specified.

Exploit 1: Hiss

This exploit simply requires you to get the bytebomb to execute the hiss function, which is defined as follows:

void hiss() {
  vlevel = 1;       /* Part of validation protocol */
  printf("The bytebomb hisses loudly!\n");
  validate(1);
  exit(0);
}

To do so, you will need to smash the stack and change the return address of getbuf. All the information you need to devise your exploit string for this stage can be determined by examining a disassembled version of bytebomb using objdump -d bytebomb. However, as we demonstrated in class, you may want to use gdb to step through the last few instructions of getbuf (prior to executing ret) to inspect the modified return address and make sure it is what you expect. Remember to be careful about byte ordering (i.e., endianness) as well as the placement of buf within the stack frame.
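
As a reminder of what byte ordering means here, the following C sketch lays out a 64-bit address in little-endian order, which is the order in which the bytes must appear in memory on x86-64. The function name encode_le64 and the address 0x401a2b are made up for illustration; they are not taken from any bytebomb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit address in little-endian order: least significant byte first. */
void encode_le64(uint64_t addr, unsigned char out[8]) {
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        out[i] = (unsigned char)((addr >> (8 * i)) & 0xff);
    }
}
```

For the made-up address 0x401a2b, the resulting bytes are 2b 1a 40 00 00 00 00 00, in that order.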

Two registers that are of particular importance in this and all future exploits are the stack pointer %rsp and the program counter (aka instruction pointer) %rip. Make sure you're clear on what these two registers mean and how they're used. Both will get manipulated (either directly or indirectly) through your attacks, and you may wish to track them in GDB while debugging. Using the x command in GDB to view memory contents on the stack will be invaluable as well.

If you are having trouble starting, review the material and slides from class on buffer overflows. Remember that you don't need to inject your own assembly code for this exploit (though you still need to smash the stack). Also remember that although we wrote a program in class for the express purpose of generating an exploit string, here you don't need to do that, because the provided hex2raw program provides an easier way to generate exploit strings. See the hex2raw usage instructions above.

Important! The hex2raw program permits both line breaks and comments inside the exploit string. Comments are delimited with /* and */, including the surrounding spaces (e.g., /* this is a comment */). Do not write your entire exploit string in a single unbroken line! Instead, you should split your exploit strings across multiple lines into logical sections and comment each section appropriately. Note that you must use /* */ style comments; hex2raw will not recognize comments starting with //, or comments in which there's no space next to the * (e.g., /*an invalid comment*/).

Part B: Code Injection Attacks

The next two exploits will have you exploiting the bytebomb executable using code injection attacks. Recall that the idea in a code injection attack is to use a vulnerable function to inject a series of assembly instructions onto the stack, then overwrite the function's return address with the address of the injected instructions. Then, when the function executes the ret instruction, the program will jump to the injected instructions and begin executing them.

The two code injection exploits are described below, followed by a description of how to generate the byte codes comprising the assembly instructions that you intend to inject.

Exploit 2: Glow

The objective of this exploit is to make the bytebomb glow via the glow function, defined as follows:

void glow(unsigned val) {
  vlevel = 2;       /* Part of validation protocol */
  if (val == bombcode) {
    printf("The bytebomb glows brightly!\n");
    validate(2);
  } else {
    printf("The bytebomb flickers faintly. (0x%.8x)\n", val);
    fail(2);
  }
  exit(0);
}

You will note that this is not quite as simple as hiss, since you need to not only execute the function but also make it appear as if you have passed a particular argument value. That id.txt file might be relevant here...

Some specific advice for this stage:

In order to include your own injected code in the exploit string, you will need to use the actual byte encodings of assembly instructions. See the Generating Byte Codes section below for more details.
Don't try to use either jmp or call instructions - these instructions use PC-relative addressing, which is quite difficult to set up correctly in injected code. Instead, use ret for all transfers of control, even when you are not returning from a call. Remember the specific mechanics of the ret instruction: it simply pops off the 8-byte address on the top of the stack and sets %rip to that value. It is critical that you are clear on these mechanics to understand the use of ret in your exploit string!
When writing assembly code, be careful of the difference between, e.g., $0x3 (the value 3) and 0x3 (the value at memory address 3). It's easy to write the latter when you actually mean the former.
If the argument val does not have the correct value, the output of the bytebomb will print out the (incorrect) value.
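
The mechanics of ret described above can be modeled in a few lines of C. This is a toy simulation under simplifying assumptions: the struct cpu type, the sim_ret function, and the use of a byte array with rsp as an offset into it are all hypothetical stand-ins for the real machine state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy machine state: just the two registers that ret touches. */
struct cpu {
    uint64_t rsp;  /* offset of the top of the stack within stack_mem */
    uint64_t rip;  /* address of the next instruction to execute */
};

/*
 * Model of ret: read the 8-byte little-endian value at the top of the
 * stack into rip, then add 8 to rsp (i.e., pop it off the stack).
 */
void sim_ret(struct cpu *cpu, const unsigned char *stack_mem, size_t stack_size) {
    assert(cpu != NULL && stack_mem != NULL);
    assert(cpu->rsp + 8 <= stack_size);
    uint64_t target = 0;
    for (int i = 7; i >= 0; i--) {
        target = (target << 8) | stack_mem[cpu->rsp + i];
    }
    cpu->rip = target;
    cpu->rsp += 8;
}
```

Note that ret does not care how the 8 bytes got onto the stack. If two addresses sit back to back in stack memory, two consecutive calls to sim_ret transfer control to each in turn, which is why ret can be used for any transfer of control.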

Exploit 3: Implode

Exploit 3 is another code injection attack, this time targeting the implode function. Here the argument is a string and there is a helper function involved:

/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char* sval) {
  char cbuf[110];
  /* Make position of check string unpredictable */
  char* s = cbuf + random() % 100;
  sprintf(s, "%.8x", val);
  return strncmp(sval, s, 9) == 0;
}

void implode(char* sval) {
  vlevel = 3;       /* Part of validation protocol */
  if (hexmatch(bombcode, sval)) {
    printf("BOOM!!! The bytebomb implodes!\n");
    validate(3);
  } else {
    printf("The bytebomb smokes slightly. (\"%s\")\n", sval);
    fail(3);
  }
  exit(0);
}

While the idea is the same as in Exploit 2, this exploit has a few notable differences. Since the argument is now a string, you will need to include the string representation of your bomb code (the 8-digit hex code from id.txt) somewhere in your exploit string. This string should consist of the eight hex digits (ordered from most to least significant) without a leading "0x". Remember that a string in C is terminated by a null character (i.e., byte value 0).
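
The following C sketch shows how an unsigned value maps to the eight-character string that hexmatch compares against, using the same "%.8x" format. The function name code_to_string and the value 0x1a2b3c4d are hypothetical; substitute nothing from this example into your own exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write the 8-digit lowercase hex text of val into out, followed by a
 * null terminator (9 bytes in total). Uses the same "%.8x" format as
 * hexmatch.
 */
void code_to_string(unsigned val, char out[9]) {
    assert(out != NULL);
    snprintf(out, 9, "%.8x", val);
}
```

For the made-up value 0x1a2b3c4d, the string is "1a2b3c4d". Written in hex2raw format, its ASCII bytes plus the terminator are 31 61 32 62 33 63 34 64 00.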

Another complication is that hexmatch and strncmp will push data onto the stack when they are called, potentially overwriting portions of memory that held the buffer used by getbuf. As a result, you will need to be careful where you place the string representation of your bomb code. Drawing a picture of the stack (keeping in mind where the stack frames of hexmatch and strncmp will go) will be useful in thinking about how to handle this problem.

Generating Byte Codes

For code injection attacks, your exploit strings will need to include the actual encodings of assembly instructions (i.e., byte codes). You can find these actual encodings by hand-writing assembly instructions, using gcc as an assembler, and objdump as a disassembler. For example, suppose you write a file example.s containing the following hand-written assembly code (.s is the standard suffix for an assembly code file):

# Example of hand-generated assembly code
pushq   $0xabcdef     # Push value onto stack
addq    $17,%rax      # Add 17 to %rax
movl    %eax,%edx     # Copy lower 32 bits to %edx

You can now assemble and then disassemble this file:

$ gcc -c example.s
$ objdump -d example.o > example.d

The generated file example.d now contains the byte encodings of the instructions in example.s; in particular, the following lines of interest:

0: 68 ef cd ab 00     pushq $0xabcdef
5: 48 83 c0 11        add $0x11,%rax
9: 89 c2              mov %eax,%edx

Each line shows the byte values that encode a single assembly language instruction. The number on the left indicates the starting address (starting with 0), while the hex digits after the colon indicate the byte codes for the instruction. For example, we can see that the instruction pushq $0xabcdef has hex-formatted byte code 68 ef cd ab 00.

From the above, we can read off the entire byte sequence for the code:

68 ef cd ab 00 48 83 c0 11 89 c2

This hex-formatted exploit string can then be passed through hex2raw to generate a raw input string for the bytebomb. Alternately (and perhaps preferably) you can simply edit the example.d file to omit extraneous characters and to contain C-style comments for readability, yielding:

   68 ef cd ab 00   /* pushq  $0xabcdef  */
   48 83 c0 11      /* add    $0x11,%rax */
   89 c2            /* mov    %eax,%edx  */

However, remember that you should store your human-readable exploit strings in the existing .txt files located in your directory for that purpose.

Writing Data onto the Stack

Many of these exploits will require you to write data values (not just code bytes) into the stack. For example, you might wish to write a particular address value onto the stack to be later used by your exploit.

Your first thought may be to do so by including an instruction like pushq $0x123456789 or movq $0x123456789, (%rsp) as part of your injected code. However, doing so may cause you to run into a particular quirk of x86-64, which is that most instructions that take immediate data operands can only accept up to 32-bit immediate data values (which are then sign-extended to 64 bits if used in a 64-bit context like pushq or movq). As a result, the assembler will reject instructions such as the aforementioned examples (since the immediate data operands require more than 32 bits) and you will get an error message such as "operand type mismatch".

The movq instruction is actually an exception to this basic rule in that it can accept a full 64-bit immediate data operand. However, it cannot take such a value and move it directly into memory, which is why the previous movq instruction is invalid. As a workaround, you can accomplish the same task in two instructions by first using movq to put the 64-bit value into a register and then using pushq on the register to write the value onto the stack. However, remember that all the bytes in your exploit string are already getting written onto the stack by the buffer overflow itself, even without having executed any injected code. As a result, you can more easily put data onto the stack just by specifying the desired data bytes as part of your exploit string (i.e., separate from the injected code bytes) and then letting the Gets function write them onto the stack as usual prior to executing your injected code.
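
The 32-bit immediate limit described above can be expressed as a simple range check. In this C sketch, the function names fits_simm32 and check_examples are hypothetical; the check just asks whether a 64-bit value survives being stored as a 32-bit immediate and sign-extended back to 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if v can be encoded as a 32-bit immediate that is sign-extended
 * to 64 bits: either the upper 33 bits are all zero (non-negative) or
 * the upper 33 bits are all one (negative).
 */
bool fits_simm32(uint64_t v) {
    return v <= UINT64_C(0x7fffffff) || v >= UINT64_C(0xffffffff80000000);
}

/* Checks based on the values used in the text above. */
void check_examples(void) {
    assert(fits_simm32(0xabcdef));      /* fine as a pushq immediate */
    assert(!fits_simm32(0x123456789));  /* needs more than 32 bits */
}
```

A value that fails this check has to reach the stack some other way, such as through a register or as plain data bytes in the exploit string.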

Part C: Return-Oriented Programming

Exploits 4 and 5 require you to attack the bigbytebomb executable. While the source code of the bigbytebomb is nearly identical to that of the bytebomb, code injection attacks against the bigbytebomb are difficult due to the usage of two techniques to thwart such attacks:

Stack randomization is used so that the address of the stack is randomly determined each time you run the program. This makes it difficult to determine where your injected code will be located.
The stack memory segment is marked nonexecutable, so even if you were able to locate the address of your injected code, the program would fail with a segmentation fault when you tried to execute it.

However, despite the above protections, the bigbytebomb is still vulnerable to return-oriented programming (ROP) attacks. Recall that the key idea in ROP is to identify byte sequences within the existing program consisting of one or more instructions followed by the ret instruction. Such sections of code are called gadgets. By smashing the stack using gadget addresses and possibly other data, you can construct a chain of gadgets that implements an exploit.

Locating useful gadgets is a difficult problem in itself. Luckily, we've found source code to a series of functions within the bigbytebomb that we think contains useful gadgets. This set of functions is called the gadget farm and is contained within the farm.c source file. For exploits 4 and 5, you will need to identify useful gadgets from within the gadget farm to perform attacks similar to those in exploits 2 and 3.

To help you in identifying gadgets, we have compiled a handout detailing the encodings of useful instructions. More details are provided below.
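
To illustrate what a gadget looks like at the byte level, the C sketch below searches a byte sequence for an instruction encoding immediately followed by the ret byte 0xc3. The function name find_gadget is hypothetical, and the sketch is deliberately simple: it does not allow nop bytes between the instruction and ret, and it is no substitute for reading the disassembly of your bigbytebomb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of pattern that is
 * immediately followed by a ret byte (0xc3) within code, or -1 if
 * there is none.
 */
long find_gadget(const unsigned char *code, size_t code_len,
                 const unsigned char *pattern, size_t pattern_len) {
    assert(code != NULL && pattern != NULL);
    for (size_t i = 0; i + pattern_len + 1 <= code_len; i++) {
        if (memcmp(code + i, pattern, pattern_len) == 0 &&
            code[i + pattern_len] == 0xc3) {
            return (long)i;
        }
    }
    return -1;
}
```

As an illustrative input (not taken from farm.c), consider the bytes c7 07 d4 48 89 c7 c3, which encode movl $0xc78948d4,(%rdi) followed by ret. The bytes 48 89 c7 encode movq %rax,%rdi, so searching for that pattern finds a gadget at offset 3, in the middle of the original instruction.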

Exploit 4: GlowBig

Exploit 4 requires you to repeat the Glow exploit against the bigbytebomb. If you try your working Exploit 2 against the bigbytebomb, it will fail due to the security measures mentioned above. Thus, you will instead need to construct a ROP exploit. You can construct this ROP exploit using gadgets only touching the first eight x86-64 registers (%rax through %rdi) and including only the following instruction types:

movq : encodings are given in the instruction encoding handout.
popq : encodings are given in the instruction encoding handout.
ret : encoded by the single byte 0xC3.
nop : encoded by the single byte 0x90. This instruction ("no op", short for "no operation") is an instruction whose only effect is to increment the program counter %rip by 1. In effect, this instruction can be used to provide 'padding' in an instruction byte sequence.

Some specific advice for this exploit:

You should only use gadgets drawn from within the gadget farm (not from elsewhere in the program). You can perform this exploit using only gadgets between the start_farm and mid_farm instruction addresses.
This attack is possible using only two gadgets.
Keep in mind that the popq instruction pops data from the stack. This instruction gives you an easy way to inject data (but not code) into the program. In doing so, note that your exploit string will contain a combination of gadget addresses and data.
Warning! Don't compile farm.c. Remember that the functions in farm.c are part of the bigbytebomb executable, and that is where you should find their addresses. If you compile farm.c as a standalone program and inspect that, the gadget addresses will not match those in the bigbytebomb.

Exploit 5: ImplodeBig

Before you tackle Exploit 5, consider what you have accomplished so far. In Exploit 1, you redirected a program to execute part of the program that it wasn't supposed to execute. In Exploits 2 and 3, you caused a program to execute machine code of your own design; if bytebomb had been a network server, you could have injected your own code into a remote machine. In Exploit 4, you circumvented two of the primary defenses that modern systems use against buffer overflow attacks. Although you did not inject your own code, you were able to hijack the operation of the program using pieces of existing code.

Exploit 5 requires you to perform the Implode attack (Exploit 3) on the bigbytebomb. This is substantially more difficult than the Glow attack of the previous stage, and as such, it is only worth 5% of your lab grade. Think of this last exploit more like a challenge problem rather than a necessary component of the lab to earn a good score.

Some specific advice for this exploit:

You should use gadgets drawn from within the complete gadget farm (between start_farm and end_farm).
Remember the effect that movl has on the upper 4 bytes of the destination register (automatically zeroes them out).
In addition to the standard nop instruction, you can also use other instructions as functional nops, which are 2-byte instructions that do not change any registers or memory values. Useful functional nop instructions are given in the encoding handout.
The simplest type of gadget is just calling an existing function (which potentially allows for more instructions than just those specified in the encoding handout).
My reference solution requires eight (not necessarily unique) gadgets.

Logistics

You are responsible for two tasks:

Determine working exploit strings and store them in the five provided text files (1-hiss.txt through 5-implodebig.txt). Remember that these files should contain the human-readable exploit strings, not the raw strings output by hex2raw. Each exploit text file should be written on multiple lines (i.e., not as a single unbroken line!) with inline comments summarizing the purpose of each section.
Document your methods and insights in notebook.txt, again organized by exploit (but everything can be stored in this one single file). The documentation in notebook.txt should supplement the inline documentation in each of the exploit strings (but you needn't repeat yourself, so you may not need to write too much here if you wrote clear inline documentation).

Your final submission will consist of your committed files at the time of the due date.

Evaluation

You will be evaluated both on determining working exploits for each stage as well as clearly documenting your methods and insights in notebook.txt. Total points for each exploit are listed below:

Hiss: 20 points
Glow: 25 points
Implode: 20 points
GlowBig: 30 points
ImplodeBig: 5 points

Partial credit is possible for clear documentation that demonstrates some understanding of the exploit even if the full exploit is not working.

Disarmament Status Report

The Disarmament Status Report page has been updated to show progress towards figuring out the bytebombs. You can view your progress towards completing the exploits on this page.

Exploit notifications are automatically logged by the server and require no action on your part.