Lab 4 - Byte Stacker

Assigned:	Wednesday, April 4.
Checkpoint:	Friday, April 6. Exploit 1 should be completed by the checkpoint.
Due Date:	Thursday, April 12. All exploits are due.
Collaboration Policy:	Level 1 (refer to the official policy for details)
Group Policy:	Individual

This lab will teach you about buffer overflows, a common class of bugs that can lead to real-world security vulnerabilities. You will explore some of the techniques used to exploit these bugs as well as some of the features provided by compilers and operating systems to make programs more secure against these attacks.

Note: We do not condone the use of any form of attack to gain unauthorized access to any system (ours or otherwise). Please act responsibly!

Start by reading through the entire lab description.

Lab Overview

Most of the bitbombs on our server have been safely defused, but a few unusual ones remain. While it appears at first glance that these bitbombs are duds, we believe that they are actually prototypes of much more powerful, implosive bytebombs (8x the explosive power)! The bytebombs appear to be detonated via activation strings that exploit security vulnerabilities hidden in the code. Luckily, we've managed to obtain partial source code to the bytebombs that should prove useful in figuring out how they work.

We've placed the bytebombs in a secure, binary-shielded part of our server -- we need you to find working activation strings so that we can safely dispose of these bytebombs. Get ready to stack some bytes and smash some stacks!

ByteBomb Files

A set of files pertaining to your bytebomb has been checked into your SVN directory under lab4. First, run svn update within your directory to download these files. Within the lab4 directory, there are several key files:

bytebomb: A bytebomb executable program. We believe this bomb can be activated through the use of code injection attacks!
bigbytebomb: A second-generation, more powerful bytebomb executable. We believe this bomb has been hardened against code-injection attacks and cannot be triggered so easily. However, we think the bigbytebomb may still be activated using return-oriented programming attacks!
id.txt: Your bytebomb ID and an 8-digit hex code left alongside your bomb. We think this code is important in activating the bomb.
farm.c: Partial source code of the bombs which you can use as a "gadget farm" when generating return-oriented programming attacks.
hextoraw: A utility to convert textual hex data into raw binary data; useful in generating exploit strings.
1-hiss.txt through 5-implodebig.txt: Plain text files in which you should write and comment your (human-readable) exploit strings.
notebook.txt: A plain text file in which you should document your process of figuring out how each stage of the bytebombs is activated.

ByteBomb Input

Each bytebomb reads an activation string when run using the following getbuf function:

unsigned getbuf() {
    char buf[BUFFER_SIZE];
    Gets(buf);
    return 1;
}

The getbuf function is called from the test function, which is as follows:

void test() {
  int val;
  val = getbuf(); 
  printf("Nothing happens. (0x%x)\n", val);
}

The function Gets is similar to the standard library function gets -- it reads a string from standard input (terminated by '\n' or end-of-file) and stores it (along with a null terminator) at the specified destination. In this code, you can see that the destination is the array buf, declared as having BUFFER_SIZE bytes. At the time your bombs were generated, BUFFER_SIZE was a compile-time constant specific to your bytebomb.
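To see why this is dangerous, here is a minimal sketch of a Gets-like routine. It is not the bomb's actual implementation: the name unchecked_read_line is hypothetical, and it reads from an in-memory byte array (with len standing in for end-of-file) rather than from standard input. The point is only that nothing in the loop knows how large dest is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Gets. Copies bytes from input into dest until
 * a newline or the end of the input (len bytes), then appends a null
 * terminator. Like gets, it never checks the size of dest. */
size_t unchecked_read_line(const unsigned char *input, size_t len, char *dest) {
    assert(input != NULL && dest != NULL);
    size_t n = 0;
    while (n < len && input[n] != '\n') {
        dest[n] = (char)input[n];   /* no bounds check: may run past dest */
        n++;
    }
    dest[n] = '\0';
    return n;   /* bytes stored, not counting the terminator */
}
```

Note that a zero byte in the input does not stop the copy; only a newline or the end of the input does.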

You hopefully notice that this function presents an entry point through which you can smash the stack! Experiment with your bytebomb to confirm that this is the case (you can defer examining bigbytebomb until later, though that works similarly). You should be able to very easily make the program segfault!

Note that unlike in the previous lab, there is no potential penalty associated with detonating (or not detonating, as the case may be) the bytebombs. Feel free to experiment with any input strings you wish!

Also note that each bomb prints its unique ID when run. As with the BUFFER_SIZE constant, the value of this ID is unique to your bomb.

While crashing the bomb is quite easily done, your goal is to write more clever input strings that will cause the bomb to activate in a variety of ways. These input strings are called exploit strings.

Formatting Exploit Strings

Your exploit strings will typically contain byte values that do not correspond to the ASCII values used for printed characters, and therefore you cannot easily type your exploit strings directly. Instead, you'll need to generate raw (i.e., binary) strings consisting of arbitrary byte values. The included program hextoraw will allow you to easily generate raw strings by converting a hex-formatted string to a raw string. The hextoraw program expects input on stdin (i.e., the terminal) unless given the -i file option, in which case it will read input from file.

In a hex-formatted string, each byte value is represented by two hex digits. Byte values are separated by spaces. For example, the string "012345" could be entered in hex format as 30 31 32 33 34 35 (remembering that the ASCII code for decimal digit N is 0x3N). Run man ascii for a full table of ASCII values. Non-hex digit characters are ignored, including the blanks in the example shown.

You should use the existing text files 1-hiss.txt through 5-implodebig.txt to store your (human-readable) exploit strings for each phase. If your exploit string is contained in a file called exploit.txt, you can easily pass it to your bytebomb via hextoraw using Unix pipes (a standard way of passing output from one program to the input of another) like so:

$ cat exploit.txt | ./hextoraw | ./bytebomb

The above command outputs the contents of exploit.txt using the cat command, then passes that output to the program hextoraw as standard input, then takes the (raw, non-human readable) output of hextoraw and passes it to bytebomb.

Alternately, you can store the raw string bytes in a file using I/O redirection, then redirect them back to the bomb, as follows:

$ ./hextoraw -i exploit.txt > exploit.bytes
$ ./bytebomb < exploit.bytes

The above will create a new file called exploit.bytes containing the raw (not human-readable) version of the exploit string in exploit.txt, and then give that as input to the bytebomb. Using this method, you can also easily pass the raw bytes to the program from within gdb:

$ gdb ./bytebomb
(gdb) run < exploit.bytes

One important note is that your exploit string must not contain byte value 0x0A at any intermediate position, since this is the ASCII value for a newline ('\n'). When Gets() encounters this byte, it will assume that you intended to terminate the string input. Fortunately, hextoraw will warn you if it encounters this byte value.
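If you want to check a raw byte sequence yourself, a scan like the following is enough. This is only an illustrative sketch (the function name first_newline is hypothetical, and it is not how hextoraw itself is implemented); it reports where the first 0x0A byte sits so you can tell whether it falls before the end of your string.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first 0x0a byte in raw[0..len), or -1 if the
 * sequence contains no newline byte at all. */
long first_newline(const unsigned char *raw, size_t len) {
    assert(raw != NULL);
    for (size_t i = 0; i < len; i++) {
        if (raw[i] == 0x0a) {
            return (long)i;
        }
    }
    return -1;
}
```

Any result other than -1 or len - 1 means that Gets would stop reading before it reaches the rest of your string.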

To summarize the above: you will write your exploit strings as hex-formatted strings, which will be passed to hextoraw before being given to the bombs themselves.

Generating Byte Codes

You may wish to return to this section after reading the exploit section below.

Your exploit strings will often want to include the actual encodings of assembly instructions (i.e., byte codes). You can find these actual encodings by hand-writing assembly instructions, using gcc as an assembler, and objdump as a disassembler. For example, suppose you write a file example.s containing the following assembly code (.s is the standard suffix for an assembly code file):

# Example of hand-generated assembly code
pushq   $0xabcdef     # Push value onto stack
addq    $17,%rax      # Add 17 to %rax
movl    %eax,%edx     # Copy lower 32 bits to %edx

You can now assemble and then disassemble this file:

$ gcc -c example.s
$ objdump -d example.o > example.d

The generated file example.d now contains the byte encodings of the instructions in example.s; in particular, the following lines of interest:

0: 68 ef cd ab 00     pushq $0xabcdef
5: 48 83 c0 11        add $0x11,%rax
9: 89 c2              mov %eax,%edx

Each line shows the byte values that encode a single assembly language instruction. The number on the left indicates the starting address (starting with 0) while the hex digits after the colon indicate the byte codes for the instruction. Thus, we can see that the instruction pushq $0xabcdef has hex-formatted byte code 68 ef cd ab 00.

Remember that endianness matters! For example, note that the value 0xABCDEF is specified in reverse byte order starting at byte address 1 above, since we're running on a little-endian machine.
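The same rule applies to any address you place in an exploit string. On x86-64 an address occupies 8 bytes, so it is written out least significant byte first and padded with zero bytes at the end. As a small sketch (put_le64 is a hypothetical helper name, not part of the lab files):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store value into out[0..7], least significant byte first, which is the
 * order a little-endian machine uses in memory. */
void put_le64(unsigned char out[8], uint64_t value) {
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        out[i] = (unsigned char)(value >> (8 * i));
    }
}
```

For the value 0xabcdef this produces the bytes ef cd ab 00 00 00 00 00.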

From the above, we can read off the entire byte sequence for the code:

68 ef cd ab 00 48 83 c0 11 89 c2

This hex-formatted exploit string can then be passed through hextoraw to generate a raw input string for the bytebomb. Alternately (and perhaps preferably) you can simply edit the example.d file to omit extraneous characters and to contain C-style comments for readability, yielding:

   68 ef cd ab 00   /* pushq  $0xabcdef  */
   48 83 c0 11      /* add    $0x11,%rax */
   89 c2            /* mov    %eax,%edx  */

However, remember that you should store your human-readable exploit strings in the existing .txt files located in your directory for that purpose. The hextoraw program will ignore C-style comments like those in the example above, and thus you should feel free to (and should) comment your exploit string files.
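To make the file format concrete, here is a rough sketch of the kind of conversion hextoraw performs: skip C-style comments, ignore anything that is not a hex digit, and turn each pair of hex digits into one byte. This is an approximation written for illustration, not the actual hextoraw source; the names hex_value and hex_to_raw are hypothetical, and the real tool may be stricter about spacing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit character, or -1 if c is not a hex digit. */
static int hex_value(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert hex-formatted text to raw bytes in out (capacity cap).
 * Returns the number of bytes written, or -1 if a hex digit is left
 * unpaired or out is too small. */
long hex_to_raw(const char *text, unsigned char *out, size_t cap) {
    assert(text != NULL && out != NULL);
    size_t n = 0;
    int high = -1;   /* pending high digit, or -1 if none */
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == '/' && text[i + 1] == '*') {
            /* Skip to the closing delimiter of the comment. */
            i += 2;
            while (text[i] != '\0' && !(text[i] == '*' && text[i + 1] == '/')) {
                i++;
            }
            if (text[i] == '\0') break;   /* unterminated comment */
            i++;                          /* now on the closing '/' */
            continue;
        }
        int v = hex_value((unsigned char)text[i]);
        if (v < 0) continue;              /* blanks, newlines, etc. */
        if (high < 0) {
            high = v;
        } else {
            if (n >= cap) return -1;
            out[n++] = (unsigned char)(high * 16 + v);
            high = -1;
        }
    }
    return high < 0 ? (long)n : -1;
}
```

Skipping comments before looking for digits matters because words like "add" consist entirely of hex digits.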

Exploits

There are five exploits to tackle to fully unlock the capabilities of your bytebombs. The first three exploits involve attacking the bytebomb program, while the last two exploits involve attacking the bigbytebomb program. The goal of each exploit is to 'repurpose' the bomb to execute something that it should not normally execute.

However, the methods you will use to accomplish each exploit will vary. In particular, the first three exploits require you to use code-injection attacks, while the last two exploits require you to use return-oriented programming. Each of the five exploits is described in more detail below.

Part I: Code Injection Attacks

The first three exploits will have you exploiting the bytebomb executable using code injection attacks. Recall that the basic idea in a code injection attack is to use a vulnerable function to inject a series of commands onto the stack, then overwrite the function's return address with the address of the injected commands. Then, when the function executes the ret instruction, rather than returning to the calling function, the program will jump to the injected commands (or whatever address you overwrote the return address with).
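The general shape of such an input can be sketched as follows. Everything specific here is an assumption for illustration: build_injection is a hypothetical helper, ret_offset stands for the distance from the start of the buffer to the saved return address (which you must work out for your own bomb), and code_addr stands for the address where the injected bytes end up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lay out a code-injection input in out, which must hold ret_offset + 8
 * bytes: the injected code first, filler up to the saved return address,
 * then the 8-byte little-endian address to return to.
 * Returns the total length written. */
size_t build_injection(unsigned char *out,
                       const unsigned char *code, size_t code_len,
                       size_t ret_offset, uint64_t code_addr) {
    assert(out != NULL && code != NULL);
    assert(code_len <= ret_offset);      /* code must fit before the address */
    memset(out, 0x00, ret_offset);       /* filler; any byte but 0x0a works */
    memcpy(out, code, code_len);
    for (size_t i = 0; i < 8; i++) {
        out[ret_offset + i] = (unsigned char)(code_addr >> (8 * i));
    }
    return ret_offset + 8;
}
```

For example, with a made-up offset of 24 and a made-up address of 0x5561dc78, the last eight bytes would be 78 dc 61 55 00 00 00 00.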

Exploit 1: Hiss

The first exploit simply requires you to get the bytebomb to execute the hiss function, which is defined as follows:

void hiss() {
  vlevel = 1;       /* Part of validation protocol */
  printf("The bytebomb hisses loudly!\n");
  validate(1);
  exit(0);
}

To do so, you will need to smash the stack and change the return address of getbuf. All the information you need to devise your exploit string for this stage can be determined by examining a disassembled version of bytebomb using objdump. However, you may want to use gdb to step through the last few instructions of getbuf to make sure it is doing the right thing. Remember to be careful about byte ordering (i.e., endianness) as well as the placement of buf within the stack frame.

If you are having trouble starting, review the material and slides from class on buffer overflows. Note that you don't actually need to inject your own code for this exploit (though you still need to smash the stack).

Exploit 2: Glow

Exploit 2 involves injecting code as part of your exploit string. Here, the objective is to make the bytebomb glow via the glow function, defined as follows:

void glow(unsigned val) {
  vlevel = 2;       /* Part of validation protocol */
  if (val == bombcode) {
    printf("The bytebomb glows brightly!\n");
    validate(2);
  } else {
    printf("The bytebomb flickers faintly. (0x%.8x)\n", val);
    fail(2);
  }
  exit(0);
}

You will note that this is not quite as simple as hiss, since now you need the argument to appear as if you have passed a particular value. Hmm...I wonder what that id.txt file is for...

Some specific advice for this stage:

If you haven't yet carefully read the "Generating Byte Codes" section above, you're definitely going to need to here.
Don't try to use either jmp or call instructions - these instructions use PC-relative addressing, which is quite difficult to set up correctly in injected code. Instead, use ret for all transfers of control, even when you are not returning from a call.
When writing assembly code, be careful of the difference between, e.g., $0x3 (the value 3) and 0x3 (the value at memory address 3). It's easy to write the latter when you actually mean the former.
If the argument val does not have the correct value, the output of the bytebomb will print out the (incorrect) value.
You may be tempted to write an assembly instruction like the following to put a 64-bit value on the stack: movq $0x1122334455667788, (%rsp). Unfortunately, this is not a valid x86-64 instruction (and will not assemble) due to an obscure x86-64 restriction that you can't store a 64-bit immediate value to memory in one instruction. While you could instead accomplish this task in two instructions by first loading the value into a register and then copying it to memory, remember that if you're just trying to get data onto the stack, you can put your data there directly via your exploit string.

Exploit 3: Implode

Exploit 3 is another code injection attack, this time targeting the implode function, but here the argument is a string and there is a helper function involved:

/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char* sval) {
  char cbuf[110];
  /* Make position of check string unpredictable */
  char* s = cbuf + random() % 100;
  sprintf(s, "%.8x", val);
  return strncmp(sval, s, 9) == 0;
}

void implode(char* sval) {
  vlevel = 3;       /* Part of validation protocol */
  if (hexmatch(bombcode, sval)) {
    printf("BOOM!!! The bytebomb implodes!\n");
    validate(3);
  } else {
    printf("The bytebomb smokes slightly. (\"%s\")\n", sval);
    fail(3);
  }
  exit(0);
}

While the idea is the same as in exploit 2, this one is quite a bit trickier. For one, you will need to include a string representation of your bomb code (the 8-digit hex code from id.txt) in your exploit string. The string should consist of the eight hex digits (ordered from most to least significant) without a leading "0x". Remember that a string in C is terminated by a null character (i.e., byte value 0).

Most significantly, however, hexmatch and strncmp will push data onto the stack when they are called, potentially overwriting portions of memory that held the buffer used by getbuf. As a result, you will need to be careful where you place the string representation of your bomb code.
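The string that hexmatch compares against comes from the "%.8x" format shown above, so you can reproduce it with the same format. The sketch below uses a hypothetical helper name (code_to_string) and made-up values; substitute your own code.

```c
#include <assert.h>
#include <stdio.h>

/* Write the eight lowercase hex digits of code, most significant first,
 * followed by a null terminator, into out (9 bytes). */
void code_to_string(unsigned code, char out[9]) {
    assert(out != NULL);
    snprintf(out, 9, "%.8x", code);
}
```

For a made-up value of 0x59b997fa, the resulting characters are '5' '9' 'b' '9' '9' '7' 'f' 'a', i.e., the bytes 35 39 62 39 39 37 66 61 followed by 00. Unlike addresses, these bytes are not reversed: a string is stored one character at a time, in order.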

Part II: Return-Oriented Programming

Exploits 4 and 5 require you to attack the bigbytebomb executable. While the source code of the bigbytebomb is nearly identical to that of the bytebomb, code-injection attacks against the bigbytebomb are more difficult due to the usage of two techniques to thwart such attacks:

Stack randomization is used so that the address of the stack is randomly determined each time you run the program. This makes it difficult to determine where your injected code will be located.
The memory segment holding the stack is marked as nonexecutable, so even if you were able to locate the address of your injected code, the program would fail with a segmentation fault if you tried to execute it.

However, despite the above protections, the bigbytebomb is still vulnerable to return-oriented programming (ROP) attacks. Recall that the key idea in ROP is to identify byte sequences within the existing program consisting of one or more instructions followed by the ret instruction. Such sections of code are called gadgets. By smashing the stack using gadget addresses and possibly other data, you can construct a chain of gadgets that implements an exploit.

Locating series of bytes in the program that encode useful instructions (i.e., locating useful gadgets) is tricky. Luckily, we've located the source code to a series of functions within the bigbytebomb that might contain useful gadgets. This set of functions is called the gadget farm and is contained within the farm.c source file. For exploits 4 and 5, you will need to identify useful gadgets from within the gadget farm to perform attacks similar to those in exploits 2 and 3.

To help you in locating gadgets, we have compiled a handout detailing the encodings of useful instructions. More details are provided below.

Exploit 4: GlowBig

Exploit 4 requires you to repeat the Glow exploit against the bigbytebomb (note that if you try your working Exploit 2 against the bigbytebomb, it will fail due to the security measures mentioned above). You can construct your ROP exploit using gadgets only touching the first eight x86-64 registers (%rax-%rdi) and including only the following instruction types:

movq : encodings are given in the instruction encoding handout.
popq : encodings are given in the instruction encoding handout.
ret : encoded by the single byte 0xC3.
nop : encoded by the single byte 0x90.

Note that the nop instruction (pronounced "no op", short for "no operation") is an instruction whose only effect is to increment the program counter (%rip) by 1. In effect, this instruction can be used to provide 'padding' in an instruction byte sequence.
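Searching for a gadget amounts to looking for an instruction's bytes followed, possibly after some nop padding, by the ret byte. Here is a minimal sketch of that search over an array of code bytes. The function name find_gadget is hypothetical, and in practice you would read the bytes from the objdump disassembly rather than from an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first place in code[0..len) where pat appears
 * followed by zero or more nop bytes (0x90) and then ret (0xc3), or -1 if
 * there is no such place. */
long find_gadget(const unsigned char *code, size_t len,
                 const unsigned char *pat, size_t pat_len) {
    assert(code != NULL && pat != NULL && pat_len > 0);
    for (size_t i = 0; i + pat_len < len; i++) {
        if (memcmp(code + i, pat, pat_len) != 0) continue;
        size_t j = i + pat_len;
        while (j < len && code[j] == 0x90) j++;   /* skip nop padding */
        if (j < len && code[j] == 0xc3) return (long)i;
    }
    return -1;
}
```

For instance, in the made-up byte sequence c7 07 d4 48 89 c7 c3, the bytes 48 89 c7 (which encode movq %rax,%rdi) start at offset 3 and are immediately followed by ret, so the gadget address would be the function's address plus 3.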

Some specific advice for this stage:

You should only use gadgets drawn from within the gadget farm (not from elsewhere in the program). You can perform this exploit using only gadgets between the start_farm and mid_farm instruction addresses.
This attack is possible using only two gadgets.
Keep in mind that the popq instruction pops data from the stack. This instruction gives you an easy way to inject data (but not code) into the program. In doing so, note that your exploit string will contain a combination of gadget addresses and data.
Don't compile farm.c. Remember that the functions in farm.c are part of the bigbytebomb executable -- if you compile farm.c as a standalone program, the gadget addresses you will get will not match what is in the bigbytebomb.

Exploit 5: ImplodeBig

Before you tackle Exploit 5, consider what you have accomplished so far. In Exploits 2 and 3, you caused a program to execute machine code of your own design. If bytebomb had been a network server, you could have injected your own code into a remote machine. In Exploit 4, you circumvented two of the primary ways modern systems use to thwart buffer overflow attacks. Although you did not inject your own code, you were able to hijack the operation of the program using pieces of existing code.

Exploit 5 requires you to perform the Implode attack (Exploit 3) on the bigbytebomb. This is substantially more difficult than the Glow attack of the previous stage. In light of this difficulty, Exploit 5 is only worth 5% of your lab grade. Think of the last exploit more like an extra credit problem for those of you looking for a challenge rather than a necessary component of the lab in order to get a good score.

Some specific advice for this stage:

You should use gadgets drawn from within the complete gadget farm (between start_farm and end_farm).
Remember the effect that movl has on the upper 4 bytes of the destination register.
In addition to the standard nop instruction, you can also use other instructions as functional nops, which are 2-byte instructions that do not change any registers or memory values. Useful functional nop instructions are given in the encoding handout.
Remember that the simplest type of gadget is simply calling an existing function (which potentially allows for more instructions than just those specified in the encoding handout).
My reference solution requires eight (not necessarily unique) gadgets.
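As a reminder of the movl behavior mentioned above: on x86-64, an instruction that writes a 32-bit register sets the upper 4 bytes of the corresponding 64-bit register to zero. The following is a tiny C model of that rule (after_movl is a hypothetical name used only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Models the 64-bit register contents after `movl src, dst32`: the low
 * 32 bits become src and the upper 32 bits become zero, regardless of
 * what the register held before. */
uint64_t after_movl(uint64_t old_dst, uint32_t src) {
    (void)old_dst;          /* the previous contents do not survive */
    return (uint64_t)src;   /* zero-extended to 64 bits */
}
```

So a movl is only a safe way to move a value between registers when the upper 4 bytes of that value are already zero.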

Logistics

You are responsible for two tasks:

Determining working exploit strings and storing them in the five provided text files (1-hiss.txt through 5-implodebig.txt). Remember that these files should contain the human-readable exploit strings, not the raw strings output by hextoraw.
Documenting your methods and insights in notebook.txt, again organized by exploit (but everything can be stored in this one single file). To supplement documentation here, you should write C-style comments using /* and */ in your exploit files (note: you cannot use // comments, and must have a space after the opening /* and before the closing */). You should also use newlines to break up your exploit strings into logical sections.
To reiterate the above: don't just write one huge line containing your whole exploit string!

Your final submission will consist of your committed files at the time of the due date.

Evaluation

You will be evaluated both on determining working exploits for each stage as well as clearly documenting your methods and insights in notebook.txt. Total points for each exploit are listed below:

Hiss: 20 points
Glow: 25 points
Implode: 20 points
GlowBig: 30 points
ImplodeBig: 5 points

Partial credit is possible for clear documentation that demonstrates some understanding of the exploit even if the full exploit is not working.

Byte Stacker Status Report

The Disarmament Status Report page has been updated to show progress towards figuring out the bytebombs. You can view your progress towards completing the exploits on this page.

Exploit notifications are automatically logged by the server and require no action on your part.