Lab 5 - Make Some Cache

Assigned:	Friday, April 23
Groups Due:	Monday, April 26, 11:59 pm
Due Date:	Tuesday, May 4, 11:59 pm
Collaboration Policy:	Level 1
Group Policy:	Pair-optional (you may work in a group of 2 if you wish)

In this lab, you will write a C program simulating the behavior of a hardware cache on real-world memory usage traces. Writing and testing your simulator will help you understand the different types of cache designs and the impact that cache memories can have on the performance of your programs.

Lab Overview

Your cache simulator will simulate an arbitrary hardware cache (also called a 'cache memory'), as defined by the usual three values discussed in class: S (the number of cache sets), E (the number of cache lines per set), and B (the number of bytes in each data block). Your program will simulate the behavior of the specified cache on a trace file, which consists of a series of memory accesses that your program will replay in simulation. The output of your simulator will be three values: the number of cache hits, the number of cache misses, and the number of block evictions performed.

Note that your simulator will not actually cache or otherwise store any real data from memory; instead, you will simply be replaying the series of memory accesses and tallying which accesses would result in hits/misses/evictions if the simulated cache were an actual hardware cache.

To help you test your simulator, you are provided with a reference simulator implementation as well as a driver program that will automatically test your simulator by comparing its results against the reference simulator. Details on the trace files and the reference simulator are provided below.

Memory Trace Files

The simulator operates on memory trace files as input, which are files generated by valgrind that describe a series of memory accesses. Below is an example of a trace file that specifies a sequence of four memory operations (one per line):

I 0400d7d4,8
 M 0421c7f0,4
 L 04f6b868,8
 S 7ff0005c8,8

The general format of each line of a trace file is as follows:

[space][operation] [address],[size]

The space field is either a space or nothing, depending on the operation type (as detailed below). The operation field indicates the type of memory operation, which is one of the following:

"I" denotes an instruction load (reading an assembly instruction from memory).
"L" denotes a data load (reading a data value from memory).
"S" denotes a data store (writing a data value to memory).
"M" denotes a data modify (a data load followed by a data store).

There is never a space before an "I" operation, while there is always a space before each "M", "L", or "S" operation.

The address field specifies a 64-bit memory address written in hex. Finally, the size field specifies the number of bytes accessed by the operation.

For example, in the first line of the example trace file above, there is no leading space (since the operation is an instruction load), and the operation is reading an 8-byte value (instruction) located at memory address 0x0400d7d4.

Reference Trace Files

You are provided with a set of preexisting trace files within the traces subdirectory of your lab repository. You can use these reference trace files to test your cache simulator.

In addition to the included traces, you can use valgrind to generate your own memory traces in this format, like so:

$ valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes [some-cmd]

The above example will run the program [some-cmd] and dump a trace of its memory accesses to the terminal. To save the output in a file, just redirect to a file by appending something like > mynewtrace.trace to the end of the valgrind command. The actual command [some-cmd] could be any program you like with any arguments, e.g., you could use ls -l to get a memory trace from the ls program.

Simulator Interface

The simulator program takes the following command-line arguments:

-h: Flag that prints usage info
-v: Flag that turns on extra (verbose) output
-s <s>: Number of set index bits (the cache has S = 2^s sets)
-E <E>: Associativity (number of lines per set)
-b <b>: Number of block offset bits (each block holds B = 2^b bytes)
-t <tracefile>: Name of the trace file to replay

When run, the simulator replays the memory operations listed in the specified trace file against the specified cache (using the cache values specified by the command line arguments). On completion, the simulator outputs a single line, which specifies the number of (simulated) cache hits, misses, and evictions.

For example, here is a sample run of the reference simulator (the arguments may be given in any order):

$ ./cachesim-ref -s 4 -E 1 -b 4 -t traces/t2.trace
hits:4 misses:5 evictions:3

Adding the verbose flag -v will print extra information about each memory access in the trace, such as shown below:

$ ./cachesim-ref -s 4 -E 1 -b 4 -t traces/t2.trace -v
L 10,1 miss 
M 20,1 miss hit 
L 22,1 hit 
S 18,1 hit 
L 110,1 miss eviction 
L 210,1 miss eviction 
M 12,1 miss eviction hit 
hits:4 misses:5 evictions:3
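
The pattern of hits and evictions above follows from how each address splits into a tag and a set index. Below is a minimal sketch of that split, assuming the usual layout in which the low b bits of an address are the block offset, the next s bits are the set index, and the remaining high bits are the tag. The names addr_parts_t and split_address are hypothetical and are not part of the starter code; memaddr_t is simply an alias for unsigned long long.

```c
#include <assert.h>

typedef unsigned long long memaddr_t;

/* Hypothetical helper type: the parts of an address the cache cares about. */
typedef struct {
  memaddr_t tag;
  memaddr_t set_index;
} addr_parts_t;

/* Split addr, assuming the low b bits are the block offset and the next
 * s bits are the set index; everything above that is the tag. */
addr_parts_t split_address(memaddr_t addr, int s, int b) {
  assert(s >= 0 && b >= 0 && s + b < 64);
  addr_parts_t parts;
  memaddr_t set_mask = (1ULL << s) - 1;  /* s low-order one bits */
  parts.set_index = (addr >> b) & set_mask;
  parts.tag = addr >> (s + b);
  return parts;
}
```

With -s 4 and -b 4, the addresses 0x10, 0x110, and 0x210 from the run above all land in set 1 with tags 0, 1, and 2, so with E = 1 each one displaces the block loaded before it.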

You are provided with a skeleton implementation of your own simulator that provides the same interface and output format as specified above (minus the additional verbose output). However, the skeleton implementation does not actually do any cache simulation, and simply outputs 0 for all three cache statistics. Your job will be to complete your own cache simulator so that it produces the same results as the reference simulator on any given trace file for a particular cache configuration.

Cache Specification

In order to match the reference simulator, you must adhere to the following specifications while designing your cache simulator. Follow each of these instructions carefully, as each one of them has the potential to completely change your cache's behavior if not followed.

We are only interested in data cache performance, so your simulator should ignore all instruction load operations (i.e., lines starting with "I").
You must use an LRU (least-recently-used) replacement policy when evicting blocks from the cache.
Your simulator must work correctly for arbitrary s, E, and b. This means that you will need to allocate storage for your data structures using malloc.
You may assume that memory accesses are aligned properly such that a single memory access never crosses block boundaries. As a result, you can ignore the request sizes in the valgrind traces.
You do not need to maintain any dirty bit information in your (simulated) cache, as for the purposes of your simulation, there is nothing extra to do when evicting a clean block versus a dirty block.
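
One common way to implement the LRU requirement above is to stamp each line with the value of a running access counter whenever it is touched, then evict the line with the smallest stamp. The sketch below shows only the victim selection step. The struct layout and the names cache_line_t, last_used, and choose_victim are assumptions made for illustration, and filling an empty line before evicting anything is the conventional behavior rather than something this handout spells out.

```c
#include <stddef.h>

/* Hypothetical per-line metadata for this sketch. */
typedef struct {
  int valid;                     /* nonzero once the line holds a block */
  unsigned long long tag;
  unsigned long long last_used;  /* access counter value at the last touch */
} cache_line_t;

/* Pick the line to fill in a set of E lines: an invalid (empty) line if
 * one exists, otherwise the valid line with the smallest last_used. */
size_t choose_victim(const cache_line_t *set, size_t E) {
  size_t victim = 0;
  for (size_t i = 0; i < E; i++) {
    if (!set[i].valid) {
      return i;  /* empty line: no eviction needed */
    }
    if (set[i].last_used < set[victim].last_used) {
      victim = i;
    }
  }
  return victim;
}
```

An eviction is counted only when the chosen line was already valid.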

Lab Files

Your lab files contained in your repository consist of the following:

cachesim.c: Your cache simulator program. This is the only file you should modify.
cachesim-ref: The executable reference simulator.
Makefile: Used to build your program.
traces/: Directory of reference trace files for testing.
test-cachesim: Executable program to automatically test your simulator against the reference simulator.

To test your cache simulator against all of the reference trace files and output an auto-generated correctness score, compile your simulator by running make and then execute the test-cachesim program. With a complete and correct cache simulator, this will result in the following output:

$ ./test-cachesim 
                        Your simulator     Reference simulator
Points (s,E,b)    Hits  Misses  Evicts    Hits  Misses  Evicts
     3 (1,1,1)       9       8       6       9       8       6  traces/t1.trace
     3 (4,2,4)       4       5       2       4       5       2  traces/t2.trace
     3 (2,1,4)       2       3       1       2       3       1  traces/t3.trace
     3 (2,1,3)     167      71      67     167      71      67  traces/t4.trace
     3 (2,2,3)     201      37      29     201      37      29  traces/t4.trace
     3 (2,4,3)     212      26      10     212      26      10  traces/t4.trace
     3 (5,1,5)     231       7       0     231       7       0  traces/t4.trace
     6 (5,1,5)  265189   21775   21743  265189   21775   21743  traces/t5.trace
    27

Simulator summary: scored 27 of 27 points

Note that your simulator may be tested on traces not in the set of reference traces, and thus a full score on test-cachesim does not necessarily mean that your program will receive full correctness marks. However, scoring less than full marks on test-cachesim is an immediate indicator that your simulator is not yet finished.

Implementation Advice

Here are some general and specific tips for working on your cache simulator.

Basic Approach

The basic outline of your simulator should be (1) create your (initially empty) cache data structure, and then (2) replay the operations specified in the trace file against your cache, updating your cache appropriately as you go and logging hits, misses, and evictions. Remember that your cache does not actually store any data, so you are really just tracking cache metadata without any actual data blocks involved.

The primary "new" feature of C that you'll need in this lab is defining and using struct types. Refer to the x86 structure slides for a refresher on the basics of using structs. In the specific context of this lab, start by deciding what metadata you'll need to maintain for each cache line and define a struct for that purpose (i.e., a struct representing a single cache line). You may then wish to do the same thing for a single cache set and/or the cache itself. Remember that struct types can contain other struct types as fields.
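
As a rough illustration of nesting structs and allocating them with malloc, here is one possible layout. Every type name, field name, and function name below is hypothetical; your own design may reasonably differ (for example, a single flat array of lines). The sketch assumes E is at least 1.

```c
#include <stdlib.h>

typedef struct {
  int valid;
  unsigned long long tag;
  unsigned long long last_used;
} cache_line_t;

typedef struct {
  cache_line_t *lines;   /* array of E lines */
} cache_set_t;

typedef struct {
  size_t num_sets;       /* S = 2^s */
  size_t lines_per_set;  /* E */
  cache_set_t *sets;     /* array of S sets */
} cache_t;

/* Free everything owned by the cache, then the cache itself. */
void cache_destroy(cache_t *cache) {
  for (size_t i = 0; i < cache->num_sets; i++) {
    free(cache->sets[i].lines);
  }
  free(cache->sets);
  free(cache);
}

/* Allocate an empty cache; returns NULL if any allocation fails. */
cache_t *cache_create(int s, size_t E) {
  cache_t *cache = malloc(sizeof *cache);
  if (cache == NULL) {
    return NULL;
  }
  cache->num_sets = (size_t)1 << s;
  cache->lines_per_set = E;
  cache->sets = malloc(cache->num_sets * sizeof *cache->sets);
  if (cache->sets == NULL) {
    free(cache);
    return NULL;
  }
  for (size_t i = 0; i < cache->num_sets; i++) {
    /* calloc zeroes the lines, so every valid flag starts at 0 */
    cache->sets[i].lines = calloc(E, sizeof(cache_line_t));
    if (cache->sets[i].lines == NULL) {
      cache->num_sets = i;  /* only sets 0..i-1 were allocated */
      cache_destroy(cache);
      return NULL;
    }
  }
  return cache;
}
```

Pairing every allocation with a matching free also keeps valgrind's leak report clean.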

Storing Memory Addresses

The appropriate type to represent a memory address read from a trace file (i.e., an unsigned 64-bit value) is unsigned long long, which is guaranteed to be at least 64 bits wide. The width of a regular long is implementation-dependent and may be only 32 bits. You may wish to use a typedef to avoid repeatedly typing this type name. Remember that a typedef is simply a way to give some existing type an alias. For example, if you wanted to define a new type alias memaddr_t, you could use a typedef as follows:

typedef unsigned long long memaddr_t;

Starter Code

The starter code in cachesim.c includes a number of predefined global variables as well as several helper functions. You should not modify the included helper functions, but you will need to add your own to avoid writing the entire simulator inside main. You may also wish to add additional global variables (but as always, only variables that actually need to be global should be declared as such).

Warning: If you write any helper functions that return pointers, watch out for this general pitfall of C programs: never return the address of a local variable from a function, since after the function returns, the address will point to stack memory in the old (deallocated) stack frame. For example, the following is valid C code but is unsafe:

int* foo() {
  int x = 5;
  return &x; // danger - address of local var!
}
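
Two safe alternatives to the function above are sketched here. The function names make_int and fill_int are made up for illustration.

```c
#include <stdlib.h>

/* Option 1: heap allocation. The memory outlives the call, but the
 * caller becomes responsible for calling free() on the result. */
int *make_int(int value) {
  int *p = malloc(sizeof *p);
  if (p != NULL) {
    *p = value;
  }
  return p;
}

/* Option 2: the caller owns the storage and passes in its address. */
void fill_int(int *out, int value) {
  *out = value;
}
```

The same two patterns apply to helper functions that build or initialize structs.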

Parsing Trace Files

This lab may be the first time you have read any files in C, so you may not be familiar with the standard functions for doing so. Files can be opened using the fopen function and closed using the fclose function. Once opened, the easiest way to read a single line from an opened file is using the fgets function.

Another useful function for parsing out the fields contained within a line is sscanf, which works like scanf but reads input from a given string rather than from user input. Here is a sscanf tutorial. Note that if you are trying to extract a memory address in the trace file format into an unsigned 64-bit number (as suggested above), then you should use the format specifier %llx in your sscanf format string (this specifier says to read a 64-bit unsigned value given in hex format).

Also remember that valgrind does not put a space in front of "I" lines as it does for "M", "L", and "S", which should be helpful in identifying (and ignoring) these lines.
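
Putting fgets and sscanf together, a line-parsing helper might look roughly like the sketch below. The function names parse_trace_line and count_data_accesses and the 128-byte line buffer are assumptions made for this example, not part of the starter code.

```c
#include <stdio.h>

typedef unsigned long long memaddr_t;

/* Parse one trace line into its operation character and address.
 * Returns 1 for a data access (L, S, or M) and 0 for anything else,
 * including instruction loads, which have no leading space. */
int parse_trace_line(const char *line, char *op, memaddr_t *addr) {
  if (line[0] != ' ') {
    return 0;  /* "I" lines and any other non-data lines are skipped */
  }
  /* The leading space in the format skips whitespace; %llx stops at the
   * comma, so the size field is never read. */
  if (sscanf(line, " %c %llx", op, addr) != 2) {
    return 0;
  }
  return *op == 'L' || *op == 'S' || *op == 'M';
}

/* Read every line of an already-opened trace and count the data accesses. */
long count_data_accesses(FILE *fp) {
  char line[128];
  char op;
  memaddr_t addr;
  long count = 0;
  while (fgets(line, sizeof line, fp) != NULL) {
    if (parse_trace_line(line, &op, &addr)) {
      count++;
    }
  }
  return count;
}
```

In a real simulator, the body of the loop would replay the access against the cache instead of just counting it.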

Memory Operations

Each data load (L) or store (S) operation can cause at most one cache miss. However, the data modify operation (M) is treated as a load followed by a store to the same address. Thus, an M operation results in either two cache hits, or a miss (plus a possible eviction) followed by a hit, since the store always finds the block that the load just brought in.
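
To make the accounting concrete, the toy sketch below replays operations against a deliberately tiny stand-in cache that holds a single block. It is not a real S/E/B cache, and all names in it (stats_t, tiny_cache_t, access_block, replay_op) are hypothetical; the point is only that M is handled as two accesses to the same block.

```c
typedef struct {
  long hits;
  long misses;
  long evictions;
} stats_t;

/* Toy stand-in for a real cache: it remembers exactly one block. */
typedef struct {
  int valid;
  unsigned long long block;
} tiny_cache_t;

/* One access: a hit if the block is present, otherwise a miss that loads
 * the block, evicting whatever was there before. */
void access_block(tiny_cache_t *c, unsigned long long block, stats_t *st) {
  if (c->valid && c->block == block) {
    st->hits++;
    return;
  }
  st->misses++;
  if (c->valid) {
    st->evictions++;
  }
  c->valid = 1;
  c->block = block;
}

/* L and S are one access; M is a load followed by a store. */
void replay_op(tiny_cache_t *c, char op, unsigned long long block, stats_t *st) {
  access_block(c, block, st);
  if (op == 'M') {
    access_block(c, block, st);  /* the store half always hits */
  }
}
```

Starting from an empty cache, an M to a new block is counted as one miss and one hit; a second M to a different block adds a miss, an eviction, and a hit.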

Debugging

The verbose flag -v provides an easy way to build in additional debugging output to your program without breaking the mandated output format of the program (which is just the single output line at the very end). Your program can produce as much (or as little) verbose output as desired, and in whatever format you wish. A basic example of producing verbose output is given in the starter code of cachesim.c. While you are not required to implement any particular verbose output (e.g., in the style of the reference simulator), adding similar output will likely make debugging your simulator much easier.

Do your initial debugging on the small traces (particularly t2.trace and t3.trace). These traces are smaller than t1.trace and will likely be easier to debug.

To run the simulator under GDB, start gdb on the program, then pass the command-line arguments to the run command, e.g.:

$ gdb cachesim
(gdb) run -s 4 -E 1 -b 4 -t traces/t2.trace

As in Lab 2, you should run valgrind on your cache simulator to check for many kinds of memory errors.

Command-Line Arguments

The starter code uses the standard C function getopt to help in parsing command-line arguments, which is much easier (and more flexible) than trying to manually parse arguments. While you do not need to write this part of the code yourself, it is a good idea to have a basic idea of what is going on (both to understand the starter code and for general C knowledge, since this is the standard way that most C programs handle command-line arguments).

The basic idea is that getopt is given a string that specifies all of the possible command-line arguments, some of which may take associated values (e.g., the -t and -s arguments), and some of which may just be boolean flags (e.g., the -h and -v flags). The string passed to getopt specifies the former type of argument by including a colon : after the associated character (e.g., t:). The specific usage of getopt in cachesim.c (i.e., a switch statement wrapped in a loop) is highly idiomatic and should be mostly self-explanatory. The only other detail that may not be apparent is that optarg is a global variable set by getopt itself to be the argument value associated with the argument currently being parsed.

Logistics

As usual, you can accept the lab repository on GitHub via the link on Blackboard and then clone the repository to the class server to begin working. You are responsible for completing cachesim.c, but should not create or modify any other file.

If you are working in a group and did not previously do so during Lab 2, it is a good idea to go through Part 3 of the Git tutorial, which covers some specific topics applicable to collaboration (most significant of which is handling merge conflicts). You should also review the course policies on group work.

Your final submission will consist of your committed and pushed cachesim.c file at the time of the due date. Remember to submit your individual group reports to me if you worked in a group.

Evaluation

Your simulator will be graded on program correctness, design, and style. Remember that the test-cachesim program checks your simulator for correctness only, not design or style.

You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions about what constitutes good program design and/or style that are not covered by the guide.