Assigned: Tuesday, April 10
Due Date: Monday, May 7, 11:59 pm
Collaboration Policy: Level 1
Group Policy: Pair-optional (but recommended!)
This project will help you understand paging, address spaces, and virtual memory management. In particular, you will implement an external pager, which is a process that handles virtual memory requests for application processes. This program will be analogous to the virtual memory portion of a normal operating system.
Your external pager ("pager" for short) will handle address space creation and destruction, read and write faults, and simple argument passing between address spaces (i.e., virtual and physical). The pager will manage a fixed range of the virtual address space (called the arena) of each application that uses it. While running, the pager will handle all access requests to pages located in the arena (serially, within a single thread). Valid pages in the arena will be stored in (simulated) physical memory or in (simulated) disk. Your pager will manage these two resources on behalf of all applications using the pager.
In addition to handling page faults, your pager will provide two system calls to applications: vm_extend and vm_syslog. An application uses vm_extend to ask the pager to make another virtual page of its arena valid. You can think of vm_extend as a very low-level memory allocation routine, on top of which a higher-level library like malloc could be built (though your test applications will call vm_extend directly). An application uses vm_syslog to ask the pager to print a message in memory to its console (which may seem trivial, but is actually somewhat tricky!).
Your external pager works in tandem with the CPU's memory management unit (MMU) and exception-handling mechanism. The hardware MMU is invoked on every virtual memory access and performs the following tasks:
System call instructions also invoke the exception mechanism. When a system call instruction is executed, the exception mechanism transfers control to the registered kernel handler for the exception.
The MMU and exception functionality in this project are emulated through a provided software infrastructure. To use this infrastructure, each application that uses the external pager must include vm_app.h and link with libvm_app.a, while your external pager itself must include vm_pager.h and link with libvm_pager.a. You do not need to understand the mechanisms used to emulate the hardware components (but in case you're curious, the infrastructure uses mmap, mprotect, signal handlers, named pipes, and remote procedure calls).
Linking with these libraries enables application processes to communicate with the pager process in the same manner as applications on real hardware communicate with the operating system. Specifically, applications issue load and store instructions (i.e., reads and writes compiled from normal variable accesses), and these instructions are translated or faulted by the infrastructure exactly as in the above description of the MMU. For faulting instructions and system calls, the infrastructure transfers control to the external pager via function calls.
The following diagram shows how your pager will interact with applications that use the pager. An application makes a request to the system via the function calls vm_extend, vm_syslog, and vm_yield, or by trying to load or store an address that is non-resident or protected.
                    initialize VM system -->      vm_init
                    create process -->            vm_create
                    end process -->               vm_destroy
                    switches to new process -->   vm_switch
vm_yield -->        may switch to new process
vm_extend -->       system call handler -->       vm_extend
vm_syslog -->       system call handler -->       vm_syslog
faulting load -->   exception handler -->         vm_fault
faulting store -->  exception handler -->         vm_fault
+-----------+       +----------------+            +----------------+
|APPLICATION|       | INFRASTRUCTURE |            | EXTERNAL PAGER |
+-----------+       +----------------+            +----------------+
Note that there are two versions of vm_extend and vm_syslog: one for applications and one for the pager. The application-side vm_extend/vm_syslog is implemented in libvm_app.a and is called by the application process. The pager-side vm_extend/vm_syslog is implemented by you in your pager. Think of the vm_extend/vm_syslog in libvm_app.a as a system call wrapper, and think of the vm_extend/vm_syslog in your pager as the code that is invoked by the system call. When the application calls its vm_extend/vm_syslog (the one in libvm_app.a), the infrastructure takes care of invoking the system call vm_extend/vm_syslog in your pager. See the header files vm_app.h and vm_pager.h for the actual function declarations.
As a matter of interest, the vm_extend function is roughly equivalent to the real-world sbrk system call in Linux (which malloc uses to enlarge the heap).
A virtual address is composed of a virtual page number and a page offset, as follows:
       bit 63-13              bit 12-0
+----------------------+-------------------+
| virtual page number  |    page offset    |
+----------------------+-------------------+
The simulated MMU uses a single-level, fixed-size page table. The page table is an array of page table entries (PTEs), one PTE per virtual page in the arena. The MMU locates the active page table through the page table base register (PTBR). In this case, the PTBR is a global variable that is declared and defined by the infrastructure, but will be controlled by your pager. The following portion of vm_pager.h details the arena, page table, PTEs, and PTBR.
/*
 * ***********************
 * * Definition of arena *
 * ***********************
 */

/* page size (in bytes) for the machine */
#define VM_PAGESIZE 8192

/* virtual address at which application's arena starts */
#define VM_ARENA_BASEADDR ((void*) 0x60000000)

/* virtual page number at which application's arena starts */
#define VM_ARENA_BASEPAGE ((uintptr_t) VM_ARENA_BASEADDR / VM_PAGESIZE)

/* size (in bytes) of arena */
#define VM_ARENA_SIZE 0x20000000

/*
 * **************************************
 * * Definition of page table structure *
 * **************************************
 */

/*
 * Format of page table entry.
 *
 * read_enable=0 ==> loads to this virtual page will fault
 * write_enable=0 ==> stores to this virtual page will fault
 * ppage refers to the physical page for this virtual page (unused if
 * both read_enable and write_enable are 0)
 */
typedef struct {
    unsigned long ppage : 51;       /* bits 0-50 */
    unsigned int read_enable : 1;   /* bit 51 */
    unsigned int write_enable : 1;  /* bit 52 */
} page_table_entry_t;

/*
 * Format of page table. Entries start at virtual page VM_ARENA_BASEPAGE,
 * i.e. ptes[0] is the page table entry for virtual page VM_ARENA_BASEPAGE.
 */
typedef struct {
    page_table_entry_t ptes[VM_ARENA_SIZE / VM_PAGESIZE];
} page_table_t;

/*
 * MMU's page table base register. This variable is defined by the
 * infrastructure, but it is controlled completely by the student's pager code.
 */
extern page_table_t* page_table_base_register;
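The address layout and arena constants above can be exercised with a few small helpers. The following is only a sketch: the names kPageSize, kArenaBase, kArenaSize, vpn_of, offset_of, in_arena, and pte_index are hypothetical and do not appear in vm_pager.h; the constants simply copy the values of VM_PAGESIZE, VM_ARENA_BASEADDR, and VM_ARENA_SIZE.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the vm_pager.h excerpt (names here are illustrative only).
constexpr std::uintptr_t kPageSize  = 8192;        // VM_PAGESIZE (2^13 bytes)
constexpr std::uintptr_t kArenaBase = 0x60000000;  // VM_ARENA_BASEADDR
constexpr std::uintptr_t kArenaSize = 0x20000000;  // VM_ARENA_SIZE

// Virtual page number: bits 63-13 of the address.
inline std::uintptr_t vpn_of(std::uintptr_t addr) { return addr / kPageSize; }

// Page offset: bits 12-0 of the address.
inline std::uintptr_t offset_of(std::uintptr_t addr) { return addr % kPageSize; }

// True if the address lies inside the arena.
inline bool in_arena(std::uintptr_t addr) {
    return addr >= kArenaBase && addr - kArenaBase < kArenaSize;
}

// Index into page_table_t::ptes for an arena address
// (ptes[0] corresponds to virtual page VM_ARENA_BASEPAGE).
inline std::uintptr_t pte_index(std::uintptr_t addr) {
    assert(in_arena(addr));
    return vpn_of(addr) - kArenaBase / kPageSize;
}
```

For example, the address 0x60004005 has page offset 5 and falls in the third arena page, so it is described by ptes[2].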
Make sure you understand the behavior of the fields of the page table entries. Whenever a page is accessed by the MMU, it is accessed either as a read/load or a write/store. The two protection bits read_enable and write_enable determine whether reads or writes (respectively) will call vm_fault. Importantly, note that a non-resident page access (i.e., a regular page fault) is not the only time you might need an access to fault. One or both protection bits might be set to 0 even in the case of a resident page. Clearly, a non-resident page would always need both protection bits to be set to 0.
In the case of a non-faulting access, the MMU will access the ppage field (the frame/physical page number) and then access the requested memory address. For faulting accesses, the MMU will automatically retry the access after handling the fault via vm_fault.
A physical page may be associated with at most one virtual page at any given time (i.e., no sharing is allowed).
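As an illustration of how the protection bits might be manipulated, the sketch below repeats the page_table_entry_t layout from vm_pager.h and adds three helpers. The helper names (protect_all, allow_reads, allow_all) are hypothetical, and deciding when each setting is appropriate is part of your pager's design.

```cpp
#include <cassert>

// Copy of the page table entry layout from the vm_pager.h excerpt.
typedef struct {
    unsigned long ppage : 51;       /* bits 0-50 */
    unsigned int read_enable : 1;   /* bit 51 */
    unsigned int write_enable : 1;  /* bit 52 */
} page_table_entry_t;

// Every access faults. Required for a non-resident page, and also usable
// for a resident page whose next access the pager needs to observe.
inline void protect_all(page_table_entry_t& pte) {
    pte.read_enable = 0;
    pte.write_enable = 0;
}

// Loads proceed without faulting; stores still fault, so the pager
// regains control on the next write to the page.
inline void allow_reads(page_table_entry_t& pte, unsigned long ppage) {
    pte.ppage = ppage;
    pte.read_enable = 1;
    pte.write_enable = 0;
}

// Neither loads nor stores fault.
inline void allow_all(page_table_entry_t& pte, unsigned long ppage) {
    pte.ppage = ppage;
    pte.read_enable = 1;
    pte.write_enable = 1;
}
```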
Applications use three system calls to communicate explicitly with the simulated operating system: vm_extend, vm_syslog, and vm_yield. The prototypes for these system calls are given in vm_app.h:
/*
 * vm_extend
 *
 * Ask for the lowest invalid virtual page in the process's arena to
 * be declared valid. Returns the lowest-numbered byte of the newly
 * valid virtual page. For example, if the valid part of the arena
 * before calling vm_extend is 0x60000000-0x60003FFF, the return value
 * will be 0x60004000, and the resulting valid part of the arena will
 * be 0x60000000-0x60005FFF. The newly-allocated page is initialized to
 * all zero bytes. Returns NULL if the new page cannot be allocated.
 */
extern void* vm_extend();

/*
 * vm_syslog
 *
 * Ask external pager to log a message of length len. Message data
 * must be in the part of the address space controlled by the pager.
 * Returns 0 on success or -1 on failure.
 */
extern int vm_syslog(void* message, unsigned len);

/*
 * vm_yield
 *
 * Ask operating system to yield the CPU to another process. The
 * infrastructure's scheduler is non-preemptive, so a process runs
 * until it calls vm_yield or exits.
 */
extern void vm_yield();
The following is a sample application program that uses the external pager:
#include "vm_app.h"

int main()
{
    char* p;
    p = (char*) vm_extend();
    p[0] = 'h';
    p[1] = 'e';
    p[2] = 'l';
    p[3] = 'l';
    p[4] = 'o';
    vm_syslog(p, 5);
}
This application allocates one virtual page within the arena, writes five bytes to it, then asks the pager to log the five bytes just written. Note that since p is an address returned by vm_extend, it must correspond to an address within the arena.
The functions that you must implement in your pager are declared in vm_pager.h. These declarations are shown below. Note that you will not implement a main function; instead, main is included in libvm_pager.a. The infrastructure will invoke your pager functions as described previously.
/*
 * vm_init
 *
 * Initializes the pager and any associated data structures. Called automatically
 * on pager startup. Passed the number of physical memory pages and the number of
 * disk blocks in the raw disk.
 */
extern void vm_init(unsigned memory_pages, unsigned disk_blocks);

/*
 * vm_create
 *
 * Notifies the pager that a new process with the given pid has been created.
 * The new process will only run when it's switched to via vm_switch.
 */
extern void vm_create(pid_t pid);

/*
 * vm_switch
 *
 * Notifies the pager that the kernel is switching to a new process with the
 * given pid.
 */
extern void vm_switch(pid_t pid);

/*
 * vm_fault
 *
 * Handle a fault that occurred at the given virtual address. The write flag
 * is 1 if the faulting access was a write or 0 if the faulting access was a
 * read. Returns -1 if the faulting address corresponds to an invalid page
 * or 0 otherwise.
 */
extern int vm_fault(void* addr, bool write_flag);

/*
 * vm_destroy
 *
 * Notifies the pager that the current process has exited and should be
 * deallocated.
 */
extern void vm_destroy();

/*
 * vm_extend
 *
 * Declare as valid the lowest invalid virtual page in the current process's
 * arena. Returns the lowest-numbered byte of the newly valid virtual page.
 * For example, if the valid part of the arena before calling vm_extend is
 * 0x60000000-0x60003FFF, vm_extend will return 0x60004000 and the resulting
 * valid part of the arena will be 0x60000000-0x60005FFF. The newly-allocated
 * page is allocated a disk block in swap space and should present a zero-filled
 * view to the application. Returns NULL if the new page cannot be allocated.
 */
extern void* vm_extend();

/*
 * vm_syslog
 *
 * Log (i.e., print) a message in the arena at the given address with the
 * given nonzero length. Returns 0 on success or -1 if the specified
 * message is invalid.
 */
extern int vm_syslog(void* message, unsigned len);
More details on the proper behavior of vm_fault and vm_syslog are provided below.
If a fault occurs on a virtual page that is not resident, you must find a physical page (frame) to associate with the virtual page. If there are no free physical pages, you must create a free physical page by evicting a virtual page that is currently resident.
The pager must use the second-chance (clock) algorithm to select a victim. The clock queue is an ordered list of all valid, resident virtual pages in the system. To select a victim, check the next page in the queue. If it has been accessed in any way since it was last checked, clear its reference bit and continue searching with the next page in the queue.
If the next physical page in the queue has not been accessed, then its virtual page should be evicted. Dirty and clean pages are treated the same when selecting a victim page to evict (i.e., don't continue searching past a dirty eviction candidate in order to locate a clean candidate). Additionally, you should not write out a dirty page to disk unless you're actually evicting that page.
Note that the order of pages in the clock queue may differ from the order of their physical page numbers.
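The victim selection described above can be sketched as follows. This is one possible model under stated assumptions, not the required implementation: the ClockEntry struct and select_victim function are hypothetical, and the clock queue is modeled as a std::deque in which a page that receives a second chance moves to the tail.

```cpp
#include <cassert>
#include <deque>

// Hypothetical record for one resident virtual page.
struct ClockEntry {
    unsigned ppage;   // physical page currently holding the virtual page
    bool referenced;  // accessed since the clock hand last passed it?
};

// Removes the victim from the queue and returns its physical page number.
// Pages found with referenced == true get a second chance: their bit is
// cleared and they move to the tail. In a real pager, clearing the bit
// would also require resetting the PTE protection bits so that the next
// access to the page faults and can be recorded.
inline unsigned select_victim(std::deque<ClockEntry>& clock) {
    assert(!clock.empty());
    for (;;) {
        ClockEntry e = clock.front();
        clock.pop_front();
        if (!e.referenced) {
            return e.ppage;  // not accessed since the last pass: evict
        }
        e.referenced = false;
        clock.push_back(e);
    }
}
```

The loop always terminates, because after one full sweep every remaining entry has had its reference bit cleared. Note that the dirty bit plays no part in the choice of victim.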
The vm_syslog routine is called with a pointer to an array of bytes in the current process's virtual address space and the length of that array. The pager should first check that the entire message is in valid pages of the arena. Return -1 (and don't print anything) if any part of the message is not on a valid arena page, or if the message length is zero. After checking the message validity, the pager should copy the entire message into a C++ string in the pager's address space, then print the C++ string to cout.
You must use exactly the following formatting for your print statement (this assumes your C++ string is named s):
cout << "syslog \t\t\t" << s << endl;
You must treat access to the message by vm_syslog exactly the same as if the application had accessed the message itself (e.g., for purposes of residency, reference, and so forth), starting from the lowest virtual address and proceeding towards the highest virtual address.
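The validity check and byte-by-byte copy can be sketched in a simplified, self-contained form. Everything named here is an assumption made for illustration: read_message is a hypothetical helper, mapping[i] stands in for the ppage field of the i-th valid arena page, and every page is assumed to be resident already. A real pager would instead handle each page as if the application had read it, in ascending address order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uintptr_t kPageSize  = 8192;        // VM_PAGESIZE
constexpr std::uintptr_t kArenaBase = 0x60000000;  // VM_ARENA_BASEADDR

// Copies len bytes starting at virtual address addr into out.
// Returns false (leaving out untouched) if the length is zero or any byte
// lies outside the valid pages, which here are arena pages 0..mapping.size()-1.
inline bool read_message(std::uintptr_t addr, unsigned len,
                         const std::vector<unsigned>& mapping,
                         const char* physmem, std::string& out) {
    if (len == 0 || addr < kArenaBase) return false;
    const std::uintptr_t valid_end = kArenaBase + mapping.size() * kPageSize;
    // Check the whole range before copying anything (overflow-safe form).
    if (addr >= valid_end || len > valid_end - addr) return false;

    out.clear();
    out.reserve(len);
    // Translate one byte at a time, lowest virtual address first.
    for (std::uintptr_t va = addr; va < addr + len; ++va) {
        const std::uintptr_t index  = (va - kArenaBase) / kPageSize;
        const std::uintptr_t offset = va % kPageSize;
        out.push_back(physmem[mapping[index] * kPageSize + offset]);
    }
    return true;
}
```

Translating per byte is the simplest way to handle messages that cross a page boundary, since consecutive virtual pages need not occupy consecutive physical pages.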
Although the pager infrastructure prints some additional output itself, the print statement in vm_syslog should be the only output generated by the pager itself. You can disable the infrastructure output during testing by passing the -q flag when running the pager.
There are many points in this project where you have some freedom over when zero-fills, faults, and disk I/O happen. You must defer such work as far into the future as possible (or even better, avoid it entirely). Doing so to the maximum extent possible is one of the trickier parts of the project!
As a simple example of work deferral, if a page that is being evicted does not need to be written to disk, don't do so. However, make sure that you don't modify the page replacement algorithm (or any other aspect of the specification) in order to avoid work.
There are cases where you might need to maintain extra state for the purpose of avoiding or deferring work. Carefully look for these cases!
If you could possibly defer or avoid some action at the expense of making another action necessary, keep in mind the relative costs of various operations. Incurring a fault (about 5 microseconds on current hardware) is cheaper than zero-filling a page (about 30 microseconds), which is in turn much cheaper than a disk I/O (about 10,000 microseconds). For instance, if you have a choice between taking an extra fault and causing an extra disk I/O, you should prefer to take the extra fault.
This section describes how the external pager accesses simulated hardware, i.e. physical memory, disk, and MMU.
Physical memory is structured as a contiguous collection of N pages, numbered from 0 to N-1. The number of physical pages is configurable via the -m command-line argument when executing the pager (e.g., by running ./pager -m 4). The minimum number of physical pages is 2, the maximum is 128, and the default is 4.
The disk is modeled as a single device that is a fixed number of "blocks" long, where each disk block is the same size as a physical memory page.
The pager controls the operation of the MMU by modifying the contents of the page tables and the page_table_base_register variable.
The following portion of vm_pager.h describes the variables and utility functions for accessing the hardware:
/*
 * *********************************************
 * * Public interface for the disk abstraction *
 * *********************************************
 *
 * Disk blocks are numbered from 0 to (disk_blocks-1), where disk_blocks
 * is the parameter passed to vm_init.
 */

/*
 * disk_read
 *
 * Read the specified disk block into the specified physical memory page.
 */
extern void disk_read(unsigned block, unsigned ppage);

/*
 * disk_write
 *
 * Write the contents of the specified physical memory page onto the specified
 * disk block.
 */
extern void disk_write(unsigned block, unsigned ppage);

/*
 * ********************************************************
 * * Public interface for the physical memory abstraction *
 * ********************************************************
 *
 * Physical memory pages are numbered from 0 to (memory_pages-1), where
 * memory_pages is the parameter passed to vm_init.
 *
 * The pager accesses the data in physical memory through the variable
 * pm_physmem, e.g. ((char*) pm_physmem)[5] is byte 5 in physical memory.
 */
extern void* pm_physmem;
Note that the pager is not responsible for initializing either the disk or physical memory. The pager is given the number of physical pages and disk blocks via vm_init, but it should not try to allocate actual disk blocks or physical memory space. Instead, it will work with disk_read, disk_write, and pm_physmem, which are already defined and initialized.
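Since physical memory is one contiguous region addressed through a void pointer, locating the bytes of a particular physical page takes a little pointer arithmetic. The sketch below assumes that page n starts at byte n * VM_PAGESIZE, which follows from the contiguous layout described above. The helper names frame_address and zero_fill are hypothetical, and the base pointer is passed as a parameter so the example stands alone (in the pager it would be pm_physmem).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr unsigned kPageSize = 8192;  // VM_PAGESIZE

// Address of the first byte of physical page ppage within the region at physmem.
inline char* frame_address(void* physmem, unsigned ppage) {
    return static_cast<char*>(physmem) + static_cast<std::size_t>(ppage) * kPageSize;
}

// Overwrites exactly one physical page with zero bytes.
inline void zero_fill(void* physmem, unsigned ppage) {
    std::memset(frame_address(physmem, ppage), 0, kPageSize);
}
```

In the pager, a call such as zero_fill(pm_physmem, ppage) would be issued only at the point where the zero-filled contents are actually needed.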
As in the previous project, you will submit a suite of test cases exercising your pager as well as the pager itself. Each test case for the pager will be a short C++ application program that uses the pager via the client interface described previously, and should be run without any arguments and without using any input files.
Each test case must specify the number of physical memory pages to use when running the pager via the name of the test case file. Specifically, the name of each test case must be of the format anyName.memoryPages.cc, where memoryPages is the number of physical memory pages. For example, you might name a test case myTest.4.cc. Remember that the minimum number of physical memory pages is 2 and the maximum is 128.
Your test suite may contain up to 20 test cases. Each test case may cause a correct pager to generate at most 256 KB of output and must take less than 60 seconds to run (these limits are much larger than needed). You will submit your suite of test cases together with your pager to the autograder.
You should test your pager with both single and multiple applications running. However, each test case in your submitted test suite need only be a single process; none of the buggy pagers used to evaluate your test suite require multi-process applications to be exposed.
Finally, note that the autograder will exercise your test cases to expose buggy pagers according to the pager (NOT the application) output. In other words, a test case exposes a buggy pager by causing the buggy pager to generate output that differs from the correct output. Of course, an application call to vm_syslog will directly correspond to a line of pager output, but the additional infrastructure output provides much more detail into the behavior of the pager.
General and specific advice on tackling the project is provided below.
One of the first things you should do is to write down a state-based flowchart (aka finite state machine) for the life of a virtual page, from creation via vm_extend to destruction via vm_destroy. Represent each virtual page as a series of bits (i.e., the relevant state of a page), where each state in the flowchart represents a specific setting of bits. Ask yourself what events can happen to a page at each stage of its lifecycle and what effect each event will have on its state. You will, of course, need to decide which bits you need in order to represent each state. As you design the state machine, try to identify all of the places where work can be deferred or avoided. This exercise may seem academic, but the correctness of your program will critically depend on correctly designing (and following) the state machine.
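To make the idea of "state as bits plus events" concrete, here is a deliberately incomplete sketch. The PageState struct, its three bits, and the transition functions are all hypothetical; in particular, the sketch omits whatever extra state your design needs in order to defer or avoid work, which you are expected to identify yourself.

```cpp
#include <cassert>

// Hypothetical state bits for one virtual page.
struct PageState {
    bool resident;    // currently held in a physical page?
    bool referenced;  // accessed since the clock hand last passed?
    bool dirty;       // modified since it was brought into memory?
};

// Event: the pager services a read or write fault on the page
// (bringing the page into memory, if needed, is elided here).
inline PageState after_fault(PageState s, bool write) {
    s.resident = true;
    s.referenced = true;
    if (write) s.dirty = true;
    return s;
}

// Event: the clock hand passes the page and gives it a second chance.
inline PageState after_second_chance(PageState s) {
    assert(s.resident);
    s.referenced = false;
    return s;
}

// Event: the page is evicted. A disk write is needed only if it is dirty.
inline PageState after_evict(PageState s, bool& needs_disk_write) {
    assert(s.resident);
    needs_disk_write = s.dirty;
    return PageState{false, false, false};
}
```

Writing each event as a function from old state to new state makes it straightforward to enumerate every (state, event) pair and check that each one has a defined outcome.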
One of the key 'mental hurdles' in this project is understanding the full role of vm_fault within the pager. In particular, do not make the mistake of thinking that vm_fault is simply called when a non-resident page is accessed; while that is one specific case where a fault should occur, it is not the only case. Remember that faulting is how your pager takes control from an active process and is given an opportunity to update internal state, perform any needed bookkeeping, etc. Thus, in any scenario where you need your pager to do so (even if it doesn't involve servicing a full-blown page fault), you will want to ensure that a fault occurs by setting the page protection bits accordingly.
Specific tips on each of the pager functions are provided below:
The infrastructure calls vm_init when the pager starts. This function should set up whatever data structures you need to begin accepting vm_create calls and subsequent requests from processes.
The infrastructure calls vm_create when a new application process starts. You should initialize whatever data structures you need to handle the new process and its subsequent calls to the library. The process's initial page table should be empty, since there are no valid virtual pages in its arena until vm_extend is called. Note that the new process will not be running until after it is switched to via vm_switch.
The infrastructure calls vm_switch whenever the OS scheduler switches to a new process. This function allows your pager to do whatever bookkeeping is needed to register the fact that a new process is running.
The infrastructure calls vm_destroy when the current application process exits. This routine must deallocate all pager resources held by that process, which might include page tables, physical pages, and disk blocks.
An application calls vm_extend when it wants to make another virtual page in its arena valid. Each new page should be backed by a disk block in swap space, which is used to store the page when it is not resident in physical memory. This approach is called "eager" swap allocation, since swap space is allocated up-front rather than when a page needs to be evicted to disk. Additionally, remember that an application should see each byte of a newly extended virtual page as initialized with the value 0. However, the actual data initialization needed to provide this abstraction should be deferred as long as possible (see the section on work deferral for further details).
The vm_fault routine is called in response to a read or write fault by the application. Your pager determines which accesses in the arena will generate faults by setting the read_enable and write_enable bits in the page table. Remember that a faulting instruction is automatically retried after handling the fault via vm_fault (assuming 0 is returned); thus, you need to be careful to ensure that the reattempted instruction will not fault again.
The main task of vm_syslog will be copying the array into the pager's C++ string. Note that this will be the one place where you will be performing virtual to physical address translation (since this task is normally performed by the MMU). Be careful to ensure that vm_syslog treats access to the message exactly the same as the application itself.
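The bookkeeping for process creation, switching, and destruction can be sketched with a map from process ID to page table. This is a minimal model under stated assumptions: PageTable and base_register are small stand-ins for page_table_t and page_table_base_register, an int stands in for pid_t, and the on_create, on_switch, and on_destroy functions are hypothetical counterparts of vm_create, vm_switch, and vm_destroy that ignore physical pages and disk blocks entirely.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Stand-in for page_table_t (kept tiny so the sketch is self-contained).
struct PageTable {
    unsigned valid_pages = 0;  // a new process has no valid arena pages
};

// Stand-in for the infrastructure's page_table_base_register.
static PageTable* base_register = nullptr;

// One page table per live process, keyed by process ID.
static std::map<int, std::unique_ptr<PageTable>> tables;
static int current_pid = -1;

// New process: create an empty page table, but do not activate it yet.
static void on_create(int pid) {
    assert(tables.count(pid) == 0);
    tables[pid] = std::unique_ptr<PageTable>(new PageTable());
}

// Context switch: point the base register at the new process's table.
static void on_switch(int pid) {
    auto it = tables.find(pid);
    assert(it != tables.end());
    current_pid = pid;
    base_register = it->second.get();
}

// Current process exited: release its table and clear the register.
static void on_destroy() {
    assert(tables.count(current_pid) == 1);
    tables.erase(current_pid);
    base_register = nullptr;
    current_pid = -1;
}
```

Everything is declared static, in keeping with the requirement that non-public functions and globals in pager.cc avoid naming conflicts with other libraries.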
For diagnostic printing in your pager, don't use cout, as any extra output sent to standard out (except for syslog) will cause your pager to fail test cases. Instead, print to standard error, i.e., using cerr instead of cout. This will allow you to leave active debugging statements when submitting to the autograder.
Use assertion statements copiously in your code to check for unexpected conditions generated by bugs in your program. These error checks are essential in debugging complex programs, as they help flag error conditions early.
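As one example of the kind of check that is worth asserting, the sketch below verifies that no virtual page is recorded as occupying two physical pages at once. The representation is an assumption made for illustration: owner_of_frame[i] holds a hypothetical identifier for the virtual page in physical page i, or -1 if that physical page is free.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns true if every occupied physical page holds a distinct virtual page.
// owner_of_frame[i] < 0 means physical page i is free.
inline bool no_duplicate_owners(const std::vector<long>& owner_of_frame) {
    std::set<long> seen;
    for (long owner : owner_of_frame) {
        if (owner < 0) continue;                       // free physical page
        if (!seen.insert(owner).second) return false;  // same page seen twice
    }
    return true;
}
```

A call such as assert(no_duplicate_owners(owners)); placed at the end of each pager entry point would catch this class of bookkeeping bug close to where it is introduced.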
One way to construct a good test suite is to trace through different transition paths that a page can take through a pager's state machine, then write a short test case that causes a page to take each path.
The starter files for project 4 are available at p4-starter.tar.gz. As in the last project, you can use wget and tar to download and unpack the files on the class server.
Write your pager in C++ on Linux in a single file called pager.cc.
The public functions in vm_pager.h are declared extern, but all other functions and global variables in your pager should be declared static to prevent naming conflicts with other libraries. Your program may use any functions included in the standard C++ library, including (and especially) the STL. You should not use any libraries other than the standard C++ library.
You can compile the pager itself as follows (note that you need the additional flags -lssl -lcrypto):
g++ -Wall -std=c++11 -o pager pager.cc libvm_pager.a -lssl -lcrypto
To compile an application program (or test case) named app.cc that uses the pager, you can run the following:
g++ -Wall -std=c++11 -o app app.cc libvm_app.a
Note that due to Linux-specific functionality used by the infrastructure, there is no set of Mac libraries available for this project.
To run your pager and an application, first start the pager. Remember that you can specify the number of physical memory pages via the -m flag (the number of disk blocks is non-configurable). The infrastructure will print a message saying "Pager started with [num] physical memory pages". After the pager starts, you can run one or more application processes that will interact with the pager via the infrastructure. The same user must run the pager and the applications that use the pager, and all processes must run on the same machine.
You can submit your program to the autograder as follows:
submit3310 4 pager.cc test1.4.cc test2.4.cc ...
Autograder note: The autograder contains a mix of 'small' and 'large' pager tests. The large tests take a significant length of time to run (i.e., 30 min or more), and therefore will only be executed if your pager passes at least 75% of the small tests. Additionally, due to the large number of buggy pagers in the autograder, test case autograding may also be lengthy; generation of autograder feedback may take 1-2 hours total. Note that your submissions are timestamped according to when you submit them, not when the feedback email is generated.
If you are working in a group, in addition to your group's final program submission, each group member must individually submit a group report to me by email. Your group report, which will be kept anonymous from your partners, should summarize your contributions to the project as well as those of your partners. Your report does not need to be long (and could be as simple as "we all worked on the entirety of the project together in front of one machine"), but it must be received for your project to be considered submitted.
Group submissions will receive a single grade, but I reserve the right to adjust individual grades up or down from the group grade in the event of a clearly uneven distribution of work.
In addition to your program itself, you will also write a short paper (~3-5 pages) that describes your pager. The purpose of this writeup is to help you gain experience with technical writing. In particular, your paper should include the following:
Upload your writeup as a PDF to Blackboard no later than 48 hours after the due date for the code. You only need to submit one copy of the writeup per team. Typesetting your writeup in LaTeX is encouraged but not required. The quality of your writeup will affect your project grade -- do not neglect it!
Your project will be graded on program correctness, design, and style, as well as the quality of your project writeup. Remember that the autograder will only check the correctness of your program, nothing else!
You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions about what constitutes good program design and/or style that are not covered by the guide.