Assigned: Friday, November 10.
Acceptance Deadline: Monday, November 13, 11:59 pm.
Proposal Due: Thursday, November 16, 11:59 pm.
Checkpoint 1: Tuesday, November 21, 11:59 pm.
Checkpoint 2: Sunday, December 3, 11:59 pm.
Presentations: Tuesday, December 5 and Thursday, December 7 (in-class).
Demos, Due Date: Saturday, December 16, by 12:00 pm.
The previous three projects have given you experience with single-tier client/server systems, multi-tier client/server systems, and cluster-based processing. Your fourth and final project will give you an opportunity to design and build a larger distributed system that runs in the wide-area over many machines and supports some degree of fault-tolerance.
In addition to designing and implementing your system, you will also (1) conduct experiments demonstrating your system's effectiveness, (2) write a technical paper detailing your system and experiments, (3) give a presentation on your system to the class, and (4) present a final demo (to me) showing how your system works.
Your final project should be done in teams of two or three. All team members are expected to actively participate in all parts of the project (coding, experiments, presentation, paper, etc).
The specification for this project is much more open-ended than for the previous projects, and the type and purpose of your distributed system is left up to you. However, your system must adhere to the following broad guidelines:
Beyond these guidelines, creativity is encouraged! I will consider most project ideas if they are thoughtful and feasible.
The technology and language(s) you choose to use in your system are up to you. While you are welcome to use any of the technologies that we have already used or discussed in the semester to implement your system (e.g., XML-RPC, Java RMI, sockets, etc.), you are not restricted to any particular language or communication framework. You are also permitted to use third-party systems or frameworks within your project, with the understanding that your project will not receive credit for any functionality provided by third-party systems. For example, constructing a 20-machine Hadoop cluster and using it to compute an inverted index would technically fulfill the above project requirements (scalability, fault-tolerance, etc.), but your only real contribution would have been writing the (trivial) inverted index functions.
If you would prefer more guidance or are struggling to come up with a project idea, a suggested 'default' project is to build a simple peer-to-peer file transfer application. In such an application, the basic idea is that a peer that wishes to download a particular file can download it simultaneously from all peers that have at least part of the file. Thus, a group of peers can distribute the file more rapidly than if a single server had to send the file in its entirety to multiple clients that want to download the file. The figure below shows this basic design (in which peers exchange 'chunks' of the file with each other).
Note that even if you choose to pursue some version of this default project, there is still room for creativity! For example, questions that you might consider in the design of your system include (a) how do peers organize the connections between them, (b) how do peers locate a file that they wish to download, or (c) how do peers decide what chunks to download (and who to download from). The various P2P file sharing services we discuss in class may give you some ideas for general approaches.
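To make design question (c) concrete, here is a minimal sketch of one possible chunk-selection policy, "rarest first", in which a peer requests the chunk that the fewest other peers currently hold. This is only an illustration and not a required design: the function name `pick_next_chunk`, the integer chunk ids, and the `peer_chunks` map (peer name to the set of chunks that peer advertises) are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple


def pick_next_chunk(
    have: Set[int],
    peer_chunks: Dict[str, Set[int]],
) -> Optional[Tuple[int, str]]:
    """Return (chunk_id, peer) for the rarest chunk we still need, or None.

    `have` is the set of chunk ids this peer already holds.
    `peer_chunks` maps a (hypothetical) peer name to the chunks it advertises.
    """
    # Count how many peers advertise each chunk that we do not yet hold.
    availability = Counter(
        chunk
        for chunks in peer_chunks.values()
        for chunk in chunks
        if chunk not in have
    )
    if not availability:
        # Either the download is complete or no peer has anything we need.
        return None

    # Rarest first; ties are broken by lowest chunk id so the choice is
    # deterministic.
    chunk = min(availability, key=lambda c: (availability[c], c))

    # Among the peers holding that chunk, pick the alphabetically first one.
    # A real system might instead prefer the fastest or least-loaded peer.
    peer = min(p for p, chunks in peer_chunks.items() if chunk in chunks)
    return chunk, peer
```

For example, if this peer holds chunk 0 and the other peers advertise `{"a": {0, 1, 2}, "b": {1}, "c": {1, 2, 3}}`, the function selects chunk 3 from peer `"c"`, since only one peer has it. Whether rarest-first, sequential, or random selection is the right choice depends on which metric you decide to optimize.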
As in the previous project, you will be provisioned with a set of Amazon machines on which to test your systems. The infrastructure in this project is different in several key ways, however: (1) all machines are shared among all groups; (2) machines are no longer geographically in the same area, but rather are spread around the world; and (3) you will have access to a greater number of machines, and should be aiming to run your system on as many of them as possible. Some of these machines may not be as responsive as others, or may have slower network connections, etc. This is part of the challenge of operating across a world-wide network like the Internet!
Since these machines are shared, you do not have sudo permissions. If you need specific software installed on the machines, let me know and I can probably install it for you. Also, please be good citizens whenever possible! While some degree of interference is inevitable, you should not try to max out all the machines transferring data at full speed for long periods at a time. Such activity, especially for prolonged periods, will make your classmates unhappy.
Regardless of whether you choose your own project or opt for the suggested project, it is important that you actively manage the complexity of your system, especially at first. It is much better to start with a simple design, implement it, then add features later, rather than starting with an overly complex design and never getting a working prototype! Ideally, you should start by planning a base 'core' of the system that you are sure you can implement, then a set of extensions that you can add once the base system is running.
Be thoughtful about what metric(s) you are optimizing for when you build your system. For example, in the context of the file transfer application, are you trying to minimize transfer time, aggregate bandwidth, or something else? It will be important to discuss these decisions in your paper.
Finally, remember that system design is all about tradeoffs (e.g., performance vs. fault tolerance, complexity vs. scalability), and you are almost certainly going to need to make compromises. The key point is being conscious of what these compromises are and justifying the choices that you make.
One of the challenges of this project is managing and running a distributed application without an off-the-shelf control infrastructure (such as the one built into Hadoop). Don't try to run your system on 20 machines by opening up 20 terminal windows and launching processes one at a time! Instead, it is strongly recommended that you automate the process of deploying and running your application through scripts whenever possible. Using scripts written in some scripting language (Python, Bash, Perl, etc.) will save you lots of time when running your system. For instance, rather than SSHing to 20 machines and issuing the same command one-by-one, just write a script that automatically issues the command over SSH to all the machines you're trying to run on! Consider automating any task that you find yourself repeatedly performing and keep the principle of DRY in mind (Don't Repeat Yourself).
Initially, you will probably be better served by just running on a few machines during development, and then you can scale up to more machines once your system is running. You are welcome to use hopper while implementing your system. For instance, you might choose to develop your application on hopper and run scripts from there to deploy and run your system on the Amazon servers.
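As an illustration of the kind of automation recommended above, here is a minimal sketch of a Python helper that issues the same command over SSH to a list of machines in parallel. It assumes key-based SSH login is already set up (so no password prompt appears) and that the `ssh` client is on your path; the function names, the timeouts, and the `runner` parameter (which exists only so the logic can be exercised without a network) are choices made for this example, not part of the project infrastructure.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def build_ssh_command(host: str, remote_command: str, timeout_s: int = 10) -> List[str]:
    """Build the argument list for one non-interactive ssh invocation."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable hosts
        host,
        remote_command,
    ]


def run_on_host(
    host: str,
    remote_command: str,
    runner: Callable = subprocess.run,
) -> Tuple[str, int, str]:
    """Run one command on one host; return (host, exit code, stdout)."""
    try:
        proc = runner(
            build_ssh_command(host, remote_command),
            capture_output=True,
            text=True,
            timeout=60,  # overall limit, since some machines may be slow
        )
        return host, proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return host, -1, "timed out"


def run_everywhere(
    hosts: Sequence[str],
    remote_command: str,
    max_workers: int = 10,
    runner: Callable = subprocess.run,
) -> List[Tuple[str, int, str]]:
    """Run the command on every host concurrently; results keep host order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda h: run_on_host(h, remote_command, runner), hosts))
```

A call such as `run_everywhere(hosts, "uptime")`, where `hosts` is your own list of machine names, returns one `(host, exit_code, output)` tuple per machine, which makes it easy to spot machines that are down or slow. Capping `max_workers` keeps the script from opening a large number of connections at once, in keeping with the good-citizen guidelines for the shared machines.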
In addition to designing and building your system, you will (1) write a paper detailing your system, (2) present your project to the class, and (3) give a final demo to me showing your system in action.
You will need to submit a project proposal as well as a series of intermediate checkpoints leading to the final due date, as detailed below.
Your project will be graded on adherence to the assignment specification; your program's functionality, design, and style; and the quality of your presentation, final demo, and final paper (including your experiments). You should consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions on what constitutes good program design and/or style that are not covered by the guide.
Note that the quality of your presentation, final demo, and paper is of particular importance to this project, as they are the primary means by which I will understand what you have accomplished. Just like in a real systems research project, your code is in many ways less important than the system's design, evaluation, and your presentation of the system. As such, it is essential that you do not neglect the non-implementation aspects of your project.