Project 2 - Online Bookstore

The goal of this project is twofold: one, you will gain experience designing a multi-tier distributed system, and two, you will work with remote procedure calls as a common framework for distributed programming. Additionally, you will experiment with your system to see how query loads affect performance.

This project should be done in teams of two (or three, with prior permission). All team members are expected to work on all parts of the project.

System Specification

You have been tasked to design Nile.com - the world's smallest online bookstore. Since Nile.com plans to one day supplant Amazon, they would like to use sound design principles in building their online store to allow for future growth.

The store will employ a two tier design - a front-end and a back-end. The front-end tier will accept user requests and perform initial processing. The backend consists of two components: a catalog server and an order server. The catalog server maintains information on current inventory, while the order server is responsible for keeping the catalog appropriately updated.

A pictorial representation of the system is shown in the figure below.

architecture

Within the system, inventory is represented as a series of book entries, where each entry consists of (1) an item number, (2) the book's name, (3) the book's topic, and (4) the number of copies of the book in stock. The various components of the system interact with inventory via a set of operations, detailed below.

Server Operations

The front end server supports three operations (invoked by clients):

search(topic) – return all item numbers belonging to a specified topic (i.e., category)
lookup(item_number) – look up inventory details (name, topic, and inventory count) for a particular item
buy(item_number) – purchase one copy of a particular item, removing it from future inventory

The search and lookup operations trigger queries to the catalog server, while the buy operation triggers a request to the order server.

The catalog server supports two operations:

query(arg) – lookup details of current inventory. Two types of queries are supported: query-by-subject and query-by-item. In the first case, a topic is specified and the server returns all matching entries. In the second case, an item number is specified and all relevant details are returned.
update(item_number, qty) - updates the current inventory of a given item by the specified quantity (which may be positive or negative).

The order server supports a single operation: buy(item_number). Upon receiving a buy request, the order server must first verify that the item is in stock by querying the catalog server and then decrement the number of items in stock by one. The buy request can fail if the item is out of stock.

Data Requirements

Initial inventory stock may be determined arbitrarily. New stock should arrive automatically and periodically (e.g., every 30 seconds a new copy of each book is added; the details of this are up to you). The actual set of books (and topics) contained in the catalog is also up to you. While your design and data structures should be flexible enough to handle any catalog, the objective here is not to actually manage a large dataset, so a small number of books (e.g., 10 books spread over 3 topics) is perfectly sufficient. In a real service, the catalog would likely be stored in a full-fledged database server or similar. Here, you do not need to persist the catalog to disk at all; keeping the catalog data in-memory at the catalog server is fine.

Technical Requirements

All servers should be written in Java using XML-RPC for remote communication. In addition to the three server programs, you will write two clients - one in Java and one in Python. Both clients should communicate with the same front-end server program using XML-RPC.

No GUIs are required for any servers or clients. Simple command line interfaces are fine (e.g., for issuing requests via a client). The exact format of the client interface is up to you, but should include typical information and operate in an intuitive way (e.g., showing the results of a query in a nicely formatted manner, printing whether purchases requests succeeded or failed, etc).

The service must support multiple concurrent queries; i.e., mutliple clients should be able to have their queries serviced without having to wait for other queries to complete first. In contrast, purchases do not need to support concurrency; while multiple clients may of course issue purchase requests simultaneously, the order server does not need to actually perform them concurrently.

Be aware of thread synchronization issues to avoid inconsistency or errors in your system. For instance, two concurrent purchase requests should not both be able to buy a single remaining copy of a book. You may find Java's synchronization primitives helpful here (described more below).

All programs except the catalog server should accept a hostname (or two) as a command-line argument of the machine to connect to. For a client, this would be the hostname of the front-end server. For the front-end server, this would be the hostname of the order server and the hostname of the catalog server. For the order server, this would be the hostname of the catalog server.

Implementation Advice

Example XML-RPC Code

To help you get started with XML-RPC, an example server and clients are provided below:

Server.java – an XML-RPC server in Java
Client.java – a Java client that talks to the above server
client.py – a Python client that talks to the above server

As a place to start, try running the server and clients on the class server and make sure you understand the basic structure of the code.

Compiling against XML-RPC Classes

You will need to compile against the Apache XML-RPC Java classes, which have been installed on the class server. To compile with them, you need to include them in your "class path", which is basically the set of directories that Java uses to search for classes. The best way to link these classes to your classpath is by adding the following line to your .bashrc file on the class server:

export CLASSPATH="/usr/share/java/xmlrpc-client.jar:/usr/share/java/xmlrpc-server.jar:
  /usr/share/java/xmlrpc-common.jar:/usr/share/java/ws-commons-util.jar:
  /usr/share/java/apache-commons-logging.jar:.:$CLASSPATH"

Once you've done this, then every new shell process will have the classpath automatically set up, and you should be able to compile and run code (via javac and java) using the XML-RPC classes.

In Python, no special action is required beyond importing xmlrpclib (version 2) or xmlrpc.client (version 3). You can use either Python 2 or Python 3; python defaults to v2 on the class server, while you can explicitly call v3 via python36.

If you want to develop in Java on your local machine, you can also download the XML-RPC classes. You will need to add the jarfiles to your local or IDE classpath in order to use them (but note that the paths will be different from specified above).

Documentation on the XML-RPC classes (e.g. Javadoc) is available on the Apache XML-RPC site.

Stateless Servers

XML-RPC is an example of a stateless protocol, which means that no information (or state) is automatically maintained across multiple communication calls. In other words, every XML-RPC request completely stands alone and is not intrinsically related to any other request. Other examples of stateless protocols include HTTP and IP, while TCP (in contrast) is an example of a stateful protocol (since TCP messages are inherently part of a stream of messages between two hosts, and this information must be tracked across messages as part of the protocol).

The main practical implication of the statelessness of XML-RPC is that the server will create a new server object (e.g., an instance of the Server class in the example code) for every new RPC request. This approach simplifies XML-RPC itself and makes it easy to handle lots of simultaneous requests, but may complicate your program design a bit to accommodate this. You can see the impact of statelessness yourself by adding an instance variable to the Server class, modifying it and printing it out within the RPC, then making mulitple requests to the server.

Concurrency and Synchronization

Your servers will need to be careful about protecting data that may be accessed by multiple requests concurrenctly. In Java, a standard way to perform synchronization is via the synchronized keyword, which allows you to (among other things) mark specific methods as only safe for execution by one thread at a time. Doing so ensures that the method will not be executed concurrently, which is good for safety but potentially bad for performance. Thus, the essential challenge of synchronization is to provide safety when necessary but still allow for enough parallelism to achieve good performance.

You may find Oracle's tutorial on synchronization in Java useful if you have not done much concurrent programming previously.

Testing and Evaluation

For initial testing, you can run all components of the system on the same physical machine. For later testing, you will be provided a set of 4 separate machines for running each component of the system separately - details on these machines will be provided via email. You will need to run a set of performance experiments and provide results in your writeup (described below).

Writeup

Once you have implemented your system, you should write a short paper (2-4 pages) that describes your system. The general framework of this paper should be similar to the outline described in Project 1. As part of your evaluation, you should include results of the following experiments:

Compute the average response time per client search request by measuring the response time seen by a client for 500 sequential requests.
Now compute the average response time per buy request for 500 sequential requests.
Rerun experiments 1) and 2) with multiple clients concurrently making requests to the system. Does your average response time change as the number of concurrent requests changes?

The exact setup of how you run your experiments is up to you, but your experimental design should be sufficient for answering these questions, and your setup should be described in your writeup in enough detail that someone else would be able to replicate your experiments. Include graphs where appropriate to demonstrate your results.

Submission and Evaluation

Submit your assignment to Blackboard as a tarball, e.g.:

tar czvf proj2.tar.gz your-project-files

which will create proj2.tar.gz from the files in your-project-files). Your project files should include (1) your source files, (2) a Makefile that allows me to build your servers by running make, and (3) a copy of sample output generated by running your client.

Separately (by the writeup due date, which is 48 hours after the code due date), you should upload your writeup to Blackboard as a PDF.

Finally, if working in a group, you must *individually* email me a group report at the end of the project summarizing your contributions and the contributions of your partner(s) to the project. The goal of this policy is to promote an equitable distribution of work within the group. Your report is not shared with your group, but in the event of clearly uneven contributions, I reserve the right to adjust individual grades up or down from the group grade. Reports need not be lengthy, and in many cases may be as simple as "We worked on the entirety of the project together in front of one machine" or similar. Submit your individual group report to me by email by the writeup deadline. You should only submit a single group report covering both coding and writing.

Your project will be graded on following the assignment specification, your program's design and style, and the quality of your project writeup. You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions on what constitutes good program design and/or style that are not covered by the guide.