Coding Design & Style Guide

For coding assignments in this course, your program will be evaluated not only on correctness (i.e., whether it follows the assignment specification) but also on design and coding style. This document details specific design and style issues that I will look for when reviewing your programs. As many aspects of design are highly situational, this is by no means an exhaustive list; you should strive to follow these guidelines while also exercising your own judgment when designing your code.

This guide is targeted towards C and C++ code, but most of these principles apply to working in any programming language.

Documentation

Good code should be largely self-documenting; that is, the code itself (such as variable and function names) should generally make it clear what you are doing. In particular, more documentation is not necessarily better: clear code that needs no documentation is preferable to unclear code with extensive documentation.

Comments should not describe what the code does, but why. Novice coders (or those just starting in a new programming language) often make the mistake of documenting what the code is doing (e.g., "dereferences the pointer"). Such comments are not necessary; as a general rule of thumb, you should assume that the reader is already comfortable with the language. Instead, your comments should provide detail that is not already evident from the code itself.

There are several parts of your code that do generally deserve comments:

Function headers: Every function should be prefaced with a comment summarizing the purpose of the function, the function parameters (if not self-evident), and especially the function return value. Other information that is often appropriate includes error conditions, any assumptions made by the function, and any changes made to other parts of the program (e.g., modifying an external data structure) that are not evident from the rest of the description.
Long code blocks: Lengthy blocks of code within a function can often benefit from an inline comment summarizing the purpose of the code block. However, most functions should not be excessively long (e.g., roughly 20 lines or fewer), which should minimize the need for inline documentation.
'Clever' code: Be careful of writing 'clever' or 'tricky' code. In most cases it is better to rewrite such code in a straightforward way, even if it is marginally less efficient as a result. However, if there's no clear way to make a bit of code self-evident, then a comment should be added to explain.
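
As a rough illustration of the function header described above, here is a minimal sketch. The function find_max and its parameter names are hypothetical, chosen only for this example; the point is that the header states the purpose, the parameters, the return value, and the error conditions, while the body needs no comments because it is short and plainly named.

```c
#include <stddef.h>

/*
 * Finds the largest value in an array of integers.
 *
 * values: the array to search
 * count:  the number of elements in values
 * result: output parameter; receives the largest value on success
 *
 * Returns 0 on success. Returns -1 if values or result is NULL or if
 * count is 0; in that case *result is left unchanged.
 */
int find_max(const int *values, size_t count, int *result)
{
    if (values == NULL || result == NULL || count == 0) {
        return -1;
    }

    int max = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] > max) {
            max = values[i];
        }
    }

    *result = max;
    return 0;
}
```

Note that the header documents the one thing a caller cannot guess from the signature alone: what happens to *result when the call fails.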

Whitespace

Proper use of whitespace is essential to the readability of your code. In particular:

Every new block of code (function, conditional, loop, etc.) should be indented one additional "level".
The size of each level should be consistent. For example, you are likely to run into trouble if you use tabs to indent in some parts of the program while using regular spaces to indent in other parts. You can use whichever convention you prefer, but you must be consistent.
Blank lines should be judiciously used to separate blocks of code. Functions should always be separated by blank lines, as should larger blocks of code within functions.

Line Length

One easy way to make your code hard to read is to write excessively long lines of code. For this course, you should not have any lines of code longer than 100 characters (whitespace included). Note that 80 characters is a fairly typical 'real-world' limit, so this requirement is more lenient than many you will encounter. Any statement that would exceed the maximum length should be broken up across multiple lines. To quickly check the maximum line length of file.c, you can run the following command on Linux:

wc -L file.c
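
One possible way to break up a long statement is sketched below. The function format_summary and its parameters are hypothetical, invented for this example. Both the parameter list and the snprintf call would run past 100 characters on a single line, so each is split after a comma, with the continuation lines aligned under the first argument.

```c
#include <stdio.h>

/*
 * Writes a one-line summary of a student's record into buffer.
 * Returns the value returned by snprintf.
 */
int format_summary(char *buffer, size_t buffer_size, const char *student_name,
                   int assignments_done, double average_score)
{
    return snprintf(buffer, buffer_size,
                    "%s: %d assignments submitted, average score %.1f",
                    student_name, assignments_done, average_score);
}
```

Other continuation styles (for example, a fixed extra indent) are equally acceptable, as long as you use the same one throughout your program.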

Variable Names

Variable names should clearly describe what the variable contains. Local variables with obvious purposes (such as a loop counter) may be named with a single letter (e.g., i); other variables should not. Generic variable names (e.g., foo, var, asdf) should never be used. Variable names should be descriptive but concise; e.g., a name like sumOfAllArrayValues could probably be better named simply arraySum. Variable names should generally be nouns (e.g., arraySum) while function names should generally be verbs (e.g., calcArraySum).

Variable names with multiple words should be formatted consistently. For example, arraySum or array_sum are both fine, but using arraySum in one place and array_sum in another is not. Pick your convention and stick to it -- standard C convention generally dictates using underscores to separate words, but this convention is not required as long as you are consistent.
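
The naming guidelines above might look as follows in practice. This sketch reuses the names from the examples in this section, written with underscores (calc_array_sum, array_sum); the parameter names values and count are assumptions made for illustration.

```c
#include <stddef.h>

/* Returns the sum of the first count elements of values. */
int calc_array_sum(const int *values, size_t count)
{
    int array_sum = 0;                      /* noun: describes what the variable holds */

    for (size_t i = 0; i < count; i++) {    /* a single letter is fine for a loop counter */
        array_sum += values[i];
    }

    return array_sum;
}
```

The function name is a verb phrase, the variables are nouns, and every multi-word name uses the same underscore convention.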

Magic Numbers

"Magic numbers" are numbers in your code that have a meaning beyond their own values. For example, in the line num_days = num_years * 365, the number 365 has a significance beyond simply being the number 365 -- it's the number of days in a year. Magic numbers like these should be named using #define at the top of the file, as follows:

#define DAYS_IN_YEAR 365
...
num_days = num_years * DAYS_IN_YEAR;

Magic values should never be used directly in your code. Note, however, that not every number actually has a meaning beyond its own value -- e.g., the values 0 and 1 usually do not need to be named.

Consistency and Teams

One of the most important aspects of coding style is consistency -- not only in the areas covered by this guide, but in other areas as well (e.g., whether curly braces go on the same line as their associated keyword or on the next line). You are allowed to make your own style choices in such cases, but you should always be consistent. Unexpected style changes in a program substantially detract from readability even if the individual style choices in question are reasonable.

The issue of consistency is particularly important when working in a team. Since your team members may have their own personal preferences and conventions (which may differ from yours), it is critical to agree in advance which conventions you will use. A great way to waste time and annoy your partners at the same time is to write code using multiple different conventions, then go back later and change all your partners' code to match your own code's style. Avoid this problem and decide on your style conventions in advance!

Dead Code

"Dead code" is code that is not actually active in your program. Such code might include old debugging statements (e.g., calls to printf) that you commented out, or an old function that is no longer called from anywhere in your program. While some dead code is an inevitable product of development, you should remove all of it before submitting: a submitted program should never contain any dead code.

Modularity

To the extent possible, you should strive to make your code modular. Writing appropriately modular code includes the following:

Follow the DRY principle ("Don't Repeat Yourself"). You should not have identical or nearly identical chunks of code in your program; instead, move these blocks of code into separate functions and call those. Use of copy-paste is almost guaranteed to indicate a violation of this principle.
Functions should be compact and perform well-defined tasks. Long functions, or those for which it is hard to succinctly describe what the function does, should be broken up into multiple smaller functions. While there is no formal rule on what constitutes a "long" function, most functions you write should be no longer than 30 lines of actual code. Any function (main included) with more than 50 lines of actual code is almost certainly too long and should be modularized.
While not strictly prohibited, use of global variables should be minimized. Variables should definitely not be made global unless it is necessary for the program.
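
To illustrate the DRY principle from the list above, here is a small sketch. The functions calc_mean and calc_mean_improvement are hypothetical. Without the helper, the averaging loop would have to be written out twice inside calc_mean_improvement, once per array; with it, the loop exists in exactly one place and each function has a task that is easy to describe in one sentence.

```c
#include <stddef.h>

/* Returns the arithmetic mean of count values, or 0.0 if count is 0. */
double calc_mean(const double *values, size_t count)
{
    if (count == 0) {
        return 0.0;
    }

    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }

    return sum / (double)count;
}

/* Returns how much higher the mean final score is than the mean midterm score. */
double calc_mean_improvement(const double *midterm_scores, size_t midterm_count,
                             const double *final_scores, size_t final_count)
{
    return calc_mean(final_scores, final_count) - calc_mean(midterm_scores, midterm_count);
}
```

Note also that neither function needs a global variable: everything it works on is passed in as a parameter.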

Error Checking

When writing a program, we normally assume that all functions will complete successfully (e.g., the user provides a valid input, a file can be read successfully, and so forth). It is equally important, particularly with system-level code, to consider failure cases. For many types of errors, a program may have no reasonable option but to exit (e.g., if a call to malloc fails and the program needs the memory to proceed), but it is still better to recognize this failure and cleanly exit with an error message than to blindly proceed and crash sometime later (likely with an uninformative message). In general, any time you make a call that might fail -- even if you think it's extremely unlikely that it will fail -- your code should still check for the failure case and respond appropriately.
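
The sketch below shows two common ways of responding to a failure; the function names and error messages are illustrative assumptions, not required conventions. alloc_int_array handles a failure the program cannot recover from by printing a message and exiting cleanly, while count_lines reports the failure to its caller through its return value so that the caller can decide what to do.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Allocates a zero-initialized array of count ints (count must be nonzero).
 * Exits with an error message if the memory cannot be allocated, so the
 * returned pointer is never NULL. The caller must free the array.
 */
int *alloc_int_array(size_t count)
{
    int *array = calloc(count, sizeof *array);
    if (array == NULL) {
        fprintf(stderr, "error: could not allocate %zu ints\n", count);
        exit(EXIT_FAILURE);
    }
    return array;
}

/*
 * Counts the newline characters in the file at path and stores the total
 * in *line_count. Returns 0 on success, or -1 if the file cannot be opened
 * or read; in that case *line_count is left unchanged.
 */
int count_lines(const char *path, long *line_count)
{
    FILE *file = fopen(path, "r");
    if (file == NULL) {
        fprintf(stderr, "error: cannot open %s\n", path);
        return -1;
    }

    long count = 0;
    int c;
    while ((c = fgetc(file)) != EOF) {
        if (c == '\n') {
            count++;
        }
    }

    /* fgetc returns EOF for both end-of-file and read errors, so check which occurred. */
    int read_failed = ferror(file);
    if (fclose(file) != 0 || read_failed) {
        fprintf(stderr, "error: failed while reading %s\n", path);
        return -1;
    }

    *line_count = count;
    return 0;
}
```

Every call that can fail here (calloc, fopen, fgetc, fclose) has its result checked, even though most of them will succeed almost every time.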

Optimization

Strive for clean design over optimizing the performance of every bit of code. In the vast majority of cases, you will spend far more time debugging a more complex design than worrying about (or waiting for) your code to actually run. Of course, you should not neglect program efficiency completely, particularly when considering your high-level program design (e.g., choosing a logarithmic-time data structure over one that is only linear-time). However, be careful not to complicate your code by fine-tuning small details in a way that compromises its readability. Choosing a faster data structure to use is generally a sound decision. Replacing a multiplication operation by a bit-shift to shave off a few processor cycles is generally not. The famed computer scientist Donald Knuth is quoted as saying "premature optimization is the root of all evil". Trust him.
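
As a small illustration of the bit-shift case, compare the two hypothetical functions below (the names and the constant BYTES_PER_PIXEL are assumptions made for this example). Both return the same value, but only the first says what it means. Optimizing compilers commonly perform this kind of substitution on their own when it helps, so the second version is unlikely to gain anything in exchange for its lost readability.

```c
#include <stddef.h>

#define BYTES_PER_PIXEL 4

/* Preferred: returns the number of bytes needed to store pixel_count pixels. */
size_t calc_image_bytes(size_t pixel_count)
{
    return pixel_count * BYTES_PER_PIXEL;
}

/*
 * To be avoided: same result, written with a bit-shift. The shift amount is
 * only correct while BYTES_PER_PIXEL stays at 4, and the intent is obscured.
 */
size_t calc_image_bytes_shifted(size_t pixel_count)
{
    return pixel_count << 2;
}
```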