Assignment 1: Finding the closest pair of points--two algorithms and their experimental evaluation

In this assignment you will write code to find the closest pair of a set of points in the plane using two methods, and you will perform an experimental evaluation of the two methods in practice.

First: the naive, quadratic algorithm discussed in class.
Second: instead of the optimal divide-and-conquer algorithm, you will develop an approach based on a widely used heuristic that gives good results in practice under certain assumptions about the data. Namely, you'll use a grid to bucket the points into grid cells and then use the grid to speed up the search for the closest pair (see the last problem in the closest pair exercises).

Experimental evaluation: Denote the number of points by n. Generate point sets of increasing size and time both methods separately, until the difference in running times is significant. For example, you could pick n = 100, 1000, 10000, .... Record the running times in a table.

Comments

This first assignment is meant as a warmup to get everyone up to speed with C/C++. It is also an opportunity to learn what quadratic complexity means in practice, and to go from an idea (the grid approach) to a working algorithm.
Use C/C++, rather than Python. For one, the startup code is given in C/C++. Also, learning to code in C/C++ is a useful exercise that structures and changes the way you approach programming.
Generate the set of points randomly. Assume they have real coordinates. For consistency, assume the range of the coordinates is [0; 1,000,000].
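One possible way to generate the points is sketched below. The helper name `random_points`, the `seed` parameter, and the use of `std::vector` are assumptions made for illustration, not part of the assignment; a fixed seed simply makes runs repeatable while debugging.

```cpp
#include <random>
#include <vector>

// A point in the plane (same fields as the point2d struct in the review section).
struct point2d {
  double x, y;
};

// Hypothetical helper: n points with coordinates drawn uniformly from
// [0, max_coord). Note that std::uniform_real_distribution is half-open,
// so max_coord itself is not produced.
std::vector<point2d> random_points(int n, unsigned seed,
                                   double max_coord = 1000000.0) {
  std::mt19937 gen(seed);  // same seed, same sequence of points
  std::uniform_real_distribution<double> coord(0.0, max_coord);
  std::vector<point2d> pts(n);
  for (int i = 0; i < n; i++) {
    pts[i].x = coord(gen);
    pts[i].y = coord(gen);
  }
  return pts;
}
```

For example, `random_points(100, 42)` would produce the same 100 points on every run, which helps when comparing the output of the two methods.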
The number of points, n, should be read as a command line argument. For example,
```
[ltoma@lobster] ./closest 100
```
means that it will generate and compute on n=100 points. Make your functions print out the time they take to run. That way you can run your code with increasingly larger values for n and record the results.
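Reading n and timing a call could look roughly like the sketch below. Both helper names (`parse_n`, `time_seconds`) are hypothetical, and `std::chrono::steady_clock` is just one reasonable choice of timer.

```cpp
#include <chrono>
#include <cstdlib>

// Hypothetical helper: read n from argv[1].
// Returns -1 if the argument is missing or is not a positive integer.
long parse_n(int argc, char** argv) {
  if (argc < 2) return -1;
  char* end = nullptr;
  long n = std::strtol(argv[1], &end, 10);
  if (end == argv[1] || *end != '\0' || n <= 0) return -1;
  return n;
}

// Hypothetical helper: run f once and return the elapsed wall-clock
// time in seconds.
template <typename F>
double time_seconds(F f) {
  auto start = std::chrono::steady_clock::now();
  f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

In `main`, something like `long n = parse_n(argc, argv);` followed by `printf("naive: %.3f s\n", time_seconds([&] { closest_naive(P); }));` would then print the running time of one method.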
Implement the functions that compute the closest pair as two separate functions, one corresponding to the naive algorithm and one to the improved, gridded algorithm. For example,
```
//compute and print the closest pair in P using the naive algorithm 
void closest_naive( array of points P)


//compute and print the closest pair in P using a gridded approach
void closest_grid( array of points P)
```
Unless you use vectors, you'll also need to pass in the size of the array.
Make each function print the points that are closest so that it's easy to check that the two functions actually find the same pair.
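As a rough illustration of the quadratic approach and of the printing convention, here is a minimal sketch. It assumes a `std::vector` of points and at least two points; the helper names `dist2` and `closest_naive_indices` are made up for this example.

```cpp
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

struct point2d {
  double x, y;
};

// Squared distance; comparing squared values avoids one sqrt per pair.
double dist2(const point2d& a, const point2d& b) {
  double dx = a.x - b.x, dy = a.y - b.y;
  return dx * dx + dy * dy;
}

// Hypothetical helper: indices (i, j) with i < j of a closest pair.
// Assumes P.size() >= 2. Checks all n(n-1)/2 pairs.
std::pair<int, int> closest_naive_indices(const std::vector<point2d>& P) {
  int n = (int)P.size();
  int bi = 0, bj = 1;
  double best = dist2(P[0], P[1]);
  for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
      double d = dist2(P[i], P[j]);
      if (d < best) {
        best = d;
        bi = i;
        bj = j;
      }
    }
  }
  return std::make_pair(bi, bj);
}

// Compute and print the closest pair in P using the naive algorithm.
void closest_naive(const std::vector<point2d>& P) {
  std::pair<int, int> ij = closest_naive_indices(P);
  const point2d& a = P[ij.first];
  const point2d& b = P[ij.second];
  printf("naive: (%f, %f) and (%f, %f), distance %f\n",
         a.x, a.y, b.x, b.y, std::sqrt(dist2(a, b)));
}
```

If the gridded function prints its pair in the same format, the two outputs can be compared line by line.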
For the gridded approach, use a grid of k-by-k cells. You'll set the value of k in order to make the algorithm efficient. Because the points are uniformly distributed, you can estimate, for example, how many points you expect to fall in each cell. Think of how you would go about finding the closest pair, and how the number of points per cell may influence it. Using this insight, what value of k do you pick?
The details of this approach are intentionally left open-ended so that you can develop and implement your own ideas.
Ideally, your code will work correctly for any value of k you choose. That is, it will work if you set k=1, and will work if you set k to such a large value that most of the cells will be empty; for example k=n will mean most cells will be empty since there are n² cells and n points.
The value of k is either specified by the user on the command line along with n, or is a global variable that you can set at the top of your file.
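Two small pieces of the bucketing step can be sketched without fixing the rest of the design. The helper names `cell_index` and `expected_points_per_cell` are hypothetical, and the sketch assumes coordinates in [0, 1,000,000] as stated above. The clamp is what keeps the mapping valid for any k, including k=1 and a point sitting exactly on the upper boundary.

```cpp
#include <cassert>

// Hypothetical helper: column (or row) index in [0, k-1] for a coordinate c
// assumed to lie in [0, max_coord].
int cell_index(double c, int k, double max_coord = 1000000.0) {
  assert(k >= 1);
  int idx = (int)(c / max_coord * k);
  if (idx < 0) idx = 0;
  if (idx >= k) idx = k - 1;  // c == max_coord would otherwise map to cell k
  return idx;
}

// With n uniformly distributed points and k*k cells, the expected number
// of points per cell is n / k^2.
double expected_points_per_cell(long n, int k) {
  assert(k >= 1);
  return (double)n / ((double)k * (double)k);
}
```

A point p would then go into the cell at row `cell_index(p.y, k)` and column `cell_index(p.x, k)`; how the cells are searched afterwards is the part left for you to design.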

Pair programming

You are encouraged to find a partner and work as a team (make sure you read and follow the pair-programming policy on the class website). But if you would rather work alone, or if you don't know anyone in the class yet, that's totally fine; just work alone. There are pros to both.

Help

My office hours are:

Mondays 3-4pm
Wednesdays 3-4pm
Fridays 1-3pm

Come any time to discuss any issues you are encountering or simply to chat.

Slack: Use Slack as much as you can, ask questions, and answer questions.

Getting your code to work: Expect you will have to debug your code. Remember you are here to learn, so don't compare yourself with others. When your code finally works, you created that world and you feel like God (paraphrasing Linus Torvalds); but, along the way, you make mistakes and you feel stupid. That's the beauty of programming. That's normal!

What and how to turn in

You will receive the assignment on GitHub, but there will be no startup code. You are encouraged to do pair-programming, but feel free to work alone. Push your code into your github repository for this assignment. Provide a README where you describe briefly how your gridded approach works, how you chose the grid size, and include the running times.

If you are done with this assignment and you want an extra challenge, it would be cool to implement the divide-and-conquer algorithm discussed in class.

Enjoy!

Review: Arrays and vectors in C++

Feel free to use C, C++, or a combination of the two. You will likely need to work with arrays, so I created a simple test program that demos how you allocate arrays, which you can find here.

Arrays of integers, C style

  int *a; 
  
  /*  DON'T DO THIS: 
  
      int a[n]; 
  
      It's wrong and you're sure to get segfaults for larger values of
      n. YOU NEED TO ALLOCATE  dynamically using malloc() because you
      don't know n at compile time.
  */ 
  
  //allocate the space  dynamically
  a =(int*)malloc(n * sizeof(int)); 
  
  //put some data in it 
  for (i=0; i < n; i++) 
    a[i] = 1; 

  //compute something 
  sum=0; 
  for (i=0;i< n;i++)
    sum += a[i]; 

  //free the space 
  free(a);

Array of integers, C++ style


 //an array of n ints, C++ style 
  int *b; 
  
  /*  DON'T DO THIS: 
      
      int b[n]; 
      
      It's wrong and you're sure to get segfaults for larger values of
      n. YOU NEED TO ALLOCATE  dynamically using new because you
      don't know n at compile time.
  */ 
    

  //allocate the space  dynamically
  b = new int[n]; 

  //put some data in it 
  for (i=0; i < n; i++) 
    b[i] = 1; 

  //compute something 
  sum=0; 
  for (i=0;i < n;i++)
    sum += b[i]; 

  //free the space 
  delete [] b;

Array of vector< int >


 //an array of Vectors, C++ style 
  vector< int > *d; 

 /*  DON'T DO THIS: 
      
      vector< int > d[n]; 
      
      It's wrong and you're sure to get segfaults for larger values of
      n. YOU NEED TO ALLOCATE  dynamically using new because you
      don't know n at compile time.
  */ 

  //allocate the space  dynamically
  d = new vector< int > [n]; 
  //NOTE: new[] default-constructs each element, so every d[i] starts out as an empty vector
  
  //put some data in it 
  for (i=0; i< n; i++) 
    //d[i] is a Vector, so we push 1 into it 
    d[i].push_back(1); 

  //compute something 
  sum=0; 
  for (i=0;i< n;i++)
    sum += d[i][0]; 

  //free the space 
  delete [] d; 

  printf("test4: sum=%f\n", sum);

2D-array of vector< point >


//a structure for a point in 2D
typedef struct _point2d {
  double x, y; 
} point2d; 


//a 2D array of Vectors of points 
  vector< point2d > **grid; 
  
  /*  DON'T DO THIS: 

     vector< point2d > grid [n][n] 

     It's wrong and you're sure to get segfaults for larger values of
     n. YOU NEED TO ALLOCATE  dynamically using new because you
     don't know n at compile time.
  */ 

  //allocate first an array of vector*
  grid = new vector< point2d >* [n]; 
  for (i=0; i < n;i++) {
    //grid[i] is a vector*, that is, an array (of vectors); we allocate it 
    grid[i] = new vector< point2d > [n]; 
  }

  //put some data in it 
  for (i=0; i < n; i++) {
    for (j=0; j < n; j++) {
      //grid[i][j] is a vector of points, so we push one point into it 
      assert(grid[i][j].size()==0); 
      point2d p = {1.0, 1.0};
      grid[i][j].push_back(p); 
      assert(grid[i][j].size()==1); 
    }
  }
 
  //compute something 
  sum=0; 
  for (i=0; i < n; i++) {
    for (j=0; j < n; j++) {
      // grid[i][j] is a vector; grid[i][j][0] is the first point in that vector
      sum += grid[i][j][0].x;
    } 
  }
  //free the space: first each row, then the array of row pointers
  for (i=0; i < n; i++)
    delete [] grid[i];
  delete [] grid;
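
2D grid with nested vectors (no new/delete)

The version above manages memory by hand. As an alternative sketch, which is not part of the test program mentioned earlier, nested `std::vector`s give the same `grid[i][j]` access and free themselves when they go out of scope. The alias `Grid` and the helper `make_grid` are names invented for this example.

```cpp
#include <vector>

//a structure for a point in 2D
struct point2d {
  double x, y;
};

// Hypothetical alias: k rows, each with k cells, each cell a vector of points.
typedef std::vector<std::vector<std::vector<point2d> > > Grid;

// Build a k-by-k grid in which every cell starts out empty.
Grid make_grid(int k) {
  return Grid(k, std::vector<std::vector<point2d> >(k));
}
```

After `Grid grid = make_grid(k);`, a call such as `grid[i][j].push_back(p)` works exactly as in the pointer-based version, and no explicit cleanup is needed.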

Last modified: Wed Sep 8th, 17:03:04 EDT 2021