Building a kd-tree

In this assignment the goal is to build and visualize a kd-tree for a set of points in the plane. To manage complexity we'll split it in three parts:

Part 1: build the kd-tree.
Part 2: render/visualize the kd-tree.
Part 3: make it look like a Mondrian painting.

Starting code is available here.

Part 1: Building the kd-tree

First you will need to define a data structure to encode a kd-tree such as below --- feel free to refine as needed.

typedef struct _treeNode treeNode;

struct _treeNode {
     point2D p; /* If this is a leaf node,  p represents the point stored in this leaf. 
                  If this is not a leaf node,  p represents the horizontal or vertical line
                  stored in this node. For a vertical line, p.y is
                  ignored. For a horizontal line, p.x is ignored
                */
     char type; / * this can be 'h' (horizontal) or 'v' (vertical), or 'l' (leaf)
                    depending whether the node splits with a horizontal line or  vertical line.
                    Technically this should be an enum.
                */
     treeNode  *left, *right; /* left/below and right/above children. */
}

typedef struct _kdtree{
   treeNode* root; 

   int count; //number of nodes  in the tree

   int height; //height of tree
} kdtree;

You'll need to write the basic primitives for operating on a treeNode and on a kdtree, such as creating a node and creating an empty tree, printing a node, and printing a tree.

For example, include a function that prints some basic info about the kd-tree, such as number of nodes, and height. Call this function in the main functin so that we can see its output.

void kdtree_print(kdtree* t);

The main function that you will write for Part 1 is building a kd-tree from a set of points. You will write a function to build and return a kd-tree as follows:

/* Build a kd-tree for the set of n points, where each leaf cell
   contains  1 point. 
   Return a pointer to the root.
*/
treeNode*  kdtree_build(point2D* points, int n)

The function takes as input the set of points and returns the kd-tree.

To simplify dealing with coincident points, the first step in this function should be to call a helper function that removes coincident points.
```
int ndp; //number of dictinct points //this should be made a global same as n

ndp = remove_coincident_points(points,n); //ndp is smaller or equal to n
```
This helper function should remove the coincident points in points and return the number of dictinct points. One way it can do this is by sorting points and then in a subsequent pass shifting things left when equal. Basically the array points will stay the same, just that the ndp elements of it that are distinct will appear at the beginning.
Sort points by x-coord and by y-coord using system qsort. To do this first allocate two new arrays:
```
point2D *points-by-x, *points-by-y; 
//allocate them of size  ndp, copy data from points then sort them 
```
You need to use system qsort for this by defining appropriate comparison functions.
Call a helper function
```
kdtree_build_rec(point2D* points-by-x, point2D* points-by-y, int ndp, ...)
 
```
This helper function should build the kd-tree recursively. It should probably take the depth of the current node as a parameter and use it to decide whether to split vertically or horizontally.
The main challenge in this function will be to catch all the cases that can happen and make sure the recursion stops.
In case of an even number of points, the median should be either a point in between the two medians; or the smaller of the two medians --- in this case make sure you include the point on the line in the tree to the left or below, in order to avoid infinite recursion. Stop the recursion when the node contains 1 point (and perhaps earlier if necessary).
Finding the median: The median is the value in the middle index of the sorted array (sorted by x or by y, depending on the type of node). All points that are smaller or equal (in x or y) go on one side and the rest on the other. Because points can have same coordinates, it can be that more than half points go on the left side ---- you'll need to count and see precisely how many.
It's possible that all points go on one side. For e.g. consider the points
```
(2,6), (3,6), (3,5)
```
examined in the x-coordinate. Middle point is (3,6). But the third's point x-value is also 3, so it will go on the left side. Thus this passes the entire array to the next level. Then we examine them in the y-coordinate:
```
(3,5), (3,6), (2,6)
```
Middle point is (3,6). But the third point has same y-coord as the median, which means it will also go on the left side. Thus this passes entire array to next level again, i.e. infinite recursion. These points are not coincident but are collinear in just the wrong way to cause infinite recursion (example thanks to Rob).
You'll need to find a way to deal with this.
Splitting points-by-x and points-by-y and passing them through the recursion: the easiest way is to first do a pass and count precisely how many points go left and how many right, then allocate the new arrays and then do another pass and copy the points. Do not try to optimize passes, first make your code as simple and make it WORK. Leave optimizing for later.
Freeing up memory: don't forget to free the arrays that you are done with (C has no garbage collection!).

Testing: It goes without saying that you need to throroughly test your code. Testing is a crucial step in the code development cycle. The goal of testing is to find bugs. Try to break your code. Once you find a bug, try to reproduce it on the smallest possible input ---- it's no fun debugging on an input of half a million points.

To test your kd-tree, run it on sets of random points with values of n ranging from 1,2,3,4,5,...to 1000000. For each value of n press the space bar to get a different set of random points. For small values of n you'll want to start by printing the entire tree. Once your code works for small n, you'll probably want to switch to just printing the info of the kd-tree (number of nodes and height). Write a few different functions for initialization (in addition to initialize_points_random()), for example

//initializes array points with n points on a horizontal line 
void initialize_points_case1() {
..
}

//initializes array points with n points on a vertical line 
void initialize_points_case2() {
..
}

//initializes array points with 3 points as in the example above
//that may trigger infinite recursion
void initialize_points_case3() {
   n=3;
   ..
}

Part 2: Rendering the kd-tree

Write a function that renders the kd-tree in OpenGL. Use the code for the previous assignments. The OpenGL part is pretty easy --- basically

//for each node in the tree in some order {

      glBegin(gl_LINE);
      //identify the endpoints p1 and p2 of the line segment that you
      //need to draw
      glVertex2f(p1);
      glVertex2f(p2); 
      glEnd(gl_LINE);
}

The harder part in the rendering is identifying the endpoints of the line segment for that node. Note that the line x=x1through the root is infinite in the y-direction. The lines in the nodes left and right of the root are infinite on one side, and bounded by x=x1 on the other side. And so on. The region corresponding to a node (and thus the endpoints of the segments that will split it) can be computed based on the ancestors of the node in the tree.

The input points are generated in the range [0,WINDOWSIZE] x [0, WINDOWSIZE]. Thus a value of infinity in x or y direction should be set to WINDOWSIZE.

Needless to say, develop your code gracefully. Start by drawing infinite lines through all nodes, then refine it to compute the proper segment.

Part 3:

Comments

Make sure that your code is C (C99 standard) and not C+, for e.g. you shoudl NOT have

for (int i=0; i< ...);

As handy as it is to write loops like this, it is not standard C style. If in doubt, compile your code on dover. If your code compiles, you are all set. If not, fix it!

Enjoy!

Last modified: Tue Mar 25 11:45:35 EDT 2014