You can work alone, or with a partner. Naturally, the expectation is that the work of a team scales linearly with the number of people on it (that is, a team of 2 is expected to put in twice the hours on the project).
I am willing to allow teams of 3 people, though in my experience they are harder to coordinate, so you will have to convince me that you have a plan.
There are many LaTeX guides online, if you should need anything beyond the template. Here is a suggested outline for the paper:
Below is a list of project options:
The basic problem is the following: Given a grid terrain (part of which is the sea) and a sea-level rise (e.g. 3ft), model flooding of the terrain as the sea rises to the given level.
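To make the problem concrete, here is a minimal sketch of the flooding step (a sketch, not the required implementation; it assumes the sea is given as a list of cell coordinates and uses 4-connectivity, both choices that are up to you):

```python
from collections import deque

def flood(elev, rise, sea_cells):
    # a cell floods iff its elevation is below the new sea level AND it
    # is connected to the sea through a chain of flooded cells; a low
    # cell behind a ridge stays dry even if it is below the new level
    rows, cols = len(elev), len(elev[0])
    flooded = [[False] * cols for _ in range(rows)]
    q = deque()
    for i, j in sea_cells:
        flooded[i][j] = True
        q.append((i, j))
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols
                    and not flooded[ni][nj] and elev[ni][nj] < rise):
                flooded[ni][nj] = True
                q.append((ni, nj))
    return flooded
```

The connectivity requirement is the whole point: a naive "elevation below rise" threshold overestimates the flood zone.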
Unless you watch Fox News, you probably believe that climate change is happening. As temperatures continue to rise, ice will melt and the sea level will rise. Scientists predict significant sea-level rises (between 3 and 5 feet) in the next 100 years. Below is one of the recent pictures in the news (NYC flooded):
In this project you will produce a similar flooding simulation, which you can show to your family and friends to raise awareness. Better yet, if your code is efficient on large data and you implement the extension I suggest below, it will be useful to local agencies that are looking at the impacts of flooding.
Performance: An important goal for this project is performance. The DEM folder on dover:/mnt/research/gis/DATA/DEM contains a grid for Lincoln county at 2m resolution, about 900 million points. We also have a grid for Knox county at 2m resolution (CHECK SIZE). Both these counties have been generated from high-resolution Lidar data using ArcGIS. Generally speaking we have .5 TB of 1m resolution Lidar data for Maine and we could generate other counties; we could also generate grids covering more than one county. Ideally we would have a grid covering the whole Maine coast, but no one dares work with such large datasets simply because there is no software that can handle them (the exception is LAStools, which can handle large data and can generate very large grids from Lidar data, but the modules that do this are not open source). All in all, this is an opportunity to develop an efficient algorithm, customize it and parallelize it. The problem of flooding is not immediately parallelizable (unlike the viewshed problem), but some simple data partition strategies can be explored, and coming up with an approach will be a great problem.
SLR+BFE: An additional issue is to consider the impact of storm waves in addition to SLR flooding. The flooding due to waves is given as a BFE grid: this is a grid of the same size as the elevation grid, its value at point (i,j) is the height of waves at point (i,j). The BFE grid is zero at sea and inland, and has non-zero values only along the coast (see picture below). The BFE grid gives the extent of the current flooding (note: do we know anything about how it is calculated? is it based on historical flood data?), without taking SLR into account. A picture of the BFE grid for Southport is below (you can find it in the folder with SLR papers and data).
The goal is to "add" BFE to SLR and model flooding with both---this will give the flood zones in the future, when in addition to storm waves there will be sea-level rise.
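One plausible reading of "adding" BFE to SLR, purely as a sketch (whether a simple sum is the right model is part of the project): treat slr + bfe[i][j] as the local water-surface height that a cell must be below in order to flood. Connectivity to the sea still has to be checked, exactly as in plain SLR flooding.

```python
def flood_threshold(elev, bfe, slr):
    # candidate flooded cells under an ASSUMED additive SLR+BFE model:
    # cell (i, j) can flood once the local water surface slr + bfe[i][j]
    # exceeds its elevation; this is only the per-cell test, and the
    # sea-connectivity check from basic SLR flooding still applies
    rows, cols = len(elev), len(elev[0])
    return [[elev[i][j] < slr + bfe[i][j] for j in range(cols)]
            for i in range(rows)]
```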
If you want to work on this problem, I see several possibilities:
Test data: the island Southport in Maine, provided by Eileen Johnson. Data is here.
Relevant links: SLR papers
The problem is the following: Given a flow direction (FD) grid, compute a (recursive) watershed partition of the terrain.
It is important that you work with a complete FD grid, that is, one which routes flow on flat areas and routes flow out of sinks. The process of finding the sinks in the terrain and simulating flooding is tedious to implement, so you can skip this and use FD grids generated by GRASS GIS. If you want to run GRASS GIS, it is available on dover; I generated a couple of test FD grids and uploaded them here.
Given the FD grid, the process of computing a watershed partition is the following:
Doing this process once will give a partition into 9 watersheds. The output should be a grid the same size as the terrain, where each point in the grid is labeled with a number 1 through 9, corresponding to the watershed it is in.
The basic function in this project will be a function to determine the watershed of an arbitrary point p. As a reminder, the watershed of a point p is the set of all grid points that flow to p.
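A sketch of this basic function: the watershed of p can be found by a reverse-flow BFS from p. The sketch assumes each FD value has already been decoded into the (di, dj) offset of the neighbor the cell drains to; real FD grids (e.g. from GRASS) store numeric direction codes that you would translate into such offsets.

```python
from collections import deque

def watershed(fd, p):
    # all cells that flow (transitively) into p, p included;
    # fd[i][j] is the (di, dj) offset of the neighbor that (i, j) drains to
    rows, cols = len(fd), len(fd[0])
    inws = [[False] * cols for _ in range(rows)]
    pi, pj = p
    inws[pi][pj] = True
    q = deque([p])
    while q:
        i, j = q.popleft()
        # scan the 8 neighbors for cells that drain into (i, j)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di == dj == 0:
                    continue
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and not inws[ni][nj] and fd[ni][nj] == (-di, -dj)):
                    inws[ni][nj] = True
                    q.append((ni, nj))
    return inws
```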
Computing these 9 watersheds will be a good project.
As a refinement, you can repeat the process inside each watershed. This way you can find 9 sub-watersheds inside each of the 9 watersheds. For example, the sub-watersheds of watershed 1 will be numbered 11, 12, ..., 19; the sub-watersheds of watershed 2 will be numbered 21, 22, ..., 29; and so on.
Test data: You will find some test FD grids here. The DEMs are stored in the standard location on dover, dover:/mnt/research/gis/DATA/DEM/
Relevant links: watershed papers
The problem is the following: Given an elevation grid, compute the total viewshed grid. This is a grid where the value of point (i,j) is the size (that is, number of visible points) of the viewshed of (i,j).
Thus computing the total viewshed entails computing a viewshed for each single point in the terrain as a viewpoint. This process is computationally intensive and for example on Kaweah dataset (1 million points) it took 42 hours with one core. Here is a picture of the total viewshed for Kaweah:
The goal in this project is to parallelize this computation and perform a detailed and careful investigation of its performance. You will run experiments with various numbers of threads and measure the speedup; you will attempt to explain why the speedup flattens out by finding a way to measure the running time of each thread, and thus the overall load balancing. You will experiment with ways to improve the load balancing among threads. All projects need to be accompanied by reports, but for this project in particular, since the bulk of the work is in the experimental evaluation, I expect a report that could be submitted to a conference.
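To fix ideas, here is a sketch of the parallel structure with per-worker timing. The viewshed routine is a toy stand-in (it only tests visibility along the viewpoint's own row), and a thread pool is used for brevity; for real CPU-bound speedup you would use processes or native threads, but the cyclic row partitioning and the per-worker timing carry over.

```python
import time
from multiprocessing.pool import ThreadPool

def row_viewshed_size(elev, i, j):
    # toy stand-in for a real viewshed: count the points visible from
    # (i, j) looking left and right along row i only (1D line of sight)
    row = elev[i]
    count = 1  # the viewpoint sees itself
    for step in (1, -1):
        max_slope = float("-inf")
        k = j + step
        while 0 <= k < len(row):
            slope = (row[k] - row[j]) / abs(k - j)
            if slope >= max_slope:   # not hidden by anything closer
                count += 1
                max_slope = slope
            k += step
    return count

def process_rows(args):
    # one worker's share: a subset of rows, timed for load-balance stats
    elev, rows = args
    t0 = time.perf_counter()
    result = {i: [row_viewshed_size(elev, i, j) for j in range(len(elev[0]))]
              for i in rows}
    return result, time.perf_counter() - t0

def total_viewshed(elev, nworkers=4):
    # cyclic assignment (rows 0, n, 2n, ... to worker 0, etc.) spreads
    # expensive and cheap rows roughly evenly among workers
    shares = [(elev, list(range(w, len(elev), nworkers)))
              for w in range(nworkers)]
    with ThreadPool(nworkers) as pool:
        parts = pool.map(process_rows, shares)
    out = [None] * len(elev)
    times = []
    for result, t in parts:
        times.append(t)
        for i, rowvals in result.items():
            out[i] = rowvals
    return out, times
```

Comparing the per-worker times in `times` is exactly the load-balancing measurement the report should analyze.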
Test data: See DEMs in dover:/mnt/research/gis/DATA/DEM/.
Parallel total viewshed:
lasgrid: a tool that reads LIDAR from LAS/LAZ/ASCII and grids it onto a raster. The most important parameter, ‘-step n’, specifies the n x n area of LiDAR points that is gridded onto one raster cell (or pixel). The output is in BIL, ASC, IMG, TIF, PNG, JPG, XYZ, FLT, or DTM format. The tool can raster the ‘-elevation’ or the ‘-intensity’ of each point and stores the ‘-lowest’ or the ‘-highest’, the ‘-average’, or the standard deviation ‘-stddev’. Other gridding options are ‘-scan_angle_abs’, ‘-counter’, ‘-counter_16bit’, ‘-counter_32bit’, ‘-user_data’, ‘-point_source’, and others. For more details and other options not mentioned here, see the README file. Here's the more detailed README for lasgrid.
lasgrid has many options for the output; your output should be an ASC grid, the same format as the one you've used so far. Assume you start with a Lidar dataset in ASC form (as generated by las2txt). Allow the user to raster the lowest, highest, or average point (and perhaps others). By default the grid should cover the entire extent of the bounding box of the lidar dataset. Experiment with various step sizes, especially as the resolution of the grid gets close to the resolution of the Lidar data. If there are grid cells that don't contain any lidar points, then the resolution requested for the output grid is too high. You will investigate the highest resolution grid that a particular dataset supports.
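The core of such a tool is just binning points into cells and aggregating. A minimal sketch (the nodata value -9999 and the bottom-up row order are arbitrary choices here, not lasgrid's):

```python
import math

def grid_lidar(points, step, method="lowest"):
    # bin lidar points (x, y, z) into step-by-step cells covering the
    # bounding box; each cell keeps the lowest/highest/average z.
    # Empty cells get nodata -9999: a sign the requested resolution is
    # too high for this dataset. Rows are indexed from ymin upward here;
    # a real ASC grid writes rows top-down.
    NODATA = -9999
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    ncols = int(math.floor((max(xs) - xmin) / step)) + 1
    nrows = int(math.floor((max(ys) - ymin) / step)) + 1
    cells = {}
    for x, y, z in points:
        c = (int((y - ymin) // step), int((x - xmin) // step))
        cells.setdefault(c, []).append(z)
    agg = {"lowest": min, "highest": max,
           "average": lambda zs: sum(zs) / len(zs)}[method]
    return [[agg(cells[(r, c)]) if (r, c) in cells else NODATA
             for c in range(ncols)] for r in range(nrows)]
```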
A nice challenge here is to make your algorithm work on very large Lidar data where neither the Lidar data nor the grid actually fit in memory.
Test data: Check out Lidar data in dover:/mnt/research/gis/DATA/LIDAR/. There's also Lidar data that comes with LAStools.
Papers/links:
For example, your code could take on the command line the name of the lidar dataset to be simplified, the desired error threshold, and the name of the output file where you'll write the resulting TIN. Your code should simplify the lidar data, time it, print a summary, and then render the TIN. The simplification function should be separate from the rendering, and timed on its own. The summary should include how many points are left in the TIN (both in absolute value and as a percentage of the number of points in the input), and the total time for simplification. The time for simplification should not include the time to read the lidar data into memory, or to write the TIN to a file. In summary, one should expect to see the following on the screen when running your program:
    ./simplify lidar.txt 10 lidar.10.tin
    reading lidar.txt in memory...done. total xxx seconds.
    ---------
    starting simplification n=184552.
    ...
    done. n'=2019 (1.09% of 184552) total time xx seconds
    ---------
    writing TIN to file lidar.10.tin
The error epsilon on the command line should be interpreted as an absolute value. Example: the command above produces a TIN that is within a distance of 10 (units) from the lidar data. The units here are the same as the units used for height in lidar.txt.
When run with epsilon=0, your program should eliminate all the flat areas. If there is no flat area then running with epsilon=0 will not eliminate any points.
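To illustrate what the absolute epsilon means, here is the 1D analogue of the simplification (a Douglas-Peucker-style pass over a single elevation profile; the project works on a 2D TIN, but the epsilon semantics are the same): the retained samples stay within vertical distance eps of the input, and with eps = 0 only samples lying exactly on the line between their kept neighbors (e.g. in flat runs) are removed.

```python
def simplify_profile(zs, eps):
    # keep the endpoints; recursively keep the sample that deviates most
    # from the line between the current endpoints, while that deviation
    # exceeds eps (an absolute value, in the same units as the heights)
    def recurse(lo, hi, keep):
        worst, worst_d = None, eps
        for k in range(lo + 1, hi):
            interp = zs[lo] + (zs[hi] - zs[lo]) * (k - lo) / (hi - lo)
            d = abs(zs[k] - interp)
            if d > worst_d:
                worst, worst_d = k, d
        if worst is not None:
            recurse(lo, worst, keep)
            keep.append(worst)
            recurse(worst, hi, keep)

    keep = [0]
    recurse(0, len(zs) - 1, keep)
    keep.append(len(zs) - 1)
    return sorted(keep)   # indices of the retained samples
```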
Test data: Check out Lidar data in dover:/mnt/research/gis/DATA/LIDAR/. There's also Lidar data that comes with LAStools.
Relevant links:
Obviously doing all three will be too ambitious for a term project. Just pick a part that looks interesting to you. For example, you could work on finding the ground. Or, you could run lasground to find the ground, and focus on finding the vegetation, or the buildings. This is worth exploring if you are interested in vision, because finding roofs entails computing some sort of planarity estimate for a point and its neighborhood. If you had an image instead of a grid, the starting point would be to compute the Sobel operator, which estimates first derivatives and thus identifies sharp edges. Here you have a Lidar dataset that stores actual heights, not colors, so the approach is different, but the spirit is the same: identify how the terrain looks around a point in order to see if it's part of a planar roof. This assumes that roofs are planar, but hey, you have to start somewhere and make some assumptions. Roofs that are not planar will need to be identified in a different way. If you want to choose this project, we have some datasets from Eileen where you can test it; check dover:/mnt/research/gis/DATA/LIDAR/Boothbay_Harbor.
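As one possible starting point for the planarity estimate (an assumption, not the required method): fit a plane to the 3x3 neighborhood of each point by least squares and use the RMS residual as the score; near zero suggests a planar patch such as a roof, while vegetation and edges score high.

```python
def planarity(elev, i, j):
    # fit z = a*x + b*y + c to the 3x3 neighborhood of interior point
    # (i, j) and return the RMS residual. The closed form works because
    # x, y in {-1, 0, 1} are zero-mean and orthogonal over the window.
    pts = [(dj, -di, elev[i + di][j + dj])
           for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    a = sum(x * z for x, y, z in pts) / 6.0
    b = sum(y * z for x, y, z in pts) / 6.0
    c = sum(z for _, _, z in pts) / 9.0
    ss = sum((z - (a * x + b * y + c)) ** 2 for x, y, z in pts)
    return (ss / 9.0) ** 0.5
```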
Test data: Check out Lidar data in dover:/mnt/research/gis/DATA/LIDAR/. You will see a folder Lidar_for_Northeast that contains Lidar data for Maine.
Links: