You can work alone, or with a partner. Naturally, the expectation is that the work of a team scales linearly with the number of people on it (that is, a team of 2 is expected to put in twice the hours on the project).
I am willing to allow teams of 3 people, though in my experience they are harder to coordinate, so you will have to convince me that you have a plan.
There are many LaTeX guides online, if you should need anything beyond the template. Here is a suggested outline for the paper:
Below is a list of project options:
The basic problem is the following: Given a grid terrain (part of which is the sea) and a sea-level rise (e.g. 3ft), model flooding of the terrain as the sea rises to the given level.
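To make the problem concrete, here is a minimal sketch of the flooding step (a sketch, not the required implementation; it assumes the sea is given as a list of cell coordinates and uses 4-connectivity, both choices that are up to you):

```python
from collections import deque

def flood(elev, rise, sea_cells):
    # a cell floods iff its elevation is below the new sea level AND it
    # is connected to the sea through a chain of flooded cells; a low
    # cell behind a ridge stays dry even if it is below the new level
    rows, cols = len(elev), len(elev[0])
    flooded = [[False] * cols for _ in range(rows)]
    q = deque()
    for i, j in sea_cells:
        flooded[i][j] = True
        q.append((i, j))
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols
                    and not flooded[ni][nj] and elev[ni][nj] < rise):
                flooded[ni][nj] = True
                q.append((ni, nj))
    return flooded
```

The connectivity requirement is the whole point: a naive "elevation below rise" threshold overestimates the flood zone.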
Unless you watch Fox News, you probably believe that climate change is happening. As temperatures continue to rise, ice will melt and the sea level will rise. Scientists predict significant sea-level rises (between 3 and 5 feet) in the next 100 years. Below is one of the recent pictures in the news (NYC flooded):
In this project you will produce a similar flooding simulation, which you can show to your family and friends to raise awareness. Better yet, if your code is efficient on large data and you implement the extension I suggest below, it will be useful to local agencies that are looking at the impacts of flooding.
Performance: An important goal for this project is performance. The DEM folder on dover:/mnt/research/gis/DATA/DEM contains a grid for Lincoln county at 2m resolution, about 900 million points. We also have a grid for Knox county at 2m resolution (CHECK SIZE). Both these counties have been generated from high-resolution Lidar data using ArcGIS. Generally speaking we have .5 TB of 1m resolution Lidar data for Maine and we could generate other counties; we could also generate grids covering more than one county. Ideally we would have a grid covering the whole Maine coast, but no one dares work with such large datasets simply because there is no software that can handle them (the exception is LAStools, which can handle large data and can generate very large grids from Lidar data, but the modules that do this are not open source). All in all, this is an opportunity to develop an efficient algorithm, customize it and parallelize it. The problem of flooding is not immediately parallelizable (unlike the viewshed problem), but some simple data partition strategies can be explored, and coming up with an approach will be a great problem.
SLR+BFE: An additional issue is to consider the impact of storm waves in addition to SLR flooding. The flooding due to waves is given as a BFE grid: this is a grid of the same size as the elevation grid, its value at point (i,j) is the height of waves at point (i,j). The BFE grid is zero at sea and inland, and has non-zero values only along the coast (see picture below). The BFE grid gives the extent of the current flooding (note: do we know anything about how it is calculated? is it based on historical flood data?), without taking SLR into account. A picture of the BFE grid for Southport is below (you can find it in the folder with SLR papers and data).
The goal is to "add" BFE to SLR and model flooding with both---this will give the flood zones in the future, when in addition to storm waves there will be sea-level rise.
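One plausible reading of "adding" BFE to SLR, purely as a sketch (whether a simple sum is the right model is part of the project): treat slr + bfe[i][j] as the local water-surface height that a cell must be below in order to flood. Connectivity to the sea still has to be checked, exactly as in plain SLR flooding.

```python
def flood_threshold(elev, bfe, slr):
    # candidate flooded cells under an ASSUMED additive SLR+BFE model:
    # cell (i, j) can flood once the local water surface slr + bfe[i][j]
    # exceeds its elevation; this is only the per-cell test, and the
    # sea-connectivity check from basic SLR flooding still applies
    rows, cols = len(elev), len(elev[0])
    return [[elev[i][j] < slr + bfe[i][j] for j in range(cols)]
            for i in range(rows)]
```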
If you want to work on this problem, I see several possibilities:
Test data: the island Southport in Maine, provided by Eileen Johnson. Data is here.
Relevant links: SLR papers
The problem is the following: Given a flow direction (FD) grid, compute a (recursive) watershed partition of the terrain.
It is important that you work with a complete FD grid, that is, one which routes flow on flat areas and routes flow out of sinks. The process of finding the sinks in the terrain and simulating flooding is tedious to implement, so you can skip this and use FD grids generated by GRASS GIS. If you want to run GRASS GIS, it is available on dover; I generated a couple of test FD grids and uploaded them here.
Given the FD grid, the process of computing a watershed partition is the following:
Doing this process once will give a partition into 9 watersheds. The output should be a grid the same size as the terrain, where each point in the grid is labeled with a number 1 through 9, corresponding to the watershed it is in.
The basic function in this project will be a function to determine the watershed of an arbitrary point p. As a reminder, the watershed of a point p is the set of all grid points that flow to p.
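A sketch of this basic function: the watershed of p can be found by a reverse-flow BFS from p. The sketch assumes each FD value has already been decoded into the (di, dj) offset of the neighbor the cell drains to; real FD grids (e.g. from GRASS) store numeric direction codes that you would translate into such offsets.

```python
from collections import deque

def watershed(fd, p):
    # all cells that flow (transitively) into p, p included;
    # fd[i][j] is the (di, dj) offset of the neighbor that (i, j) drains to
    rows, cols = len(fd), len(fd[0])
    inws = [[False] * cols for _ in range(rows)]
    pi, pj = p
    inws[pi][pj] = True
    q = deque([p])
    while q:
        i, j = q.popleft()
        # scan the 8 neighbors for cells that drain into (i, j)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di == dj == 0:
                    continue
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and not inws[ni][nj] and fd[ni][nj] == (-di, -dj)):
                    inws[ni][nj] = True
                    q.append((ni, nj))
    return inws
```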
Computing these 9 watersheds will be a good project.
As a refinement, you can repeat the process inside each watershed. This way you can find 9 sub-watersheds inside each of the 9 watersheds. For example, the sub-watersheds of watershed 1 will be numbered 11, 12, ..., 19; the sub-watersheds of watershed 2 will be numbered 21, 22, ..., 29; and so on.
Test data: You will find some test FD grids here. The DEMs are stored in the standard location on dover, dover:/mnt/research/gis/DATA/DEM/
Relevant links: watershed papers
The problem is the following: Given an elevation grid, compute the total viewshed grid. This is a grid where the value of point (i,j) is the size (that is, number of visible points) of the viewshed of (i,j).
Thus computing the total viewshed entails computing a viewshed for each single point in the terrain as a viewpoint. This process is computationally intensive and for example on Kaweah dataset (1 million points) it took 42 hours with one core. Here is a picture of the total viewshed for Kaweah:
The goal in this project is to parallelize this computation and perform a detailed and careful investigation of its performance. You will run experiments with various numbers of threads and measure the speedup; you will attempt to explain why the speedup flattens out by finding a way to measure the running time of each thread, and thus the overall load balancing. You will experiment with ways to improve the load balancing among threads. All projects need to be accompanied by reports, but for this project in particular, since the bulk of the work is in the experimental evaluation, I expect a report that could be submitted to a conference.
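To fix ideas, here is a sketch of the parallel structure with per-worker timing. The viewshed routine is a toy stand-in (it only tests visibility along the viewpoint's own row), and a thread pool is used for brevity; for real CPU-bound speedup you would use processes or native threads, but the cyclic row partitioning and the per-worker timing carry over.

```python
import time
from multiprocessing.pool import ThreadPool

def row_viewshed_size(elev, i, j):
    # toy stand-in for a real viewshed: count the points visible from
    # (i, j) looking left and right along row i only (1D line of sight)
    row = elev[i]
    count = 1  # the viewpoint sees itself
    for step in (1, -1):
        max_slope = float("-inf")
        k = j + step
        while 0 <= k < len(row):
            slope = (row[k] - row[j]) / abs(k - j)
            if slope >= max_slope:   # not hidden by anything closer
                count += 1
                max_slope = slope
            k += step
    return count

def process_rows(args):
    # one worker's share: a subset of rows, timed for load-balance stats
    elev, rows = args
    t0 = time.perf_counter()
    result = {i: [row_viewshed_size(elev, i, j) for j in range(len(elev[0]))]
              for i in rows}
    return result, time.perf_counter() - t0

def total_viewshed(elev, nworkers=4):
    # cyclic assignment (rows 0, n, 2n, ... to worker 0, etc.) spreads
    # expensive and cheap rows roughly evenly among workers
    shares = [(elev, list(range(w, len(elev), nworkers)))
              for w in range(nworkers)]
    with ThreadPool(nworkers) as pool:
        parts = pool.map(process_rows, shares)
    out = [None] * len(elev)
    times = []
    for result, t in parts:
        times.append(t)
        for i, rowvals in result.items():
            out[i] = rowvals
    return out, times
```

Comparing the per-worker times in `times` is exactly the load-balancing measurement the report should analyze.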
Test data: See DEMs in dover:/mnt/research/gis/DATA/DEM/.
Parallel total viewshed:
lasgrid: a tool that reads LIDAR from LAS/LAZ/ASCII and grids it onto a raster. The most important parameter, ‘-step n’, specifies the n x n area of LiDAR points that is gridded onto one raster cell (or pixel). The output is in BIL, ASC, IMG, TIF, PNG, JPG, XYZ, FLT, or DTM format. The tool can raster the ‘-elevation’ or the ‘-intensity’ of each point and stores the ‘-lowest’ or the ‘-highest’, the ‘-average’, or the standard deviation ‘-stddev’. Other gridding options are ‘-scan_angle_abs’, ‘-counter’, ‘-counter_16bit’, ‘-counter_32bit’, ‘-user_data’, ‘-point_source’, and others. For more details and other options not mentioned here, see the README file. Here's the more detailed README for lasgrid.
lasgrid has many options for the output; your output should be an ASC grid, the same format as the one you've used so far. Assume you start with a Lidar dataset in ASC form (as generated by las2txt). Allow the user to raster the lowest, highest, or average point (and perhaps others). By default the grid should cover the entire extent of the bounding box of the lidar dataset. Experiment with various step sizes, especially as the resolution of the grid gets close to the resolution of the Lidar data. If there are grid cells that don't contain any lidar points, then the resolution requested for the output grid is too high. You will investigate the highest resolution grid that a particular dataset supports.
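The core of such a tool is just binning points into cells and aggregating. A minimal sketch (the nodata value -9999 and the bottom-up row order are arbitrary choices here, not lasgrid's):

```python
import math

def grid_lidar(points, step, method="lowest"):
    # bin lidar points (x, y, z) into step-by-step cells covering the
    # bounding box; each cell keeps the lowest/highest/average z.
    # Empty cells get nodata -9999: a sign the requested resolution is
    # too high for this dataset. Rows are indexed from ymin upward here;
    # a real ASC grid writes rows top-down.
    NODATA = -9999
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    ncols = int(math.floor((max(xs) - xmin) / step)) + 1
    nrows = int(math.floor((max(ys) - ymin) / step)) + 1
    cells = {}
    for x, y, z in points:
        c = (int((y - ymin) // step), int((x - xmin) // step))
        cells.setdefault(c, []).append(z)
    agg = {"lowest": min, "highest": max,
           "average": lambda zs: sum(zs) / len(zs)}[method]
    return [[agg(cells[(r, c)]) if (r, c) in cells else NODATA
             for c in range(ncols)] for r in range(nrows)]
```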
A nice challenge here is to make your algorithm work on very large Lidar data where neither the Lidar data nor the grid actually fit in memory.
Test data: Check out Lidar data in dover:/mnt/research/gis/DATA/LIDAR/. There's also Lidar data that comes with LAStools.
Papers/links:
For example, your code could take on the command line the name of the lidar dataset to be simplified, the desired error threshold, and the name of the output file where you'll write the resulting TIN. Your code should simplify the lidar data, time it, print a summary, and then render the TIN. The simplification function should be separate from the rendering, and timed on its own. The summary should include how many points are left in the TIN (both in absolute value and as a percentage of the number of points in the input), and the total time for simplification. The time for simplification should not include the time to read the lidar data into memory, or to write the TIN to a file. In summary, one should expect to see the following on the screen when running your program:
    ./simplify lidar.txt 10 lidar.10.tin
    reading lidar.txt in memory...done. total xxx seconds.
    ---------
    starting simplification n=184552.
    ...
    done. n'=2019 (1.09% of 184552) total time xx seconds
    ---------
    writing TIN to file lidar.10.tin
The error epsilon on the command line should be interpreted as an absolute value. Example: the command above produces a TIN that is within a distance of 10 (units) from the lidar data. The units here are the same as the units used for height in lidar.txt.
When run with epsilon=0, your program should eliminate all the flat areas. If there is no flat area then running with epsilon=0 will not eliminate any points.
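To illustrate what the absolute epsilon means, here is the 1D analogue of the simplification (a Douglas-Peucker-style pass over a single elevation profile; the project works on a 2D TIN, but the epsilon semantics are the same): the retained samples stay within vertical distance eps of the input, and with eps = 0 only samples lying exactly on the line between their kept neighbors (e.g. in flat runs) are removed.

```python
def simplify_profile(zs, eps):
    # keep the endpoints; recursively keep the sample that deviates most
    # from the line between the current endpoints, while that deviation
    # exceeds eps (an absolute value, in the same units as the heights)
    def recurse(lo, hi, keep):
        worst, worst_d = None, eps
        for k in range(lo + 1, hi):
            interp = zs[lo] + (zs[hi] - zs[lo]) * (k - lo) / (hi - lo)
            d = abs(zs[k] - interp)
            if d > worst_d:
                worst, worst_d = k, d
        if worst is not None:
            recurse(lo, worst, keep)
            keep.append(worst)
            recurse(worst, hi, keep)

    keep = [0]
    recurse(0, len(zs) - 1, keep)
    keep.append(len(zs) - 1)
    return sorted(keep)   # indices of the retained samples
```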
Test data: Check out Lidar data in dover:/mnt/research/gis/DATA/LIDAR/. There's also Lidar data that comes with LAStools.
Relevant links:
Obviously doing all three will be too ambitious for a term project. Just pick a part that looks interesting to you. For example, you could work on finding the ground. Or, you could run lasground to find the ground, and focus on finding the vegetation, or the buildings. This is worth exploring if you are interested in vision, because finding roofs entails computing some sort of planarity estimate for a point and its neighborhood. If you had an image instead of a grid, the starting point would be to compute the Sobel operator, which estimates first derivatives and thus identifies sharp edges. Here you have a Lidar dataset that stores actual heights, not colors, so the approach is different, but the spirit is the same: identify how the terrain looks around a point in order to see if it's part of a planar roof. This assumes that roofs are planar, but hey, you have to start somewhere and make some assumptions. Roofs that are not planar will need to be identified in a different way. If you want to choose this project, we have some datasets from Eileen where you can test it; check dover:/mnt/research/gis/DATA/LIDAR/Boothbay_Harbor.
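As one possible starting point for the planarity estimate (an assumption, not the required method): fit a plane to the 3x3 neighborhood of each point by least squares and use the RMS residual as the score; near zero suggests a planar patch such as a roof, while vegetation and edges score high.

```python
def planarity(elev, i, j):
    # fit z = a*x + b*y + c to the 3x3 neighborhood of interior point
    # (i, j) and return the RMS residual. The closed form works because
    # x, y in {-1, 0, 1} are zero-mean and orthogonal over the window.
    pts = [(dj, -di, elev[i + di][j + dj])
           for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    a = sum(x * z for x, y, z in pts) / 6.0
    b = sum(y * z for x, y, z in pts) / 6.0
    c = sum(z for _, _, z in pts) / 9.0
    ss = sum((z - (a * x + b * y + c)) ** 2 for x, y, z in pts)
    return (ss / 9.0) ** 0.5
```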
Test data: Check out Lidar data in dover:/mnt/research/gis/DATA/LIDAR/. You will see a folder Lidar_for_Northeast that contains Lidar data for Maine.
Links: