lars arge 1/14 a a r h u s u n i v e r s i t e t department of computer science efficient handling...

14
A A R H U S U N I V E R S I T E T Department of Computer Science Lars Arge 1/14 Efficient Handling of Massive (Terrain) Datasets Professor Lars Arge University of Aarhus and Duke University September 22, 2006

Post on 20-Dec-2015

226 views

Category:

Documents


0 download

TRANSCRIPT

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

1/14

Efficient Handling of Massive (Terrain) Datasets

Professor Lars Arge

University of Aarhus and Duke University

September 22, 2006

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

2/14

Who am I?

PhD in Computer Science 1996 from AU Research career at Duke University in US

Post doc 1996……Professor 2006 Ole Rømer Professorship AU 2004

Research: Efficient algorithms and data structures for massive datasets Lots of theoretical work… … but also practical work, especially on GIS and terrain data

Funded by e.g. US ARO and NSF, Danish basic and strategic research councils, private companies (COWI, DHI)

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

3/14

Massive Data Algorithmics

Massive data being acquired/used everywhere Storage management software is billion-$ industry

Science is increasingly about mining massive data (Nature 2/06)

Examples (2002): Phone: AT&T 20TB phone call database, wireless tracking Consumer: WalMart 70TB database, buying patterns WEB: Google index 8 billion web pages Geography: NASA satellites generate Terrabytes each day

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

4/14

Terrain Data

New technologies: Much easier/cheaper to collect detailed data Previous ‘manual’ or radar based methods

−Often 30 meter between data points−Sometimes 10 meter data available

New laser scanning methods (LIDAR)−Less than 1 meter between data points−Centimeter accuracy (previous meter)

Denmark: ~2 million points at 30 meter (<<1GB) ~18 billion points at 1 meter (>>1TB) COWI A/S in the process of scanning DK NC scanned after Hurricane Floyd in 1999

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

5/14

Massive data = I/O-Bottleneck

I/O is often bottleneck when handling massive datasets Disk access is 106 times slower than main memory access!

Disk systems try to amortize large access time transferring large contiguous blocks of data

Need to store and access data to take advantage of blocks!

track

magnetic surface

read/write armread/write head

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

6/14

I/O-efficient Algorithms

Taking advantage of block access very important Traditionally algorithms developed without block considerations I/O-efficient algorithms leads to large runtime improvements

data size

runnin

g t

ime

Normal algorithm

I/O-efficient algorithm

Main memory size

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

7/14

R

A

M

Scalability: Hierarchical Memory

Block access not only important on disk level Machines have complicated memory hierarchy

Levels get larger and slower Block transfers on all levels

L

1L

2

data size

runnin

g t

ime

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

8/14

My Research Work

Theoretically I/O- (and cache-) efficient algorithms work Data structures, computational geometry, graph theory, … Focus on Geographic Information Systems problems

Algorithm engineering work, e.g TPIE system for simple, efficient,

and portable implementation ofI/O-efficient algorithms

Software for terrain data processing LIDAR data handling Terrain flow computations

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

9/14

Example: Terrain Flow

Terrain water flow has many important applications Predict location of streams, areas susceptible to floods…

Conceptually flow is modeled using two basic attributes Flow direction: The direction water flows at a point Flow accumulation: Amount of water flowing through a point

Flow accumulation used to compute other hydrological attributes: drainage network, topographic convergence index…

7 am 3pm

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

10/14

Terrain Flow Accumulation

Collaboration with environmental researchers at Duke University Appalachian mountains dataset:

800x800km at 100m resolution a few Gigabytes On ½GB machine:

ArcGIS: Performance somewhat unpredictable Days on few gigabytes of data Many gigabytes of data…..

Appalachian dataset would be Terabytes sized at 1m resolution

14 days!!

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

11/14

Terrain Flow Accumulation: TerraFlow We developed theoretically I/O-optimal algorithms TPIE implementation was very efficient

Appalachian Mountains flow accumulation in 3 hours!

Developed into comprehensive software package for flow computation on massive terrains: TerraFlow Efficient: 2-1000 times faster than existing software Scalable: >1 billion elements! Flexible: Flexible flow modeling (direction) methods

Extension to ArcGIS

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

12/14

LIDAR Terrain Data Work

Now “pipeline” of software for handling terrain (LIDAR) data Points to DEM (incl. breaklines) DEM flow modeling (incl “flooding”, “flat” routing, “noise” reduce) DEM flow accumulation (incl river extraction) DEM hierarchical watershed computation

All work for both grid and TIN DEM’s Capable of handling massive datasets

Test dataset: 400M point Neuse river basin (1/3 NC) (>17GB)

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

13/14

Examples of Ongoing Terrain Work

Terrain modeling, e.g “Raw” LIDAR to point conversion (LIDAR point classification)

(incl feature, e.g. bridge, detection/removal) Further improved flow and erosion modeling (e.g. carving) Contour line extraction (incl. smoothing and simplification) Terrain (and other) data fusion (incl format conversion)

Terrain analysis, e.g Choke point, navigation, visibility, change detection,…

Major grand goal: Construction of hierarchical (simplified) DEM where

derived features (water flow, drainage, choke points)are preserved/consistent

A A R H U S U N I V E R S I T E T

Department of Computer Science

Lars Arge

14/14

Thanks

Lars [email protected]

Work supported in part by US National Science Foundation ESS grant EIA–98070734, RI grant EIA–

9972879, CAREER grant EIA–9984099, and ITR grant EIA–0112849 US Army Research Office grants W911NF-04-1-0278 and DAAD19-03-1-

0352 Ole Rømer Scholarship from Danish Science Research Council NABIIT grant from the Danish Strategic Research Council