Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis
DESCRIPTION
What got you hooked on geospatial? For me it was more than just maps: it was the ability to transform geographic data to see something new or shed light on some aspect of my environment. Whether we use GDAL, ArcGIS, GRASS or IDRISI, we have usually done this type of data transformation with desktop software tools. So why have these capabilities been relatively rare in web and mobile applications? Speed and scalability are two important factors. It has generally taken too long to calculate a viewshed, combine a stack of raster files into a weighted overlay, or generate slope and aspect from elevation data. Azavea has been working on this problem (fast, scalable geoprocessing) for several years. In 2012 we released GeoTrellis (http://geotrellis.io/), an open source framework for fast, distributed geoprocessing. GeoTrellis leverages the strong type system and functional programming style of the Scala language, along with the Spark and Akka frameworks. This talk will give an overview of GeoTrellis and how it can be integrated with web mapping tools to create online geoprocessing applications for stormwater modeling, educational games, infrastructure prioritization, climate change, and transportation.
TRANSCRIPT
@azavea
@rcheetham
21st Century Geoprocessing
with Scala and GeoTrellis
Robert Cheetham
B Corporation
• Civic/Social impact
• Donate share of profits
Research-Driven
• 10% Research Program
• Academic Collaborations
• Open Source
• Open Data
Use geodata to
do stuff that matters
Land
Water
People
Ian McHarg
Dana Tomlin
Idrisi
GRASS
advanced
spatial analysis
on the web
3 Challenges
1. Performance & Scalability
Big Data – Cities
2. Large Data Sets – Digital City
2. Large Data Sets – Social Media
2. Large Data Sets – Science
3. User Interface
We can do better
• IO
• Geoprocessing Operations
• Distributed Processing
• Web Services
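As one illustration of the geoprocessing operations listed above, the weighted overlay mentioned in the talk description can be sketched in a few lines. This is a minimal sketch in plain Python, not the GeoTrellis API: the function name `weighted_overlay` and the list-of-lists raster representation are assumptions made for illustration only.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized raster layers into one weighted-sum raster.

    `layers` is a list of rasters, each a list of rows of numbers.
    `weights` holds one weight per layer (hypothetical inputs).
    """
    if not layers or len(layers) != len(weights):
        raise ValueError("need one weight per layer and at least one layer")

    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all layers must share the same dimensions")

    # Each output cell depends only on the cells at the same position,
    # which is what makes this kind of operation easy to parallelize.
    return [
        [sum(w * layer[r][c] for layer, w in zip(layers, weights))
         for c in range(cols)]
        for r in range(rows)
    ]


# Two tiny example layers (made-up values), with the first weighted double.
slope_score = [[1, 2], [3, 4]]
landuse_score = [[10, 20], [30, 40]]
suitability = weighted_overlay([slope_score, landuse_score], [2, 1])
```

Because every output cell is computed independently, the same operation can be applied tile by tile without changing the result.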
Real-time Processing
6183 x 4992 (118 MB)
4598 x 4867 (86 MB)
Cluster-style Processing
1770271 x 910139
5.8 TB
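The raster dimensions and sizes quoted on these slides can be sanity-checked with simple arithmetic. The sketch below assumes 4 bytes per cell (for example a 32-bit cell type) and binary units (1 MB = 2^20 bytes, 1 TB = 2^40 bytes). Neither assumption is stated on the slides, but together they reproduce the quoted figures to within about one percent.

```python
BYTES_PER_CELL = 4  # assumption: 32-bit cells, not stated in the talk


def raster_bytes(cols, rows, bytes_per_cell=BYTES_PER_CELL):
    """Uncompressed size in bytes of a single-band raster."""
    return cols * rows * bytes_per_cell


# Dimensions taken from the slides.
small_a = raster_bytes(6183, 4992)       # quoted as 118 MB
small_b = raster_bytes(4598, 4867)       # quoted as 86 MB
large = raster_bytes(1770271, 910139)    # quoted as 5.8 TB

MIB = 2 ** 20
TIB = 2 ** 40

print(f"{small_a / MIB:.1f} MiB")
print(f"{small_b / MIB:.1f} MiB")
print(f"{large / TIB:.2f} TiB")
```

The large raster holds roughly 52,000 times as many cells as the first small one, which is the gap between processing on a single machine and cluster-style processing.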
How does it work
On the shoulders of giants
LocationTech Community
Some changes coming
• Parallel operations across tiles
• Parallel execution of operations
• Basic cluster capabilities with
GeoTrellis v0.9:
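The idea of running an operation in parallel across tiles can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not how GeoTrellis implements it: the helper names (`split_into_tiles`, `rescale_tile`, `map_tiles_parallel`) are hypothetical, tiles are simple horizontal strips, and a thread pool stands in for real parallel workers (in CPython it shows the structure rather than a speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(raster, tile_rows):
    """Cut a raster (list of rows) into strips of at most `tile_rows` rows."""
    return [raster[i:i + tile_rows] for i in range(0, len(raster), tile_rows)]


def rescale_tile(tile, factor):
    """A local operation: each output cell depends only on its own input cell."""
    return [[cell * factor for cell in row] for row in tile]


def map_tiles_parallel(raster, tile_rows, op, workers=4):
    """Apply `op` to every tile concurrently, then stitch the tiles back together."""
    tiles = split_into_tiles(raster, tile_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the strips can be
        # concatenated directly.
        results = list(pool.map(op, tiles))
    return [row for tile in results for row in tile]


# A 5 x 4 example raster with made-up values, processed in strips of 2 rows.
raster = [[r * 4 + c for c in range(4)] for r in range(5)]
doubled = map_tiles_parallel(raster, 2, lambda tile: rescale_tile(tile, 2))
```

This decomposition works directly for local operations. Focal operations such as slope also need cells from neighbouring tiles along the tile borders, so they require extra handling at the edges.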
+
• Sharding raster data across the cluster
• Caching operation results across cluster
• HDFS support
• Advanced Fault tolerance
• Advanced Scheduling
• ...
What's missing?
+
• Caches results in memory
• Ideal for iterative algorithms
• Significantly outperforms Hadoop
• Uses Hadoop's file system (HDFS)
+
What becomes possible?
Urban Forests
Simulation Modeling
Sea Level Rise
Business Siting
Streaming Data
Counting Carbon
Digital Humanities
GeoTrellis Transit
Travelsheds
Crime Analysis and Forecasting
It’s the second Monday in October
and school is in session. There were 2
burglaries and 3 assaults yesterday.
The Maple Leafs are not playing this
evening. Six bars, three take-out
stores, and a high school are in the
neighborhood. The forecast is 9°C
with a 50% chance of rain this evening.
Where do you focus your 3 vehicles?
Data Science + Geography
Faster is different…
Educational Games
New Devices and Displays
I am very excited
advanced
spatial analysis
on the web
Land
Water
People
Simulation
Modeling
Forecasting
• Multi-band
• Temporal bands (climate)
• More operations
• Tile indexes
• GeoMesa collab.
• Simpler setup
• More integration points
What’s next?
GeoTrellis.io
Get Involved
IRC: #geotrellis on freenode
Use geodata to
do stuff that matters