GIS in the cloud: implementing a Web Map Service on Google App Engine
Jon Blower, Reading e-Science Centre
University of Reading, United Kingdom
http://code.google.com/p/gae-wms/
Scalability is a major concern
• Web Map Services are quite “heavy” on the server
• Many infrastructures now moving into operational phase
• How to serve many simultaneous clients?
– Tile caching is a widely used option, but reduces flexibility
• Load is typically “bursty”, so we’d like to be able to scale the back end up and down
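Tile caching works because tiling clients only request bounding boxes from a fixed grid, so each image can be rendered once and reused. An arbitrary BBOX from an untiled client generally misses the cache, which is where flexibility is lost. The sketch below assumes a hypothetical EPSG:4326 grid with two square 180° tiles at zoom level 0, rows counted from the north edge; `TileGrid` and its methods are illustrative names, not part of the gae-wms code.

```java
import java.util.Locale;

/** Hypothetical EPSG:4326 tile grid: 2^(z+1) columns by 2^z rows, row 0 at the north edge. */
final class TileGrid {
    private static final double EPS = 1e-9;

    private TileGrid() {}

    /** Width and height of one square tile, in degrees, at zoom level z (0 <= z <= 30). */
    static double tileSizeDegrees(int z) {
        return 180.0 / (1 << z);
    }

    /** Bounding box {minLon, minLat, maxLon, maxLat} of tile column x, row y at zoom z. */
    static double[] bbox(int z, int x, int y) {
        double size = tileSizeDegrees(z);
        double minLon = -180.0 + x * size;
        double maxLat = 90.0 - y * size;
        return new double[] {minLon, maxLat - size, minLon + size, maxLat};
    }

    /** BBOX request parameter in WMS 1.1.1 axis order (lon,lat). */
    static String bboxParam(int z, int x, int y) {
        double[] b = bbox(z, x, y);
        return String.format(Locale.ROOT, "%s,%s,%s,%s", b[0], b[1], b[2], b[3]);
    }

    /** True if a requested BBOX is exactly one grid tile, i.e. it could be served from a tile cache. */
    static boolean isTile(double minLon, double minLat, double maxLon, double maxLat) {
        double size = maxLon - minLon;
        if (size <= 0 || Math.abs((maxLat - minLat) - size) > EPS) return false; // must be square
        if (minLon < -180 - EPS || maxLon > 180 + EPS || minLat < -90 - EPS || maxLat > 90 + EPS) {
            return false; // must lie inside the grid
        }
        double zoom = Math.log(180.0 / size) / Math.log(2.0);
        if (zoom < -EPS || Math.abs(zoom - Math.rint(zoom)) > EPS) return false; // size must be 180 / 2^z
        double col = (minLon + 180.0) / size;
        double row = (90.0 - maxLat) / size;
        return Math.abs(col - Math.rint(col)) < EPS && Math.abs(row - Math.rint(row)) < EPS;
    }
}
```

For example, `TileGrid.bboxParam(1, 3, 1)` gives `90.0,-90.0,180.0,0.0` (the south-east tile at zoom level 1), while `TileGrid.isTile(-10, -10, 10, 10)` is false because a 20° box is not on the grid.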
Google App Engine overview
• Virtual application-hosting environment
– Python, Java (servlets)
• Machines are brought up and taken down automatically
• Built-in services, including:
– User authentication
– Distributed memcache
– Distributed persistent store
– Image transformation
• Free within certain quotas
Aims of this project
• Develop a fully functional GAE WMS for high-resolution global raster data
– NASA Blue Marble composite image
• Efficient for tiling clients, but supports untiled access too
• Supports multiple coordinate reference systems
• Use Java FOSS (free and open-source software)
• Use resources efficiently to stay within free quotas
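Serving several coordinate reference systems from one global lat-lon image means reprojecting on the fly. The slides do not say how gae-wms does this; one common approach, assumed here, is inverse mapping: for each output pixel, compute its coordinates in the requested CRS, transform them back to longitude/latitude, and take the nearest source pixel. The sketch does this for spherical Mercator (EPSG:3857). `MercatorReprojector` and the packed-`int[]` pixel representation are hypothetical choices.

```java
/** Hypothetical sketch: reproject an equirectangular (EPSG:4326) global image into EPSG:3857. */
final class MercatorReprojector {
    /** WGS 84 semi-major axis in metres, used as the sphere radius by EPSG:3857. */
    static final double R = 6378137.0;

    private MercatorReprojector() {}

    /** Inverse spherical Mercator: projected y (metres) to latitude (degrees). */
    static double yToLat(double y) {
        return Math.toDegrees(Math.atan(Math.sinh(y / R)));
    }

    /** Inverse spherical Mercator: projected x (metres) to longitude (degrees). */
    static double xToLon(double x) {
        return Math.toDegrees(x / R);
    }

    /**
     * Renders a width x height image covering [minX, maxX] x [minY, maxY] (EPSG:3857 metres)
     * by nearest-neighbour sampling of src, a global image of srcW x srcH pixels spanning
     * longitude -180..180 and latitude 90..-90 (row 0 at the north edge).
     */
    static int[] render(int[] src, int srcW, int srcH,
                        double minX, double minY, double maxX, double maxY,
                        int width, int height) {
        int[] out = new int[width * height];
        for (int row = 0; row < height; row++) {
            // Centre of the output pixel, measured down from the top edge.
            double lat = yToLat(maxY - (row + 0.5) * (maxY - minY) / height);
            int srcRow = clamp((int) Math.floor((90.0 - lat) / 180.0 * srcH), srcH);
            for (int col = 0; col < width; col++) {
                double lon = xToLon(minX + (col + 0.5) * (maxX - minX) / width);
                int srcCol = clamp((int) Math.floor((lon + 180.0) / 360.0 * srcW), srcW);
                out[row * width + col] = src[srcRow * srcW + srcCol];
            }
        }
        return out;
    }

    /** Keeps an index inside [0, n - 1]. */
    private static int clamp(int i, int n) {
        return Math.max(0, Math.min(n - 1, i));
    }
}
```

Latitude depends only on the output row, so it is computed once per row rather than once per pixel. Nearest-neighbour sampling keeps the per-request CPU cost low; smoother resampling would cost more.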
Development challenges
• Much harder than anticipated!
• Coding restrictions
– No local file output
– Can’t spawn threads
– Can’t access some Java packages (e.g. most of java.awt, all of javax.imageio)
– Limited RAM
• Deployment issues
– Uploading data
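Without javax.imageio, images have to be encoded with classes the sandbox does allow. The sketch below shows one possible workaround, assuming java.util.zip is available in the environment: a hand-written encoder for 8-bit RGB PNGs (signature, then IHDR, IDAT and IEND chunks, each followed by a CRC-32), writing to a byte array because local file output is not allowed. This is not necessarily how gae-wms produces its images. `TinyPngEncoder` is a hypothetical name, and it applies no PNG row filtering (filter type 0 on every row), so its output will usually be larger than a tuned encoder's.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical minimal PNG encoder that avoids javax.imageio and java.awt. */
final class TinyPngEncoder {
    private static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    private TinyPngEncoder() {}

    /** Encodes packed 0xRRGGBB pixels (row-major, width * height entries) as an 8-bit RGB PNG. */
    static byte[] encode(int[] rgb, int width, int height) throws IOException {
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(png);
        out.write(SIGNATURE);

        // IHDR: width, height, bit depth 8, colour type 2 (RGB), compression 0, filter 0, no interlace.
        ByteArrayOutputStream ihdrBytes = new ByteArrayOutputStream();
        DataOutputStream ihdr = new DataOutputStream(ihdrBytes);
        ihdr.writeInt(width);
        ihdr.writeInt(height);
        ihdr.write(new byte[] {8, 2, 0, 0, 0});
        writeChunk(out, "IHDR", ihdrBytes.toByteArray());

        // IDAT: zlib-compressed scanlines, each prefixed with filter type 0 (None).
        ByteArrayOutputStream idat = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(idat)) {
            for (int y = 0; y < height; y++) {
                z.write(0);
                for (int x = 0; x < width; x++) {
                    int p = rgb[y * width + x];
                    z.write((p >> 16) & 0xFF);
                    z.write((p >> 8) & 0xFF);
                    z.write(p & 0xFF);
                }
            }
        }
        writeChunk(out, "IDAT", idat.toByteArray());
        writeChunk(out, "IEND", new byte[0]);
        out.flush();
        return png.toByteArray();
    }

    /** Writes length, type, data and the CRC-32 of type + data, as the PNG format requires. */
    private static void writeChunk(DataOutputStream out, String type, byte[] data) throws IOException {
        byte[] typeBytes = type.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(data);
        out.writeInt(data.length);
        out.write(typeBytes);
        out.write(data);
        out.writeInt((int) crc.getValue());
    }
}
```

Everything stays in memory, which suits a sandbox with no file output. With limited RAM, this approach is best kept to small images such as 256x256 tiles.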
Testing
• Three “modes”:
– Fully-dynamic (all images generated from scratch)
– Self-caching (duplicate requests are served from cache)
– Static tiles (all images pre-generated)
• All images 256x256 pixels
• Apache JMeter scripts
– Many client threads, requesting images in random order from a preselected list
– Single client machine
• GAE is a “black box”, so we can’t control all aspects of the experiments
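In the self-caching mode a repeated request is answered from a cache instead of being re-rendered. For this to work, requests that differ only in parameter order or in the case of parameter names (WMS parameter names are case-insensitive, values are not) should map to the same cache key. The sketch below normalises the parameters into a key and uses a small in-process LRU map as a stand-in for the distributed memcache. `SelfCachingRenderer` is a hypothetical name, and the sketch does not use the App Engine memcache API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical self-caching layer: an LRU map standing in for a distributed cache. */
final class SelfCachingRenderer {
    private final Map<String, byte[]> cache;
    private final Function<Map<String, String>, byte[]> renderer;
    private int renders = 0;

    SelfCachingRenderer(int maxEntries, Function<Map<String, String>, byte[]> renderer) {
        this.renderer = renderer;
        // Access-ordered LinkedHashMap that evicts the least recently used entry when full.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Cache key with parameter names upper-cased and sorted, so order and name case do not matter. */
    static String cacheKey(Map<String, String> params) {
        TreeMap<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            sorted.put(e.getKey().toUpperCase(Locale.ROOT), e.getValue());
        }
        StringBuilder key = new StringBuilder();
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            key.append(e.getKey()).append('=').append(e.getValue()).append('&');
        }
        return key.toString();
    }

    /** Returns the cached image if present; otherwise renders it and stores the result. */
    byte[] getMap(Map<String, String> params) {
        String key = cacheKey(params);
        byte[] image = cache.get(key);
        if (image == null) {
            image = renderer.apply(params);
            renders++;
            cache.put(key, image);
        }
        return image;
    }

    /** Number of cache misses so far, i.e. images actually rendered. */
    int renderCount() {
        return renders;
    }
}
```

Normalising the key matters for tiling clients: two clients asking for the same tile with differently ordered query strings then share one rendered image.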
Results: Throughput
(Throughput plots: fully-dynamic, self-caching, static tiles)
“Service not available” errors increase with load
Some notes about quotas
• Outgoing bandwidth quota (1 GB/day) runs out fastest
– Hence serving JPEGs is more cost-effective than serving PNGs
– Can serve 100,000 256x256 JPEGs per day for free
• But it’s easy to write code that also exceeds some per-minute quotas
– E.g. the quota on output from the distributed data store
• Quotas can be increased!
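The 100,000-images-per-day figure follows directly from the bandwidth quota. Assuming 1 GB means 10^9 bytes and ignoring response headers, the average image must stay at or below about 10 KB:

```latex
\frac{1\ \text{GB/day}}{100\,000\ \text{images/day}}
  = \frac{10^{9}\ \text{bytes}}{10^{5}\ \text{images}}
  = 10^{4}\ \text{bytes/image} \approx 10\ \text{KB per image}
```

Reading 1 GB as 2^30 bytes gives about 10,700 bytes per image, so the conclusion is the same. A format whose 256x256 tiles average more than this serves proportionally fewer images: a 20 KB average, for instance, halves the daily figure to 50,000.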
Conclusions 1
• Successfully implemented full WMS for raster images
• Significant usage at zero running costs
• Performance and scalability acceptable for many apps
– But latency spikes are an issue
• Testing with distributed clients would be instructive
Conclusions 2: further potential?
• Hard to host lots of images in the same instance using our method
– Relies on storing data in local files, with a tight quota
• Restrictions on the Java servlet environment make it hard to run standard software stacks
– E.g. GeoServer
• Expansion to vector datasets is probably hard
– Would need a spatial index on top of the distributed data store
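One way to layer a spatial index on a store that supports ordered key-range scans is to give each feature a quadtree key (quadkey). Every feature inside a quadtree cell then shares that cell's key as a prefix, so a single range scan finds them. The project did not implement this; it is an assumption about how such an extension might look. The grid is a hypothetical equirectangular quadtree over longitude -180..180 and latitude -90..90, and `QuadKey` is an illustrative name.

```java
/** Hypothetical quadtree keys over the whole lon/lat domain, for prefix range scans. */
final class QuadKey {
    private QuadKey() {}

    /** Quadkey of a point at the given depth: one digit per level (0 = NW, 1 = NE, 2 = SW, 3 = SE). */
    static String of(double lon, double lat, int depth) {
        double minLon = -180, maxLon = 180, minLat = -90, maxLat = 90;
        StringBuilder key = new StringBuilder(depth);
        for (int level = 0; level < depth; level++) {
            double midLon = (minLon + maxLon) / 2;
            double midLat = (minLat + maxLat) / 2;
            int digit = 0;
            if (lon >= midLon) { digit += 1; minLon = midLon; } else { maxLon = midLon; }
            if (lat < midLat) { digit += 2; maxLat = midLat; } else { minLat = midLat; }
            key.append((char) ('0' + digit));
        }
        return key.toString();
    }

    /** Bounds [from, to) of a key-range scan that returns everything inside the cell `prefix`. */
    static String[] scanRange(String prefix) {
        // '4' sorts immediately after '3', the largest digit, so every descendant key falls in the range.
        return new String[] {prefix, prefix + "4"};
    }
}
```

Features with an extent, rather than points, could be stored under the key of the smallest cell containing their bounding box; a query would then also have to look up the ancestor cells of each cell it scans.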
Thank you!
All code, full paper, results and more details about the experiments:
http://code.google.com/p/gae-wms/