
GIS in the cloud: implementing a Web Map Service on Google App Engine

Jon Blower, Reading e-Science Centre

University of Reading, United Kingdom

http://code.google.com/p/gae-wms/

http://www.reading.ac.uk/godiva2

Scalability is a major concern

• Web Map Services are quite “heavy” on the server

• Many infrastructures are now moving into their operational phase

• How can we serve many simultaneous clients?
– Tile caching is a widely used option, but it reduces flexibility

• Load is typically “bursty”, so we’d like to be able to scale the back end up and down
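Tile caching works because tiling clients only ever ask for a fixed set of bounding boxes, which is one reason it reduces flexibility: an arbitrary GetMap box has no cached counterpart. The Java sketch below checks whether an EPSG:4326 request box coincides exactly with a tile in an assumed global grid (2^(z+1) × 2^z square tiles at zoom level z). The grid layout, the "z/x/y" key format and the names `TileGrid`/`tileKey` are hypothetical illustrations, not taken from gae-wms.

```java
import java.util.Locale;

/**
 * Hypothetical tile-cache lookup: returns a key for boxes that match a tile in an
 * assumed EPSG:4326 grid (2^(z+1) x 2^z square tiles at zoom z), else null.
 */
public class TileGrid {
    private static final double EPS = 1e-9;

    static String tileKey(double minLon, double minLat, double maxLon, double maxLat) {
        double width = maxLon - minLon;
        double height = maxLat - minLat;
        if (width <= 0 || Math.abs(width - height) > EPS) {
            return null; // tiles in this grid are square in degrees
        }
        // Tile width at zoom z is 180 / 2^z degrees, so recover z from the width.
        double zExact = Math.log(180.0 / width) / Math.log(2.0);
        long z = Math.round(zExact);
        if (z < 0 || Math.abs(zExact - z) > EPS) {
            return null; // width is not one of the grid's tile sizes
        }
        double tileSize = 180.0 / (1L << z);
        double xExact = (minLon + 180.0) / tileSize;
        double yExact = (90.0 - maxLat) / tileSize; // rows counted from the north
        long x = Math.round(xExact);
        long y = Math.round(yExact);
        if (Math.abs(xExact - x) > EPS || Math.abs(yExact - y) > EPS) {
            return null; // box is offset from the grid, so no cached tile exists
        }
        if (x < 0 || y < 0 || x >= (2L << z) || y >= (1L << z)) {
            return null; // outside the world extent
        }
        return String.format(Locale.ROOT, "%d/%d/%d", z, x, y);
    }

    public static void main(String[] args) {
        System.out.println(tileKey(-180, -90, 0, 90)); // 0/0/0
        System.out.println(tileKey(0, 0, 90, 90));     // 1/2/0
        System.out.println(tileKey(-10, -10, 80, 80)); // null: must be rendered dynamically
    }
}
```

A request that returns `null` here would have to be generated from scratch, which is the trade-off between the cached and fully dynamic approaches.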

Cloud computing

[Figure: cloud computing layers: Hardware, Operating System, Application container, Specific applications]

Google App Engine overview

• Virtual application-hosting environment
– Python, Java (servlets)

• Machines are brought up and taken down automatically

• Built-in services, including:
– User authentication
– Distributed memcache
– Distributed persistent store
– Image transformation

• Free within certain quotas

Aims of this project

• Develop a fully functional GAE WMS for high-resolution global raster data
– NASA Blue Marble composite image

• Make it efficient for tiling clients, while still supporting untiled access

• Support multiple coordinate reference systems

• Use Java free and open-source software (FOSS)

• Use resources efficiently to stay within the free quotas
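One way to serve a single equirectangular source such as Blue Marble in several coordinate reference systems is inverse mapping: for each output pixel, find its centre in the requested CRS, convert it back to longitude/latitude, and copy the nearest source pixel. The Java sketch below assumes nearest-neighbour sampling and uses spherical Web Mercator (EPSG:3857, sphere radius 6378137 m) as the second CRS; `Reproject` and `getMapMercator` are hypothetical names, and this is not the gae-wms code.

```java
/** Hypothetical inverse-mapping reprojector: EPSG:4326 source, EPSG:3857 output. */
public class Reproject {
    static final double R = 6378137.0; // sphere radius used by EPSG:3857, metres

    /** Inverse spherical Mercator: easting in metres to longitude in degrees. */
    static double mercXToLon(double x) {
        return Math.toDegrees(x / R);
    }

    /** Inverse spherical Mercator: northing in metres to latitude in degrees. */
    static double mercYToLat(double y) {
        return Math.toDegrees(2.0 * Math.atan(Math.exp(y / R)) - Math.PI / 2.0);
    }

    /**
     * @param src  source pixels, row-major, covering lon -180..180 and lat 90..-90
     * @param bbox requested box in EPSG:3857 metres {minX, minY, maxX, maxY}
     * @return     output pixels, row-major, outW x outH, top row first
     */
    static int[] getMapMercator(int[] src, int srcW, int srcH,
                                double[] bbox, int outW, int outH) {
        int[] out = new int[outW * outH];
        double dx = (bbox[2] - bbox[0]) / outW;
        double dy = (bbox[3] - bbox[1]) / outH;
        for (int j = 0; j < outH; j++) {
            // Output rows run from north (maxY) to south; sample at pixel centres.
            double lat = mercYToLat(bbox[3] - (j + 0.5) * dy);
            int sj = (int) Math.floor((90.0 - lat) / 180.0 * srcH);
            sj = Math.min(Math.max(sj, 0), srcH - 1);
            for (int i = 0; i < outW; i++) {
                double lon = mercXToLon(bbox[0] + (i + 0.5) * dx);
                int si = (int) Math.floor((lon + 180.0) / 360.0 * srcW);
                si = Math.min(Math.max(si, 0), srcW - 1);
                out[j * outW + i] = src[sj * srcW + si];
            }
        }
        return out;
    }
}
```

Because the expensive part is a per-pixel coordinate transform, the same loop works for any CRS whose inverse transform to longitude/latitude is available; only `mercXToLon`/`mercYToLat` would change.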

Development challenges

• Much harder than anticipated! (details at the link below)

• Coding restrictions
– No local file output
– Can’t spawn threads
– Can’t access some Java packages (e.g. most of java.awt, all of javax.imageio)
– Limited RAM

• Deployment issues
– Uploading data

http://code.google.com/p/gae-wms/wiki/DevelopmentChallenges

Blue Marble image courtesy of NASA Earth Observatory
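Without javax.imageio and most of java.awt, images cannot be encoded with the usual `ImageIO.write` call. As a hedged sketch only (not necessarily what gae-wms did), a tiny pure-Java PNG writer can be built on java.util.zip, which is assumed here to be available in the sandbox: raw RGB scanlines, each prefixed with filter byte 0, are zlib-compressed into an IDAT chunk placed between IHDR and IEND, and every chunk ends with a CRC-32 over its type and data. The class name `TinyPng` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical minimal PNG writer that avoids javax.imageio and java.awt. */
public class TinyPng {

    /** Encodes 0xRRGGBB pixels (row-major, top row first) as an 8-bit RGB PNG. */
    static byte[] encode(int[] rgb, int width, int height) {
        try {
            // Raw scanlines: each row starts with filter type 0 (None), then R, G, B bytes.
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            for (int y = 0; y < height; y++) {
                raw.write(0);
                for (int x = 0; x < width; x++) {
                    int p = rgb[y * width + x];
                    raw.write((p >> 16) & 0xFF);
                    raw.write((p >> 8) & 0xFF);
                    raw.write(p & 0xFF);
                }
            }
            // IDAT holds a zlib stream; DeflaterOutputStream's default Deflater emits zlib format.
            ByteArrayOutputStream zlib = new ByteArrayOutputStream();
            try (DeflaterOutputStream dos = new DeflaterOutputStream(zlib)) {
                raw.writeTo(dos);
            }

            // IHDR: width, height, bit depth 8, colour type 2 (RGB), compression, filter, interlace.
            ByteArrayOutputStream ihdr = new ByteArrayOutputStream();
            DataOutputStream h = new DataOutputStream(ihdr);
            h.writeInt(width);
            h.writeInt(height);
            h.write(new byte[] {8, 2, 0, 0, 0});

            ByteArrayOutputStream png = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(png);
            out.write(new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'});
            writeChunk(out, "IHDR", ihdr.toByteArray());
            writeChunk(out, "IDAT", zlib.toByteArray());
            writeChunk(out, "IEND", new byte[0]);
            return png.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
    }

    /** Writes one chunk: length, type, data, then CRC-32 over type and data. */
    private static void writeChunk(DataOutputStream out, String type, byte[] data) throws IOException {
        byte[] typeBytes = type.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(data);
        out.writeInt(data.length);
        out.write(typeBytes);
        out.write(data);
        out.writeInt((int) crc.getValue());
    }
}
```

Skipping scanline filters keeps the code short but yields larger files than a tuned encoder, which matters when outgoing bandwidth is the scarcest quota.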

Testing

• Three “modes”:
– Fully-dynamic (all images generated from scratch)
– Self-caching (duplicate requests are served from cache)
– Static tiles (all images pre-generated)

• All images 256x256 pixels

• Apache JMeter scripts
– Many client threads, requesting images in random order from a preselected list
– Single client machine

• GAE is a “black box”, so we can’t control all aspects of the experiments
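The self-caching mode can be pictured as the cache-aside pattern: reduce each request to a key built from the parameters that determine the image, check the cache, and render only on a miss. In this Java sketch a small in-process LRU map stands in for App Engine's distributed memcache (which gives no such eviction guarantee), and `SelfCachingRenderer`, `key` and `getMap` are hypothetical names rather than the gae-wms API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache-aside wrapper around an image renderer. */
public class SelfCachingRenderer {
    private final Map<String, byte[]> cache;
    private final Function<String, byte[]> renderer;
    private int renders = 0;

    SelfCachingRenderer(int maxEntries, Function<String, byte[]> renderer) {
        this.renderer = renderer;
        // An access-ordered LinkedHashMap that drops its eldest entry acts as a simple LRU cache.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Builds a cache key from the GetMap parameters that determine the image. */
    static String key(String layer, String crs, String bbox, int w, int h, String format) {
        return String.join("|", layer, crs, bbox, w + "x" + h, format);
    }

    /** Returns the cached image, rendering and storing it on a miss. */
    byte[] getMap(String key) {
        byte[] img = cache.get(key);
        if (img == null) {
            img = renderer.apply(key);
            renders++;
            cache.put(key, img);
        }
        return img;
    }

    int renderCount() {
        return renders;
    }
}
```

With a randomly ordered, preselected request list as in the JMeter tests, every repeat of a key after its first request is a hit, which is what separates this mode from the fully-dynamic one.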

Results: Throughput

[Figure: throughput results, panels: Fully-dynamic, Self-caching, Static tiles]

“Service not available” errors increase with load

Results: Latency

[Figure: latency results, panels: Fully-dynamic, Self-caching, Static tiles]

Unpredictable latency spikes!

Some notes about quotas

• The outgoing bandwidth quota (1 GB/day) runs out fastest
– Hence serving JPEGs is more cost-effective than serving PNGs
– We can serve 100,000 256x256 JPEGs per day for free

• But it’s easy to write code that also exceeds some per-minute quotas
– E.g. the quota on output from the distributed data store

• Quotas can be increased!
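The two figures on this slide together imply an average JPEG tile size. Taking 1 GB as 10^9 bytes (with 2^30 bytes the budget is about 10.7 kB), the daily bandwidth quota divided by the number of tiles served is:

```latex
\frac{1\ \text{GB/day}}{100\,000\ \text{images/day}}
  = \frac{10^{9}\ \text{bytes}}{10^{5}\ \text{images}}
  = 10^{4}\ \text{bytes per image} \approx 10\ \text{kB per image}
```

Any image format whose 256x256 tiles average more than this budget exhausts the free quota with fewer requests per day.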

Conclusions 1

• Successfully implemented a full WMS for raster images

• Significant usage at zero running cost

• Performance and scalability acceptable for many apps
– But latency spikes are an issue

• Testing with distributed clients would be instructive

Conclusions 2: further potential?

• Hard to host lots of images in the same instance using our method
– Relies on storing data in local files, with a tight quota

• Restrictions on the Java servlet environment make it hard to run standard software stacks
– E.g. GeoServer

• Expansion to vector datasets is probably hard
– Would need a spatial index on top of the distributed data store
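To make the spatial-index point concrete, one simple option (named plainly as a fixed-grid index, not something the project built) registers each feature's bounding box under every grid cell it touches, so a bounding-box query reads only the keys of the overlapping cells. In this Java sketch a HashMap stands in for the distributed data store; the cell size, the "cell:x:y" key format and the `GridIndex` class are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical fixed-grid spatial index layered on a key-value store. */
public class GridIndex {
    private final double cellDeg;                                  // cell size in degrees
    private final Map<String, Set<String>> store = new HashMap<>(); // stand-in for the data store

    GridIndex(double cellDeg) {
        this.cellDeg = cellDeg;
    }

    /** Keys of all grid cells overlapped by a lon/lat box. */
    private List<String> cellKeys(double minLon, double minLat, double maxLon, double maxLat) {
        List<String> keys = new ArrayList<>();
        int x0 = (int) Math.floor((minLon + 180.0) / cellDeg);
        int x1 = (int) Math.floor((maxLon + 180.0) / cellDeg);
        int y0 = (int) Math.floor((minLat + 90.0) / cellDeg);
        int y1 = (int) Math.floor((maxLat + 90.0) / cellDeg);
        for (int x = x0; x <= x1; x++) {
            for (int y = y0; y <= y1; y++) {
                keys.add("cell:" + x + ":" + y);
            }
        }
        return keys;
    }

    /** Registers a feature id under every cell its bounding box touches. */
    void insert(String featureId, double minLon, double minLat, double maxLon, double maxLat) {
        for (String k : cellKeys(minLon, minLat, maxLon, maxLat)) {
            store.computeIfAbsent(k, unused -> new HashSet<>()).add(featureId);
        }
    }

    /** Candidate features for a query box; may include false positives that need an exact test. */
    Set<String> query(double minLon, double minLat, double maxLon, double maxLat) {
        Set<String> result = new HashSet<>();
        for (String k : cellKeys(minLon, minLat, maxLon, maxLat)) {
            Set<String> ids = store.get(k);
            if (ids != null) {
                result.addAll(ids);
            }
        }
        return result;
    }
}
```

Even this simple scheme shows the cost: one feature can occupy many keys, and every map request turns into several store reads, which collides with the per-minute data store quotas mentioned earlier.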

Thank you!

[email protected]

All code, full paper, results and more details about the experiments:




