The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing
ESA EO Open Science 2.0 Conference, 12-14 October 2015
Philip Kershaw (CEDA), John Holt (Tessella plc.), José Gómez-Dans, Philip Lewis (UCL)
Nicola Pounder, Jon Styles (Assimila Ltd.)
JASMIN (STFC/Stephen Kill)
Introduction
• OPTIRAD = OPTImisation environment for joint retrieval of multi-sensor RADiances
– Collaboration: CEDA, UCL, Assimila Ltd, FastOpt and VU Amsterdam
– Funded by ESA
• Overview of technical solution
– Introduction to IPython (Jupyter) Notebook
– Deployment on JASMIN-CEMS science cloud
• Make the case: IPython Notebook + Cloud = powerful combination for EO Open Science 2.0
OPTIRAD Goals
Address the challenge of producing consistent EO land surface information products from heterogeneous EO data input:
Collaboration: provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users.
Computing resources: processing at high spatial and temporal resolutions with computationally expensive algorithms.
Usability and access: easy execution and development of existing Python code and the provision of interactive tutorials for new users.
IPython Notebook
• Provides Python kernels accessible via a web browser
• Sessions can be saved and shared
• Trivial access to parallel processing capabilities – IPython.parallel (ipyparallel)
• IPython → Jupyter Notebook
• Support for other languages such as R
• New JupyterHub allows multi-user management of notebooks
• Gained traction as a teaching and collaborative tool
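The map-style parallelism mentioned in the bullets above can be illustrated with a minimal sketch. It uses Python's standard-library `concurrent.futures` as a stand-in rather than ipyparallel itself, because ipyparallel needs a running cluster of engines; ipyparallel offers a comparable map-style interface over its engines. The function `mean_reflectance` and the tile values are hypothetical and purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_reflectance(tile):
    """Return the mean of one tile of (made-up) reflectance values."""
    return sum(tile) / len(tile)


# Hypothetical input: three small "tiles" of pixel values.
tiles = [[0.1, 0.3], [0.2, 0.4, 0.6], [0.5]]

# Map the function over the tiles using a pool of workers.
# Local threads stand in here for the remote engines a cluster would provide.
with ThreadPoolExecutor(max_workers=3) as pool:
    tile_means = list(pool.map(mean_reflectance, tiles))

print(tile_means)
```

The point of the pattern is that the per-tile function stays ordinary Python; only the `map` call decides where the work runs.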
IPython Notebook + Cloud
• Cloud’s characteristics:
– Broad network access, resource pooling, elasticity, scale – compute and storage
– Good fit for Big Data science applications
• Cloud-hosted Notebook: a model already demonstrated with public cloud services, e.g.
– Wakari, Azure, Rackspace
• Central hosting allows central management of software packages
– no installation steps needed for the user
• Algorithm prototyping environment next to Big Data
– Acts as a precursor to operational processing services
Notebook: a user – application perspective
Support a spectrum of usage models
Different classes of user
Long-tail of science users →
Design and development considerations
• Host on JASMIN-CEMS
– Data analysis facility and science cloud at Rutherford Appleton Lab, UK
– Advantage of proximity to locally hosted EO and climate science datasets
– Integration with environmental sciences community
• Lightweight development and deployment philosophy
– Build on Open Source and community efforts to use what’s already available
• How to meet multi-user support requirement?
– Buy off-the-shelf: run Wakari on JASMIN-CEMS platform, or
– Try JupyterHub: multi-user IPython Notebook solution, or
– Roll our own solution
• How to integrate parallel processing?
– IPython.parallel (ipyparallel) Python API accessed via the Notebook
Deployment Architecture
[Diagram: OPTIRAD JASMIN Cloud Tenancy]
• Browser access through the firewall to JupyterHub
• JupyterHub: manages users and provision of notebooks
• Swarm: manages allocation of containers for notebooks across the VMs of the Swarm pool
• Docker containers: notebooks and kernels in containers (IPython Notebook, kernels, parallel controller)
• VM: slave 0: parallel engines, the nodes for parallel processing
• VM: shared services: NFS, LDAP
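To make the idea of Swarm allocating notebook containers across a pool of VMs more concrete, here is a toy sketch of a "spread" placement policy (least-loaded VM first). This is an assumption for illustration only: the slides do not say which placement strategy the deployment used, and the function `allocate_spread`, the VM names and the user names are all hypothetical. It is not Swarm's implementation.

```python
def allocate_spread(users, vms):
    """Assign one notebook container per user to the least-loaded VM.

    Ties are broken by the order of `vms`. This imitates a simple
    "spread" policy and is not how Swarm itself is implemented.
    """
    load = {vm: 0 for vm in vms}  # containers currently placed on each VM
    placement = {}
    for user in users:
        target = min(vms, key=lambda vm: load[vm])
        placement[user] = target
        load[target] += 1
    return placement


# Hypothetical pool of three VMs and five users requesting notebooks.
pool = ["swarm-pool-0", "swarm-pool-1", "swarm-pool-2"]
placement = allocate_spread(["ana", "ben", "cat", "dan", "eve"], pool)
for user, vm in placement.items():
    print(user, "->", vm)
```

With five users and three VMs, no VM ends up with more than two containers, which is the behaviour a spread-style policy is meant to give.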
Conclusions + Next Steps
• Experiences from project delivery
– Off-the-shelf solution using JupyterHub paid off
– JupyterHub and Swarm were new, but
– Installation straightforward + operationally robust
• Challenges and future development
– Extend use of containers for parallel compute
– Challenge: managing cloud elasticity with both containers and host VMs
– Provide object storage – CEPH likely to be adopted
– Expand from OPTIRAD pilot to wider user community
– Deploy with toolboxes, e.g. Sentinels or CIS
Demo . . .
• A tutorial on EO data assimilation
– Notebook blurs the traditional separation between tutorial documentation and using the target system
– The two are one self-contained interactive unit :)
Further information
• OPTIRAD:
– Optimisation Environment For Joint Retrieval Of Multi-Sensor Radiances (OPTIRAD), Proceedings of the ESA 2014 Conference on Big Data from Space (BiDS’14), http://dx.doi.org/10.2788/1823
• JASMIN paper (Sept 2013)
– http://home.badc.rl.ac.uk/lawrence/static/2013/10/14/LawEA13_Jasmin.pdf
– Cloud paper to follow soon
• Cloud-hosted JupyterHub with Docker for teaching:
– https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/
• JASMIN and CEDA:
– http://jasmin.ac.uk/
– http://www.ceda.ac.uk
• @PhilipJKershaw