generic portals for science infrastructure (gpsi)...generic portals for science infrastructure...

15
Generic Portals for Science Infrastructure (GPSI) Thomas D. Uram, Michael E. Papka, Mark Hereld, Michael Wilde Argonne Na8onal Laboratory, University of Chicago Computa8on Ins8tute

Upload: others

Post on 24-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Generic Portals for Science Infrastructure(GPSI)

Thomas  D.  Uram,  Michael  E.  Papka,  Mark  Hereld,  Michael  Wilde

Argonne  Na8onal  Laboratory,  University  of  Chicago  Computa8on  Ins8tute

Page 2: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

2

GPSIGeneric Portals for Science InfrastructureApproach Simplify  execu2on  and  data  management  on  

compute  resources Support  reproducibility  of  results  through  

execu2on  and  data  provenance   Enable  collabora2on  among  researchers  working  

on  a  science  campaign Capture  details  of  scien2sts’  idiosyncra2c  

workflows

Research  Ques2ons Effec2ve  scheduling  of  jobs  across  resources Efficient  data  staging  to  and  from  compute  

resources Automated  data  analysis  and  feature  extrac2on Data  filtering  for  remote  analysis  

View  of  job  details,  including  files  and  images  produced

View  of  job  history

Page 3: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

GPSI: Technologies

Django  core  (hence  Python)– Django  is  widely  used  in  industry,  and  is  the  most  popular  Python  web  

framework– Python  is  widely  used  in  the  scien8fic  community– Uses  a  variety  of  database  backends  (MySQL,  Sqlite)

Collec2on  of  Django  apps– Apps  exist  for  (nearly)  anything  one  needs

GPSI  is  one  more  Django  app– Can  remain  small,  focus  on  only  the  gateway-­‐related  func8onality

Uses  SwiM  for  describing  workflow  and  running  jobs GPSI  exposes  opportuni2es  for  customiza2on  by  science  domains

– User-­‐driven  gateways

3

Page 4: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

GPSI: Overall Flow

Introduce  data Introduce  applica/ons Schedule  jobs View/Analyze  results Reproduce  results Download Share

4

Page 5: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Compute Resource Setup

Mechanisms  supported  by  Swi?  – File  transfer:  grid?p,  ssh,  hFp– Execu/on:  pbs,  gram,  sge,  ssh,  ec2,  FutureGrid,  ...

Have  run  on  a  variety  of  resources  – Ranger,  Steele,  Blacklight  @  TeraGrid  XSEDE

– Compute  machines  at  Argonne,  including  Fusion  cluster

– PADS  at  UChicago– Third-­‐party  resources

Administra/vely  controlled  (for  5

Page 6: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Credential Management

Creden/als  managed  by  user  and  associated  with  par/cular  resources

Creden/al  types  include– ssh

– MyProxy• MyProxy  server  host,  port,  

username  and  password• Creden/al  renewed  as  needed

– MyProxy  OAuth• Avoids  collec/on  of  MyProxy  

username  and  password  by  gateway

• Have  currently  integrated  pre-­‐produc/on  TeraGrid  MyProxy  OAuth  support

6

Page 7: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

7October  7,  2011 6

Introduce Data

Upload  relevant  files,  or  ingest  on  server-­‐side Preview files inline

Search across your files and those of

your collaborators

Page 8: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Introduce Applications

Upload  relevant  files

Applica8on  defini8on  relies  on  required  applica8ons  residing  on  compute  resource

Edit  SwiW  script  within  GPSI

8

Edit, duplicate, run, delete jobs

Page 9: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Swift

Language  for  parallel  scrip8ng

Execu8on  engine  with  interfaces  to  various  schedulers

9

# simple script to execute hostname N timestype file;

# Swift app definition; each call will schedule a job(file t)do_hostname() { app { hostname stdout=@filename(t); }}foreach i in [0:50] { file outfile <concurrent_mapper;prefix="stdout",suffix=".txt">; outfile = do_hostname();}

For info on Swift, see http://www.ci.uchicago.edu/swift

Page 10: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Execute Jobs

Web  form  generated  by  introspec8ng  SwiW  script,  including  default  values

Uses  server-­‐side  proxy  cer8ficate  for  single  sign  on

10

Page 11: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Generate, view, download results: Collaborate

Well-­‐known  data  types  are  presented  graphically  among  job  output

Applica/ons  control  which  job  outputs  are  included  and  how  they’re  represented– XML  designa/on  of  output  files  to  

include  – XSLT  transform  of  file  input  to  

representa/ve  HTML

Users  view  each  other’s  results,  can  run  each  other’s  apps

11

Page 12: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

provenance

Job  details  captured  in  one  place– input,  op8ons,  compute  resource,  creden8al

Jobs  can  be  rerun  from  within  GPSI,  using  the  original  scenario

12

Page 13: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Where we’re using it

SmartGrid

Materials

Phylogene8cs

13

Triple-store for metadata integration

and searching

Page 14: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Future Work

Integrated  analysis

Dynamic  provisioning

SCS  integra8on

Globus  Online  integra8on

Data  cataloging

14

Page 15: Generic Portals for Science Infrastructure (GPSI)...Generic Portals for Science Infrastructure (GPSI) ThomasD.Uram,"Michael"E."Papka,"Mark"Hereld,"Michael"Wilde Argonne"Naonal"Laboratory,"University"of"Chicago"Computaon"Ins8tute

Acknowledgements

Michael  E.  Papka,  Mark  Hereld,  Joe  Insley,  Eric  Olson,  and  the  Futures  Laboratory  at  Argonne  Na8onal  Laboratory

Mike  Wilde  and  the  SwiW  team Nancy  Wilkins-­‐Diehr,  Suresh  Marru,  TeraGrid/XSEDE  Gateways  

program

Jim  Basney,  Jeff  Gaynor  for  MyProxy  OAuth NSF  Award  SCI-­‐0503697 XSEDE  Science  Gateways  program

15