Big process for big data
Process automation for data-driven science

Ian Foster
Computation Institute, Argonne National Laboratory & The University of Chicago

Talk at 2nd Workshop on Understanding Climate Change from Data, Minneapolis, August 6, 2012

climatechange.cs.umn.edu/docs/ws12_foster.pdf
www.ci.anl.gov  www.ci.uchicago.edu


Page 1:

Big process for big data
Process automation for data-driven science

Ian Foster, Computation Institute, Argonne National Laboratory & The University of Chicago

Talk at 2nd Workshop on Understanding Climate Change from Data, Minneapolis, August 6, 2012

Page 2:

I’ll talk about two things

• Center for Robust Decision making on Climate and Energy Policy (RDCEP) – www.rdcep.org

• Research data management as a service: Globus Online – www.globusonline.org

Page 3:

RDCEP – www.rdcep.org

• Research to improve models, confront uncertainty, and advance computational methods

• Development of open tools for modeling, estimation, uncertainty characterization, and related tasks

• Education and research experience for high-school students and teachers, undergrads, grads, and postdocs

• Products for decision makers and other stakeholders

Page 4:

Some initial projects

• CIM-EARTH modeling framework (Choi, Munson)
• Stochastic dynamic models (Cai, Judd, Lontzek)
• Characterizing parametric uncertainty (Elliott, Munson)
• Carbon leakage (Elliott, Foster, Kortum, Munson, Weisbach)
• Distributional impacts of climate change (Elliott, Fullerton, Rao)
• Land use decisions under uncertainty (Hertel, Judd, Steinbuks)
• Robust decision making for energy (Hansen, Munson, Sanstad)
• Climate model emulation (McInerney, Moyer, Stein)
• Uncertainty in downscaling for impacts studies (Moyer et al.)
• Partial equilibrium land use change for biofuels (Elliott et al.)
• Analysis of Illinois Renewable Portfolio Standard (Moyer et al.)
• Social Cost of Carbon studies (Moyer, Weisbach, et al.)

Page 5:

Climate impacts in agriculture via gridded crop simulations with the pDSSAT model

Example: 1980-2009 rainfed, fully fertilized corn yields at 5’

[Figure panels: Median; Standard deviation]

Page 6:

Reanalysis vs. interpolated observations
NCEP CFSR Temps; CPC Precip; SRB Solar

[Figure: color scales 0, 45, 90 Bu/acre and -60, 0, 60 Bu/acre]

• To characterize the uncertainties expected from performing these simulations globally with reanalysis products, we compare the NCEP CFSR with interpolated observation products such as CPC and SRB.

• For rainfed maize, pure reanalysis-based simulations tend to under-predict yields across a wide swath of the US corn belt relative to simulations using observations.

• They tend to over-predict over most of the area east of the Mississippi, most notably along the SE coast and throughout Florida.

Page 7:

Aggregation and Validation

• Simulated corn yields aggregated to county and state level and plotted against USDA survey data.
  – 1980-2009; 60,423 data points from 2,373 counties; 1,225 from 41 states.
  – RMSE and R² between simulated and observed aggregate yields:
    o County: 26 bu/A (25% of sample mean), R² = 0.60
    o State: 18.6 bu/A (15.7% of sample mean), R² = 0.73
    o National: 13.6 bu/A (10.8% of sample mean)

Accuracy improves as we aggregate from county to state to national level

[Figure panels: County; State]
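The validation metrics on this slide can be sketched in outline as follows. This is a minimal illustration, not the authors' code: the function names and the toy yield values are hypothetical, R² is assumed to be the squared Pearson correlation, and "percent of sample mean" is assumed to be relative to the mean of the observed yields (the slide specifies neither).

```python
import math
from typing import Sequence


def rmse(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired simulated and observed yields."""
    if len(simulated) != len(observed) or len(simulated) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Squared Pearson correlation (an assumed definition of R²)."""
    if len(simulated) != len(observed) or len(simulated) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_s = sum(simulated) / len(simulated)
    mean_o = sum(observed) / len(observed)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(simulated, observed))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_s == 0 or var_o == 0:
        raise ValueError("R² is undefined when either series is constant")
    return cov * cov / (var_s * var_o)


def rmse_percent_of_mean(simulated: Sequence[float],
                         observed: Sequence[float]) -> float:
    """RMSE as a percentage of the observed sample mean (assumed convention)."""
    observed_mean = sum(observed) / len(observed)
    return 100.0 * rmse(simulated, observed) / observed_mean


# Hypothetical yields in bu/A, for illustration only (not data from the talk).
simulated_yields = [100.0, 120.0, 140.0]
observed_yields = [110.0, 120.0, 130.0]
print(rmse(simulated_yields, observed_yields))
print(r_squared(simulated_yields, observed_yields))
print(rmse_percent_of_mean(simulated_yields, observed_yields))
```

Applying the same functions to county-level pairs, then to state and national aggregates, is one way to check the slide's observation that error shrinks as the aggregation level rises.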

Page 8:

I’ll talk about two things

• Center for Robust Decision making on Climate and Energy Policy – www.rdcep.org

• Research data management as a service: Globus Online – www.globusonline.org

Page 9:

The climate data problem

• Climate data (observations + simulations) are highly distributed

• Climate data are big and getting bigger

• Mundane tasks associated with preparing for and performing analyses take more & more time

 

Page 10:

Process automation for science

• Run experiment
• Collect data
• Move data
• Check data
• Annotate data
• Share data
• Find similar data
• Link to literature
• Analyze data
• Publish data

Research IT as a service

 


Page 12:

Eppur si muove (“and yet it moves”) …

• Galileo said this about the earth (the earth is not the center of the universe)

• The same observation holds for data, which is often in the “wrong place” for various reasons

Page 13:

Globus Online: Data transfer as SaaS

• Reliable file transfer.
  – Fire-and-forget
  – Automatic fault recovery
  – High performance
  – Across multiple security domains

• No IT required.
  – No client software install
  – New features automatically available
  – Consolidated support and troubleshooting

Works with existing GridFTP servers; also Globus Connect
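"Fire-and-forget" with automatic fault recovery generally means that the service, rather than the user, retries after a failure. The sketch below illustrates that idea with a retry loop and exponential backoff. It is not the Globus Online implementation or API: `transfer_with_retry`, `copy_once`, and the demo `flaky_copy` are hypothetical names, and the backoff policy is an assumption.

```python
import time
from typing import Callable


def transfer_with_retry(
    copy_once: Callable[[], None],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call copy_once until it succeeds; return the number of attempts used.

    copy_once is a hypothetical stand-in for one transfer attempt and is
    expected to raise OSError (or a subclass) on a recoverable fault.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            copy_once()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: wait longer after each failed attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Demo with a hypothetical transfer that fails twice before succeeding.
_failures_left = [2]


def flaky_copy() -> None:
    if _failures_left[0] > 0:
        _failures_left[0] -= 1
        raise ConnectionError("simulated network fault")


attempts_used = transfer_with_retry(flaky_copy, sleep=lambda seconds: None)
print(f"succeeded after {attempts_used} attempts")
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a production service would also persist transfer state so that recovery survives a restart.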


Page 20:

440 GB @ 60 MB/sec: 2 hours
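As a sanity check on this figure, 440 GB at a sustained 60 MB/sec works out to roughly two hours. The sketch below shows the arithmetic; `transfer_hours` is a hypothetical helper, and whether the slide means decimal (1 GB = 1000 MB) or binary (1 GB = 1024 MB) units is an assumption, so both are computed.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float,
                   mb_per_gb: int = 1000) -> float:
    """Hours needed to move size_gb at a sustained rate of rate_mb_per_s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * mb_per_gb / rate_mb_per_s
    return seconds / 3600.0


# Decimal units: 440 * 1000 / 60 seconds, converted to hours.
decimal_hours = transfer_hours(440, 60)
# Binary units: 440 * 1024 / 60 seconds, converted to hours.
binary_hours = transfer_hours(440, 60, mb_per_gb=1024)
print(f"decimal: {decimal_hours:.2f} h, binary: {binary_hours:.2f} h")
```

Either convention gives a value slightly above two hours, consistent with the rounded figure on the slide.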

Page 21:

Globus Transfer to date

• In 18 months
  – 5,000 users
  – 6 PB moved
  – 500M files
  – 99.9% uptime

• Broad adoption
  – Experimental facilities
  – Supercomputers
  – Campuses
  – Individuals
  – Projects

Page 22:

Globus Storage: For when you want to …

• Place your data where you want

• Access it from anywhere via different protocols

• Update it, version it, and take snapshots

• Share versions with whomever you want

• Synchronize among locations

[Diagram labels: Commercial storage service provider; National research center; Campus computing center; Globus Storage volume; Globus Transfer, HTTP/REST, Desktop sync]

Page 23:

Thank you!

rdcep.org globusonline.org [email protected] [email protected]