Architectures for Data Commons (XLDB 15 Lightning Talk)

Architectures for Data Commons: Lessons Learned from the GDC (Lightning Talk). Robert Grossman, University of Chicago, Open Cloud Consortium. May 19, 2015, XLDB 2015.

Upload: robert-grossman

Post on 27-Jul-2015


TRANSCRIPT

Page 1: Architectures for Data Commons (XLDB 15 Lightning Talk)

Architectures for Data Commons: Lessons Learned from the GDC

(Lightning Talk)

Robert Grossman, University of Chicago

Open Cloud Consortium

May 19, 2015, XLDB 2015

Page 2: Architectures for Data Commons (XLDB 15 Lightning Talk)

Data commons co-locate data, storage and computing infrastructure, and commonly used tools for analyzing and sharing data to create a resource for the research community.

The image on this slide shows the web portal for the NCI Genomic Data Commons. The University of Chicago is collaborating with the National Cancer Institute (NCI) to develop the NCI Genomic Data Commons (GDC). The GDC is being developed and operated with NCI funding through a subcontract from Leidos Biomedical Research, Inc. at the Frederick National Laboratory for Cancer Research.

Page 3: Architectures for Data Commons (XLDB 15 Lightning Talk)

Hospitals, medical research centers and doctors

Data commons containing genomic and clinical data.

Patients

Output: continuously updated, data-driven, analytics-informed discovery, diagnosis and treatment.

Future State: virtual comprehensive cancer center.

Page 4: Architectures for Data Commons (XLDB 15 Lightning Talk)

This is a visualization of the time required to download data, process it, and upload it into the GDC (horizontal axis) vs. the size of the data (vertical axis, ranging from GB (10^9 bytes) to TB (10^12 bytes)).
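To give a feel for why time matters at these data sizes, here is a back-of-the-envelope sketch in Python. The link speeds are assumed values chosen for illustration, not figures from the talk, and the calculation covers an idealized transfer only, not the processing step.

```python
GB = 10**9   # bytes, the lower end of the size axis
TB = 10**12  # bytes, the upper end of the size axis


def transfer_seconds(size_bytes: int, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move size_bytes over a link of link_gbps gigabits per second.

    efficiency is a hypothetical fudge factor in (0, 1] for protocol overhead.
    """
    if link_gbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    # Assumed link speeds (1 and 10 Gbit/s); real sustained rates are usually lower.
    for label, size in [("1 GB", GB), ("100 GB", 100 * GB), ("1 TB", TB)]:
        for gbps in (1, 10):
            secs = transfer_seconds(size, gbps)
            print(f"{label:>6} at {gbps:>2} Gbit/s: {secs:,.0f} s ({secs / 3600:.2f} h)")
```

Under these assumptions a single 1 TB object needs over two hours on a 1 Gbit/s link before any processing starts, which is one motivation for co-locating compute with the data.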

Page 5: Architectures for Data Commons (XLDB 15 Lightning Talk)

Object-based storage with S3-compatible interface

Scalable lightweight workflow

Community data products, including data harmonization

Data submission portal

Data access portal

Co-located “pay for compute”

Long-running middleware services (Digital ID services, metadata services, high-performance transport)

Devops supporting virtualized environments & containers (OpenStack VMs, Docker containers, scheduling)

APIs for data access

APIs for data submission

Database services
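As one way of picturing how a few of these components might fit together, the following is a minimal in-memory sketch in Python. Everything in it is hypothetical: the class names, the record fields, and the `s3://` URL layout are assumptions made for illustration and do not describe the GDC's actual services or APIs.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Stand-in for object storage: maps (bucket, key) to bytes."""
    blobs: dict = field(default_factory=dict)

    def put(self, bucket: str, key: str, data: bytes) -> str:
        self.blobs[(bucket, key)] = data
        return f"s3://{bucket}/{key}"  # assumed URL layout

    def get(self, url: str) -> bytes:
        bucket, _, key = url[len("s3://"):].partition("/")
        return self.blobs[(bucket, key)]


@dataclass
class DigitalIDService:
    """Stand-in for digital ID plus metadata services: ID -> location and metadata."""
    records: dict = field(default_factory=dict)

    def register(self, url: str, data: bytes, metadata: dict) -> str:
        digital_id = str(uuid.uuid4())
        self.records[digital_id] = {
            "url": url,
            "sha256": hashlib.sha256(data).hexdigest(),
            "size": len(data),
            "metadata": dict(metadata),
        }
        return digital_id

    def resolve(self, digital_id: str) -> dict:
        return self.records[digital_id]


def submit(store: ObjectStore, ids: DigitalIDService,
           bucket: str, key: str, data: bytes, metadata: dict) -> str:
    """Hypothetical submission path: store the object, then register an ID for it."""
    url = store.put(bucket, key, data)
    return ids.register(url, data, metadata)


def fetch(store: ObjectStore, ids: DigitalIDService, digital_id: str) -> bytes:
    """Hypothetical access path: resolve the ID, read the object, verify its checksum."""
    record = ids.resolve(digital_id)
    data = store.get(record["url"])
    if hashlib.sha256(data).hexdigest() != record["sha256"]:
        raise ValueError("checksum mismatch")
    return data
```

The point of the indirection is that clients hold a stable ID rather than a storage location, so objects could be moved or replicated without breaking references. That is a design assumption of this sketch, not a claim about the GDC.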

Page 6: Architectures for Data Commons (XLDB 15 Lightning Talk)

At What Scale?

• Data centers are sometimes divided into “pods,” which can be built out as needed.

• A reasonable scale for a data commons is one of these pods (“cyberpod”).

• Let’s use the term “datapod” for the data & analytic cyber infrastructure that scales to a cyberpod.

• Think of it as the scale-out of a database to a cyberpod.

Pod A, Pod B

Page 7: Architectures for Data Commons (XLDB 15 Lightning Talk)

We are developing an open source software stack for data commons that scales to cyberpods.

Page 8: Architectures for Data Commons (XLDB 15 Lightning Talk)

Questions?

For more information: rgrossman.com, @bobgrossman, opensciencedatacloud.org, cdis.uchicago.edu