SciDB @ NERSC
Yushu Yao
Array-Like Science Data – More common than you think
SciDB, parallel processing without parallel programming
• Everything in arrays
  – Locate an element in O(1), i.e. constant time
  – Can be very sparse
  – Best for structured data generated by machines or simulations
  – Good for metadata too
• Query-like language, auto-parallelization
• Do calculations inside the DB
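The array model above can be illustrated with a small local sketch. This is not SciDB code: `SparseChunkedArray`, `CHUNK`, and `parallel_sum` are hypothetical names, and the thread pool only stands in for SciDB's parallel instances. It shows why storing cells by coordinate gives constant-time (average) element lookup even for very sparse arrays, and how grouping cells into chunks lets an aggregate run chunk by chunk without the caller writing any parallel code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk edge length (cells per chunk along each dimension).
CHUNK = 4


class SparseChunkedArray:
    """Toy 2-D sparse array: only non-empty cells are stored.

    Cells live in a dict keyed by (i, j), so reading one element is an
    average O(1) hash lookup no matter how large or sparse the array is.
    Cells are also grouped into fixed-size chunks, the unit of parallel work.
    """

    def __init__(self):
        self.cells = {}   # (i, j) -> value
        self.chunks = {}  # (chunk_i, chunk_j) -> list of (i, j) in that chunk

    def set(self, i, j, value):
        # Register the coordinate with its chunk only the first time it is set.
        if (i, j) not in self.cells:
            key = (i // CHUNK, j // CHUNK)
            self.chunks.setdefault(key, []).append((i, j))
        self.cells[(i, j)] = value

    def get(self, i, j, default=None):
        # Empty cells are simply absent, so they cost no storage.
        return self.cells.get((i, j), default)

    def _chunk_sum(self, key):
        return sum(self.cells[c] for c in self.chunks[key])

    def parallel_sum(self, workers=4):
        # Each chunk is summed independently, then partial sums are combined.
        # Threads illustrate the decomposition only; Python threads do not
        # give true CPU parallelism for this kind of work.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(self._chunk_sum, list(self.chunks)))


a = SparseChunkedArray()
a.set(0, 0, 1.5)
a.set(1_000_000, 7, 2.5)
a.set(3, 2, 4.0)
print(a.get(1_000_000, 7))  # 2.5
print(a.get(5, 5))          # None (empty cell)
print(a.parallel_sum())     # 8.0
```

The per-chunk decomposition is the point of the design: because each chunk's partial result is independent, the same aggregate can be spread across workers and recombined.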
NERSC SciDB Testbed
• Partner with Science Teams
  – Help them load their 1st batch of data and implement their 1st major analysis operation
• 15+ Science Projects
• Complicated Workflows and Algorithms
• Multiple Science Domains:
  – Astronomy, Climate, Bio-imaging, Genomics
• Multiple Types of Data:
  – Spectra, Images, Time Series
• Large Amounts of Data:
  – Typically 100 GB to 1 TB; some projects have 5+ TB
Types of Data Suitable for SciDB
• Imaging data: digital pictures from light sources or telescopes
• Time series data collected from sensors
• Spectral data
• Graph-like structures that represent relations between entities (sparse matrix)
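The last item, a graph stored as a sparse matrix, can be sketched in plain Python using a small, made-up edge list (the entity names and edges are hypothetical). Each entity becomes an integer coordinate, only existing edges are stored, and aggregating along one dimension of the matrix yields node degrees.

```python
from collections import defaultdict

# Hypothetical relations between entities, e.g. "A interacts with B".
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]

# Give each entity an integer index so it can serve as an array coordinate.
names = sorted({n for edge in edges for n in edge})
index = {name: k for k, name in enumerate(names)}

# Sparse adjacency matrix in coordinate form: only existing edges are stored,
# so an N x N matrix with E edges needs O(E) memory instead of O(N^2).
adjacency = {(index[src], index[dst]): 1 for src, dst in edges}

# Aggregating along one dimension answers graph questions:
# summing each row gives out-degree, summing each column gives in-degree.
out_degree = defaultdict(int)
in_degree = defaultdict(int)
for (row, col), weight in adjacency.items():
    out_degree[names[row]] += weight
    in_degree[names[col]] += weight

print(dict(out_degree))  # {'A': 2, 'B': 1, 'C': 1, 'D': 1}
print(dict(in_degree))   # {'B': 1, 'C': 3, 'A': 1}
```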
Examples
MetAtlas (LIQUID CHROMATOGRAPHY-MASS SPECTROMETRY)
MetAtlas (LIQUID CHROMATOGRAPHY-MASS SPECTROMETRY)
Some Primitives
• Aggregate along one dimension
• Aggregate by re-gridding
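Both primitives can be sketched on a small dense grid in plain Python. This is a local illustration of the idea, not SciDB's implementation: as far as we know SciDB's AFL offers comparable `aggregate` and `regrid` operators, but the Python functions below (`aggregate_along`, `regrid`) and the `image` data are hypothetical. Re-gridding here averages each block by default, and any other aggregate can be passed as `func`.

```python
def aggregate_along(grid, axis, func=sum):
    """Collapse one dimension of a 2-D grid (a list of rows) with `func`.

    axis=0 aggregates down each column; axis=1 aggregates across each row.
    """
    if axis == 0:
        return [func(col) for col in zip(*grid)]
    return [func(row) for row in grid]


def regrid(grid, block_rows, block_cols,
           func=lambda cells: sum(cells) / len(cells)):
    """Coarsen a 2-D grid: aggregate each block_rows x block_cols tile into one cell.

    Edge tiles are smaller when the grid size is not a multiple of the block size.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, block_rows):
        out_row = []
        for c0 in range(0, n_cols, block_cols):
            cells = [grid[r][c]
                     for r in range(r0, min(r0 + block_rows, n_rows))
                     for c in range(c0, min(c0 + block_cols, n_cols))]
            out_row.append(func(cells))
        out.append(out_row)
    return out


# Hypothetical 4 x 4 intensity image.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

print(aggregate_along(image, axis=0))  # column sums: [28, 32, 36, 40]
print(aggregate_along(image, axis=1))  # row sums: [10, 26, 42, 58]
print(regrid(image, 2, 2))             # 2x2 block means: [[3.5, 5.5], [11.5, 13.5]]
```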
Benchmark of MetAtlas Workload
DustOff Workflow
DustOff In SciDB
Collection of Spectra
DustOff Scaling
Strength/Weaknesses of SciDB
Aspects: Analysis, Management, Usability, Sharing
• Good R/Python binding; good built-in analytics; relatively easy to extend in C++
• Easy to put behind a webpage; need manual access control
• Good for subselecting/filtering; need to load data in (creates a duplicate copy)
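Subselecting and filtering can be sketched on a sparse, coordinate-keyed array: one step keeps the cells inside a coordinate box, the other keeps the cells whose values pass a test. This pure-Python version only illustrates the semantics; it assumes they roughly match SciDB's `between` and `filter` operators and says nothing about how SciDB executes them in parallel. The `readings` array and the function names are hypothetical.

```python
def between(cells, low, high):
    """Keep cells whose coordinates fall inside the inclusive box [low, high].

    `cells` maps coordinate tuples to values; `low` and `high` are tuples
    of the same length as the coordinates.
    """
    return {coord: v for coord, v in cells.items()
            if all(lo <= x <= hi for x, lo, hi in zip(coord, low, high))}


def filter_cells(cells, predicate):
    """Keep cells whose value satisfies `predicate`."""
    return {coord: v for coord, v in cells.items() if predicate(v)}


# Hypothetical sparse (time, channel) -> reading array.
readings = {(0, 0): 0.2, (0, 3): 5.1, (2, 1): 7.4, (5, 2): 9.9, (8, 0): 3.3}

window = between(readings, low=(0, 0), high=(5, 2))
print(window)                                   # {(0, 0): 0.2, (2, 1): 7.4, (5, 2): 9.9}
print(filter_cells(window, lambda v: v > 5.0))  # {(2, 1): 7.4, (5, 2): 9.9}
```

Selecting by coordinates first and by value second keeps the value test to the smaller window, which is the usual order when the box is the more selective condition.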
New SciDB Service at NERSC
• Dedicated Servers
• Production Ready
• To Request SciDB:
  – https://www.nersc.gov/users/science-gateways/science-database-request-form/