nasa/ipac infrared science archive tatiana goldina, loi ly, trey roby, xiuqin wu
![Page 1: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/1.jpg)
Web-based 2d visualization with large data sets
NASA/IPAC Infrared Science Archive
Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu
![Page 2: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/2.jpg)
Larger Data Sets
- 2003: 2MASS Point Source Catalog, 0.5 billion rows, > 100 columns
- 2013: AllWISE Source Catalog, 0.75 billion rows, > 300 columns
![Page 3: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/3.jpg)
IRSA’s Firefly tri-view
Gum31, AllWISE Source Catalog, 0.5 degree search. Data are selected in each of the 3 views.
![Page 4: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/4.jpg)
Problem
Sky area: box with center 150.12, +2.21 and length 5400 arcsec.

| Catalog | Rows, Columns (short form default) | Space on disk (ASCII IPAC Table) |
| --- | --- | --- |
| AllWISE Source Catalog | 30,000 rows, 47 columns | 13 MB / 9 B per cell |
| COSMOS Cassata morphology Catalog | 230,000 rows, 15 columns | 62 MB / 18 B per cell |
| Spitzer Source List | 250,000 rows, 148 columns | 416 MB / 11 B per cell |

Table covers one page at a time.
Image overlay and plot should cover all rows.
How do we visualize this much data?
![Page 5: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/5.jpg)
Overplotting
Points on top of each other
- hard to distinguish
- hard to interpret
- can be aggregated
Plot area: 400 x 400 px²
Symbol size: 5 x 5 px²
160,000 px² / 25 px² = 6,400
230,000 catalog rows are plotted with 5960 square symbols
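
The arithmetic on this slide can be checked with a short sketch. The function names are hypothetical and only restate the numbers above: the plot area divided by the symbol area gives an upper bound on non-overlapping symbols, and the row count divided by the drawn symbols gives the average number of rows behind each symbol.

```python
def max_distinct_symbols(plot_w_px, plot_h_px, sym_w_px, sym_h_px):
    """Upper bound on non-overlapping symbols that fit in the plot area."""
    return (plot_w_px * plot_h_px) // (sym_w_px * sym_h_px)


def rows_per_symbol(n_rows, n_symbols):
    """Average number of catalog rows represented by one drawn symbol."""
    return n_rows / n_symbols


print(max_distinct_symbols(400, 400, 5, 5))       # 6400
print(round(rows_per_symbol(230_000, 5960), 1))   # 38.6
```

On average, each of the 5,960 symbols stands for about 39 catalog rows, which is why individual points cannot be distinguished.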
![Page 6: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/6.jpg)
Binning
- Data aggregation technique
- Used by statistical packages (R or SDSS)
- 2-d histogram; shade represents the number of points (Np) in a bin
- Outlier preserving
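
A minimal sketch of 2-d histogram binning, assuming the grid origin and bin sizes are already known. The function name `bin_points` and the sample points are illustrative only, not taken from Firefly. Only non-empty bins are stored, so a lone outlier keeps its own bin with a count of 1.

```python
from collections import Counter


def bin_points(points, xmin, ymin, binsize_x, binsize_y):
    """Count points per (ix, iy) bin; empty bins are never stored."""
    counts = Counter()
    for x, y in points:
        ix = int((x - xmin) // binsize_x)
        iy = int((y - ymin) // binsize_y)
        counts[(ix, iy)] += 1
    return counts


points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (1.5, 0.2), (9.9, 9.9)]
counts = bin_points(points, xmin=0.0, ymin=0.0, binsize_x=1.0, binsize_y=1.0)
print(sorted(counts.items()))
# [((0, 0), 3), ((1, 0), 1), ((9, 9), 1)]
```

The count in each bin is what a shading scheme would map to a color; the outlier at (9.9, 9.9) survives as its own bin.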
![Page 7: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/7.jpg)
Color-Color Diagram
Color-color diagram created from AllWISE Source Catalog. 1 degree cone search. Lockman Hole. 46,475 data points are represented by 1,598 bins.
![Page 8: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/8.jpg)
Color-Color Diagram (2)
Same diagram, different shading scheme. Darker – 3.1 times more points.
![Page 9: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/9.jpg)
Binning – calculation
x:y = aspect ratio
Nbins = maximum number of bins
Nx = (int)sqrt( Nbins * [x:y] )
Ny = (int)sqrt( Nbins / [x:y] )
binsizex = (xmax - xmin) / Nx + padx
binsizey = (ymax - ymin) / Ny + pady
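
The formulas above can be sketched directly in Python. The function name `bin_grid` and the sample values are assumptions for illustration. The slide does not say how padx and pady are chosen, so they default to zero here. One plausible use, which is a guess rather than something the slide states, is a small positive pad so that the maximum value falls inside the last bin.

```python
import math


def bin_grid(xmin, xmax, ymin, ymax, nbins, aspect, padx=0.0, pady=0.0):
    """Return (nx, ny, binsize_x, binsize_y) for at most nbins bins.

    aspect is the x:y ratio of the plot, so nx / ny is roughly aspect
    and nx * ny never exceeds nbins.
    """
    nx = int(math.sqrt(nbins * aspect))
    ny = int(math.sqrt(nbins / aspect))
    binsize_x = (xmax - xmin) / nx + padx
    binsize_y = (ymax - ymin) / ny + pady
    return nx, ny, binsize_x, binsize_y


nx, ny, bx, by = bin_grid(0.0, 10.0, 0.0, 5.0, nbins=10_000, aspect=4.0)
print(nx, ny)  # 200 50
```

Because both counts are truncated to integers, the product Nx * Ny can be slightly below Nbins, never above it.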
![Page 10: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/10.jpg)
Server-side vs. Client-side
Server side
- Reduces transferred data size
- Used for larger tables (> 30,000 rows)
Client side
- Reduces rendered data size
- Common plot operations (zoom, select) do not require server call
- Used for smaller tables (up to 30,000 rows)
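
The row-count rule on this slide amounts to a single threshold check. A minimal sketch, where the constant and function names are hypothetical and only the 30,000-row threshold comes from the slide:

```python
CLIENT_SIDE_MAX_ROWS = 30_000  # threshold quoted on the slide


def binning_side(n_rows, threshold=CLIENT_SIDE_MAX_ROWS):
    """Pick where aggregation runs, based only on table size."""
    return "client" if n_rows <= threshold else "server"


print(binning_side(30_000))   # client
print(binning_side(230_000))  # server
```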
![Page 11: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/11.jpg)
Preparing data for visualization
1. Retrieve data from low-level query and data service
2. Apply dynamic [current table] filters
3. Apply current sorting order
4. Aggregate data for visualization
Policies
- stream table processing: one row at a time
- cache intermediate results
- fix plot aspect ratio
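
A rough sketch of the "one row at a time" policy using Python generators, so that no step holds the whole table in memory. All names, the CSV input, and the sample rows are assumptions for illustration; the actual service and table format are not shown in the slides. The sorting step is left out here because bin counts do not depend on row order, and caching is not shown.

```python
import csv
import io
from collections import Counter


def stream_rows(text_file):
    """Yield one row at a time as a dict of floats."""
    for row in csv.DictReader(text_file):
        yield {name: float(value) for name, value in row.items()}


def apply_filters(rows, predicates):
    """Pass through only the rows that satisfy every filter."""
    for row in rows:
        if all(pred(row) for pred in predicates):
            yield row


def aggregate(rows, xcol, ycol, xmin, ymin, binsize_x, binsize_y):
    """Consume the stream and count rows per (ix, iy) bin."""
    counts = Counter()
    for row in rows:
        ix = int((row[xcol] - xmin) // binsize_x)
        iy = int((row[ycol] - ymin) // binsize_y)
        counts[(ix, iy)] += 1
    return counts


data = io.StringIO("ra,dec,w1\n150.1,2.2,14.0\n150.3,2.4,17.5\n150.2,2.25,13.2\n")
rows = apply_filters(stream_rows(data), [lambda r: r["w1"] < 15.0])
counts = aggregate(rows, "ra", "dec", xmin=150.0, ymin=2.0,
                   binsize_x=0.5, binsize_y=0.5)
print(dict(counts))  # {(0, 0): 2}
```

Only the bin counts are kept at the end, so memory use grows with the number of non-empty bins rather than with the number of rows.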
![Page 12: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/12.jpg)
Binning: implications for tri-view
Filtering from image overlay. How to find matching rows?
Aggregation parameters must be preserved!
![Page 13: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/13.jpg)
What do we save?
Aggregation parameters
- X, Y names or expressions
- Minimum values: xmin, ymin
- Step sizes: binsizex, binsizey
For each aggregated value
- Bin index
- Number of points
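
A sketch of how saved aggregation parameters could map a selected bin back to the full data. It assumes the bin index is stored as an (ix, iy) pair, which the slides do not specify, and the function names are hypothetical. Rows are matched by recomputing their bin index with the same formula used during aggregation, which avoids disagreements at bin edges.

```python
def bin_bounds(ix, iy, xmin, ymin, binsize_x, binsize_y):
    """Return ((x_lo, x_hi), (y_lo, y_hi)) covered by bin (ix, iy)."""
    x_lo = xmin + ix * binsize_x
    y_lo = ymin + iy * binsize_y
    return (x_lo, x_lo + binsize_x), (y_lo, y_lo + binsize_y)


def rows_in_bin(rows, ix, iy, xmin, ymin, binsize_x, binsize_y):
    """Select the (x, y) rows that were aggregated into bin (ix, iy)."""
    return [
        (x, y)
        for x, y in rows
        if int((x - xmin) // binsize_x) == ix
        and int((y - ymin) // binsize_y) == iy
    ]


rows = [(0.1, 0.1), (1.5, 0.2), (1.7, 0.9), (9.9, 9.9)]
print(bin_bounds(1, 0, 0.0, 0.0, 1.0, 1.0))
# ((1.0, 2.0), (0.0, 1.0))
print(rows_in_bin(rows, 1, 0, 0.0, 0.0, 1.0, 1.0))
# [(1.5, 0.2), (1.7, 0.9)]
```

The bounds from `bin_bounds` could also be turned into a range filter on the X and Y columns, which is one way a selection in the plot might be translated into a table filter.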
![Page 14: NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu](https://reader030.vdocument.in/reader030/viewer/2022032806/56649efd5503460f94c10ea8/html5/thumbnails/14.jpg)
Conclusions
- Binning is an efficient aggregation technique
- Use client-side binning for smaller tables
- Preserve aggregation parameters to move between aggregated and full data
- Process one row at a time / cache on server
- Fix aspect ratio on client