![Page 1: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/1.jpg)
Mining Virtual Universes
Simulations in a relational database
![Page 2: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/2.jpg)
Computer simulations.
Why?
![Page 3: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/3.jpg)
Simple observations
![Page 4: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/4.jpg)
Simple model
![Page 5: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/5.jpg)
Simple, analytical solution
![Page 6: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/6.jpg)
Complex observations
![Page 7: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/7.jpg)
Galaxy merger
Credits: John Hibbard, http://www.cv.nrao.edu/~jhibbard/n4038/n4038.html; NASA/CXC/SAO/G. Fabbiano et al.
![Page 8: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/8.jpg)
X-Ray cluster
Panels: electron density, gas temperature, gas pressure
Courtesy Alexis Finoguenov, Ulrich Briel, Peter Schuecker (MPE)
![Page 9: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/9.jpg)
Galaxy survey
![Page 10: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/10.jpg)
N-Body simulations
![Page 11: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/11.jpg)
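Simple dynamics
• Newton’s law of gravity for N particles:
$\ddot{\mathbf{x}}_i = -\nabla \Phi(\mathbf{x}_i)$, with $\Phi(\mathbf{x}_i) = -G \sum_{j \neq i} m_j / |\mathbf{x}_i - \mathbf{x}_j|$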
![Page 12: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/12.jpg)
Complex solutions
• Analytical solution only for N=2
• The 3-body problem has no general analytical solution
• Let alone 10 billion bodies
• Need computer simulations
  • approximations
  • direct summation scales like N^2
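The N^2 scaling comes from summing over all pairs. The Python sketch below is only an illustration, assuming point masses, units with G = 1 and a hypothetical softening length `eps`; none of these names come from the slides, and production codes replace the double loop with tree or mesh approximations.

```python
import math

def accelerations(pos, mass, G=1.0, eps=0.0):
    """Direct-summation gravitational accelerations: an O(N^2) pair loop.

    pos  : list of (x, y, z) tuples
    mass : list of masses
    eps  : softening length (assumption; 0 gives pure Newtonian gravity)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Separation vector pointing from particle i to particle j.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc

# Two bodies of mass 1 and 3, separated by 2 length units along x.
two_body = accelerations([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 3.0])
```

Doubling the number of particles quadruples the work in this loop, which is why billions of particles need approximate force calculations.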
![Page 13: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/13.jpg)
![Page 14: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/14.jpg)
Di Matteo, Springel and Hernquist, 2005
Courtesy Volker Springel
Adding hydrodynamics and gas physics
![Page 15: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/15.jpg)
CMU, CosmoMLStat, 2015-06-03
Millennium-II Simulation
• 100 Mpc/h
• 10^10 particles
• 6.9 × 10^6 Msun/h
• ~10 million halos
• ~300 GB/snapshot
Boylan-Kolchin et al. 2009
![Page 16: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/16.jpg)
Millennium Simulation
MRII
• 500 Mpc/h
• 10^10 particles
• 8.6 × 10^8 Msun/h
• ~18 million halos
• ~300GB/snapshot
Springel et al. 2005
![Page 17: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/17.jpg)
MR-XXL
MR
• 3 Gpc/h
• 3 × 10^11 particles
• 750 million halos/snapshot
• 9TB/snapshot
• browse
![Page 18: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/18.jpg)
FOF groups, (sub)halos and galaxies
![Page 19: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/19.jpg)
![Page 20: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/20.jpg)
Raw data: Particles
FOF groups and Subhalos
Density fields
Subhalo merger trees
Synthetic galaxies (SAM)
Mock catalogues
![Page 22: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/22.jpg)
millimil@CasJobs
![Page 23: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/23.jpg)
Revisit relational databases
http://www.sdss.jhu.edu/~szalay/class/2015/gl/IntroRDB.html
Indexing: trees and spatial
![Page 24: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/24.jpg)
![Page 25: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/25.jpg)
INDEX-ing
• Performance: disk I/O is the bottleneck
• Avoid it as much as possible, but the whole DB cannot be stored in memory
• To find rows of interest, avoid having to scan complete tables
  • sequential scan ~ O(N)
  • ~10 min for galaxy tables (10^9 rows, 250 GB)
• Binary search speeds this up, but requires ordering
  • ~ O(log(N))
  • B-Trees
• A table can only be ordered in one way: create an external data structure, the INDEX, ordered according to >=1 columns, with a direct pointer to each row.
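The idea of an external, ordered structure with row pointers can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the table contents are made up, a sorted array stands in for a real B-tree, and `build_index` and `range_lookup` are hypothetical helper names.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy table; a row's position in the list plays the role of its address.
rows = [
    {"galaxyId": 7, "snapnum": 63, "stellarMass": 2.5},
    {"galaxyId": 3, "snapnum": 62, "stellarMass": 0.4},
    {"galaxyId": 9, "snapnum": 63, "stellarMass": 1.1},
    {"galaxyId": 1, "snapnum": 63, "stellarMass": 5.0},
    {"galaxyId": 4, "snapnum": 62, "stellarMass": 3.2},
]

def build_index(table, columns):
    """Return (sorted keys, row pointers) ordered on the given columns."""
    entries = sorted((tuple(r[c] for c in columns), i) for i, r in enumerate(table))
    keys = [k for k, _ in entries]
    ptrs = [p for _, p in entries]
    return keys, ptrs

def range_lookup(index, lo, hi):
    """Row pointers with lo <= key <= hi, located by binary search."""
    keys, ptrs = index
    return ptrs[bisect_left(keys, lo):bisect_right(keys, hi)]

idx = build_index(rows, ("snapnum", "stellarMass"))
hits = range_lookup(idx, (63, 1.0), (63, 3.0))
hit_ids = [rows[p]["galaxyId"] for p in hits]
```

The table itself stays in its original order. Only the index is sorted, so several indexes on different column lists can coexist.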
![Page 26: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/26.jpg)
Indexes
Example index column lists from the figure: (snapnum, stellarMass, galaxyid), (mag_b), (snapnum, x)
![Page 27: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/27.jpg)
B-tree
![Page 28: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/28.jpg)
Special indexes
• trees
• spatial
![Page 29: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/29.jpg)
Time evolution: merger trees
![Page 30: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/30.jpg)
![Page 31: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/31.jpg)
Formation histories: Subhalo and Galaxy merger trees
• Tree structure
  • halos have single descendant
  • halos have main progenitor
• Hierarchical structures usually handled using recursive code
  • inefficient for data access
  • not (well) supported in RDBs
• Tree indexes
  • depth first ordering of trees
  • label by rank in order
  • pointer to “last progenitor” below each node
  • all progenitors have label BETWEEN label of root AND that of last progenitor
  • cluster table on label
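The labelling scheme can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical input format (a dict mapping each node to its progenitors, main progenitor first) and a made-up toy tree. Recursion is used once to assign labels; later lookups need only a range comparison.

```python
def label_tree(root, progenitors):
    """Assign depth-first ids and last-progenitor ids.

    progenitors maps a node name to its list of progenitors,
    main progenitor first (hypothetical input format).
    Returns {name: (haloId, lastProgenitorId)}.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        my_id = counter
        counter += 1
        for prog in progenitors.get(node, []):
            visit(prog)
        # The last id handed out inside this subtree is the last progenitor.
        labels[node] = (my_id, counter - 1)

    visit(root)
    return labels

# Toy tree: A forms from B (main) and E; B from C (main) and D; E from F.
toy = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F"]}
labels = label_tree("A", toy)

def progenitors_of(name):
    """All nodes whose id lies between the node's own id and its last progenitor id."""
    lo, hi = labels[name]
    return sorted(n for n, (i, _) in labels.items() if lo <= i <= hi)
```

As with BETWEEN in SQL, the range is inclusive, so the node itself is part of the result.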
![Page 32: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/32.jpg)
![Page 33: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/33.jpg)
select prog.snapnum, prog.x, prog.y, prog.np
  from millimil..mpahalo des,
       millimil..mpahalo prog
 where prog.haloId between des.haloId and des.lastProgenitorId
   and des.haloId = 0
![Page 34: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/34.jpg)
![Page 35: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/35.jpg)
Galaxies
select prog.snapnum, prog.x, prog.y, prog.mag_b-prog.mag_v as color
  from millimil..delucia2006a des,
       millimil..delucia2006a prog
 where prog.galaxyId between des.galaxyId and des.lastProgenitorId
   and des.galaxyId = 0
(See topcat)
![Page 36: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/36.jpg)
Millennium DB Tutorial, 2007-01-17/19, Leiden
Some more features of the merger tree data model
Leaves:
select galaxyId as leaf
  from galaxies des
 where galaxyId = lastProgenitorId
Branching points:
select descendantId
  from galaxies des
 where descendantId != -1
 group by descendantId
having count(*) > 1
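The same two tests can be tried on a small in-memory table. The Python sketch below mirrors the logic of the queries; the six-row table is a made-up example, not Millennium data.

```python
from collections import Counter

# Hypothetical toy table: (galaxyId, lastProgenitorId, descendantId)
galaxies = [
    (0, 5, -1),
    (1, 3, 0),
    (2, 2, 1),
    (3, 3, 1),
    (4, 5, 0),
    (5, 5, 4),
]

# Leaves: no progenitors, so the last progenitor is the galaxy itself.
leaves = [gid for gid, last, _ in galaxies if gid == last]

# Branching points: descendants that more than one galaxy points to.
counts = Counter(desc for _, _, desc in galaxies if desc != -1)
branching = sorted(d for d, c in counts.items() if c > 1)
```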
![Page 37: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/37.jpg)
Main branches
• Roots and leaves:
select des.galaxyId as rootId, min(prog.lastProgenitorId) as leafId
  into rootLeaf
  from mpagalaxies..delucia2006a des,
       mpagalaxies..delucia2006a prog
 where des.galaxyId = 0
   and prog.galaxyId between des.galaxyId and des.lastProgenitorId
 group by des.galaxyId
• Main branch:
select rl.rootId, b.*
  from rootLeaf rl,
       mpagalaxies..delucia2006a b
 where b.galaxyId between rl.rootId and rl.leafId
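Why the minimum lastProgenitorId marks the end of the main branch is easiest to see on a toy tree. The Python sketch below assumes the depth-first ordering visits the main progenitor first, so the main branch occupies consecutive ids starting at the root. The table and the helper `main_branch` are hypothetical.

```python
# Hypothetical toy table: (galaxyId, lastProgenitorId, descendantId)
galaxies = [
    (0, 5, -1),
    (1, 3, 0),
    (2, 2, 1),
    (3, 3, 1),
    (4, 5, 0),
    (5, 5, 4),
]

def main_branch(root_id, table):
    """Ids on the main branch of the tree rooted at root_id."""
    last_of = {gid: last for gid, last, _ in table}
    root_last = last_of[root_id]
    # Every node's lastProgenitorId is >= its own id, and every node on the
    # main branch contains the first leaf in its subtree, so the minimum over
    # the whole tree is the id of the leaf that ends the main branch.
    leaf_id = min(last for gid, last in last_of.items()
                  if root_id <= gid <= root_last)
    return [gid for gid in sorted(last_of) if root_id <= gid <= leaf_id]
```

For the root with id 0 this returns ids 0, 1, 2: the root, its main progenitor, and that progenitor's main progenitor, which is a leaf.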
![Page 38: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/38.jpg)
Query particles in volume
![Page 39: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/39.jpg)
Find all halos in a subvolume of space:
10 <= x < 20
20 <= y < 30
0 <= z < 10
![Page 40: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/40.jpg)
Inefficient, even when indexed
select x,y,z
  from mpahalotrees..mhalo
 where snapnum = 63
   and x between 10 and 20
   and y between 20 and 30
   and z between 0 and 10
![Page 41: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/41.jpg)
Why inefficient: ordered on x alone, rows that are adjacent in the index are scattered in y and z
x y z
15.001083 42.471325 24.673561
15.001247 58.420914 42.722874
15.002215 38.042484 29.557423
15.002735 50.487785 57.716877
15.002753 20.000177 8.21466
15.005095 13.637599 16.135191
15.006593 22.170828 48.242783
15.011488 24.824438 19.773285
15.011741 48.099907 11.500685
15.011868 23.312265 27.858799
15.013065 23.969515 18.883507
15.013158 56.041866 40.82894
15.014361 59.503357 45.31733
15.017322 46.257664 44.37695
15.018202 27.333895 9.441319
![Page 42: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/42.jpg)
Spatial indexes
• Performance of finding things is improved if those things are co-located on disk: ordering, indices
• Co-locating a 3D configuration of points on a 1D disk can only be done approximately
• Space filling curves: Peano-Hilbert
  • requires user defined functions to use
• Simpler: Zones
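To make the space-filling-curve idea concrete, here is a small Python sketch of the well-known 2D Hilbert curve mapping from grid cell to position along the curve. It is a two-dimensional illustration only; the simulations use a 3D Peano-Hilbert curve, and the function name `hilbert_index` is an assumption, not part of the database.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a 2D Hilbert curve on an n x n grid.

    n must be a power of two; x and y are integer cell coordinates.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

n = 4
# Cells listed in the order in which the curve visits them.
order = sorted((hilbert_index(n, x, y), x, y) for x in range(n) for y in range(n))
```

Consecutive positions along the curve are always neighbouring cells, which is what keeps nearby points close together on disk.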
![Page 43: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/43.jpg)
Query particles in volume
![Page 44: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/44.jpg)
Index cells using space filling curve
![Page 45: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/45.jpg)
Query particles in sphere/box
• Calculate overlap of space filling curve with query volume
• Decide (from index table) which files to query
• And where to seek, how far to scan
• Implement as SQLCLR table-valued function
• Run from database
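The overlap step can be sketched as follows: map every cell of the query box to its curve position, then merge consecutive positions into ranges that tell where to seek and how far to scan. This Python sketch uses a 2D Hilbert curve for brevity and hypothetical helper names; the slides describe the real implementation as a SQLCLR table-valued function on the 3D curve.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a 2D Hilbert curve on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def curve_ranges(n, x_range, y_range):
    """Contiguous runs of curve positions covering the cells of a box.

    x_range and y_range are inclusive (first, last) cell coordinates.
    """
    keys = sorted(hilbert_index(n, x, y)
                  for x in range(x_range[0], x_range[1] + 1)
                  for y in range(y_range[0], y_range[1] + 1))
    ranges = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k          # extend the current run
        else:
            ranges.append([k, k])      # start a new run: a new seek
    return [tuple(r) for r in ranges]
```

A box aligned with the curve's quadrants collapses into a single range, while a box straddling quadrants needs several seeks.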
![Page 46: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/46.jpg)
Simpler: Zones
![Page 47: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/47.jpg)
Zone index
• Coarse sampling of points in multiple dimensions allows simple multi-dimensional ordering
• ix = floor(x/10Mpc)
  iy = floor(y/10Mpc)
  iz = floor(z/10Mpc)
• index on (snapnum,ix,iy,iz,x,y,z,galaxyId)
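A minimal Python sketch of the zone computation, assuming a zone width of 10 in the same units as the coordinates. The helper `zone` and the boundary point are hypothetical; the other positions are taken from the example tables in these slides.

```python
import math

ZONE_SIZE = 10.0  # zone width, in the same units as the coordinates

def zone(x, y, z, size=ZONE_SIZE):
    """Integer zone indices (ix, iy, iz) of a position."""
    return (math.floor(x / size), math.floor(y / size), math.floor(z / size))

points = [
    (15.061804, 20.891907, 4.4156647),
    (15.001083, 42.471325, 24.673561),
    (16.047333, 28.93811, 5.414605),
    (20.0, 25.0, 5.0),  # hypothetical point exactly on the x = 20 boundary
]

# Sorting on (ix, iy, iz, x, y, z) mimics the ordering of the composite index.
index = sorted(zone(*p) + p for p in points)

# Selecting one zone is an equality test on the leading index columns.
in_zone = [row[3:] for row in index if row[:3] == (1, 2, 0)]
```

Because `floor` puts x = 20 into zone ix = 2, the boundary point is not returned for zone (1, 2, 0).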
![Page 48: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/48.jpg)
IX IY IZ X Y Z
1 2 0 15.061804 20.891907 4.4156647
1 2 0 15.069336 23.437601 9.812217
1 2 0 15.100678 20.905642 4.613036
1 2 0 15.173968 22.36883 8.01832
1 2 0 15.194122 20.67583 4.8034463
1 2 0 15.2500305 24.246683 1.6651521
1 2 0 15.365576 23.290754 9.404872
1 2 0 15.372606 20.203691 2.0006201
1 2 0 15.524696 21.03997 4.280077
1 2 0 15.583943 22.344622 9.421347
1 2 0 15.6358385 26.785904 9.881406
1 2 0 15.66383 22.829983 7.137772
1 2 0 15.673803 26.918291 3.302736
1 2 0 15.717824 22.365341 9.221828
1 2 0 15.847992 24.700747 1.389664
1 2 0 15.883896 22.593819 7.277129
1 2 0 15.91041 26.531118 2.5693457
1 2 0 15.916905 27.137867 4.289855
1 2 0 16.047333 28.93811 5.414605
![Page 49: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/49.jpg)
Using zones
select x,y,z
  from mpahalo
 where snapnum = 63
   and ix = 1 and iy = 2 and iz = 0
NB: does NOT include galaxies with x=20 exactly (unlike BETWEEN, which is inclusive)!
![Page 50: Mining Virtual Universes Simulations in a relational database](https://reader036.vdocument.in/reader036/viewer/2022062905/5a4d1ad97f8b9ab059974154/html5/thumbnails/50.jpg)
“20 questions”
1. Return the (B-band luminosity function of) galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
3. Return all the galaxies within a sphere of radius 3 Mpc around a particular halo.
4. Return the complete halo merger tree for a halo identified at z=0.
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
6. Find properties of all galaxies in haloes of mass 10^14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
7. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V > 0.8, B/T > 0.5).
8. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR > 10 Msun/yr) at z=3.
9. Make a list of all haloes at z=3 which contain a galaxy of mass > 10^9 Msun which is a progenitor of BCGs in z=0 clusters of mass > 10^14.5.
10. Find all z=3 galaxies which have NO z=0 descendant.
11. Return the complete galaxy merging history for a given z=0 galaxy.
12. Find all the z=2 galaxies which were within 1 Mpc of an LBG (i.e. SFR > 10 Msun/yr) at some previous redshift.
13. Find the multiplicity function of halos depending on their environment (overdensity of the density field smoothed on a certain scale).
14. Find the dependency of halo formation times on environment.