Analysing Cosmological Simulations in the Virtual Observatory:
Designing and Mining the Millennium Simulation Database
Gerard Lemson
German Astrophysical Virtual Observatory (GAVO)
ARI, Heidelberg
MPE, Garching bei München
Garching, June 26, 2008
Acknowledgments
- Alex Szalay
- Virgo consortium, in particular:
  - Volker Springel, Simon White, Gabriella DeLucia, Jeremy Blaizot (MPA, Munich, Germany)
  - Carlos Frenk, Richard Bower, John Helly (ICC, Durham, UK)
- Similar efforts/sites to the Millennium Database:
  - Durham (mirror of Millennium DB)
  - Horizon/GalICS (Lyon)
  - ITVO (Trieste)
- GAVO is funded by the German Federal Ministry for Education and Research (BMBF)
Summary
- The VO aims to provide access to remote data for use and analysis by third parties.
- Data analysis requires advanced analysis methods.
- Data sets are often very large and often far away (which makes them even larger!).
- To analyse remote datasets, one needs to be able to bring the analysis to the data.
- The "standard" approach using flat files and C/IDL/etc. code is sub-optimal.
- To analyse very large datasets we also need advanced methods of data organisation and data access.
- A structured approach supported by a relational database system allows one to concentrate on science instead of worrying about I/O optimisation etc.
- And the questions can become pretty complex!
Case study: The Millennium Simulation
Springel V. et al. 2005, Nature 435, 629
Millennium Simulation
- Virgo consortium
  - Gadget 3
  - 10 billion particles, dark matter only
  - 500 Mpc periodic box
  - Concordance model (as of 2004) initial conditions
  - 64 snapshots
  - 350000 CPU hours
  - O(30 TB) raw + post-processed data
- Post-processing data products are complex and large
- A challenge to analyse, even locally!
- SimDAP-like approach required for remote access
Intermezzo: Data Access is Hitting a Wall (courtesy Alex Szalay)
- FTP and GREP are not adequate
  - You can GREP/FTP 1 MB in a second
  - You can GREP/FTP 1 GB in a minute
  - You can GREP/FTP 1 TB in 2 days
  - You can GREP/FTP 1 PB in 3 years
  - SFTP is much slower, and 1 PB is ~2,000 disks
- At some point you need
  - indices to limit search
  - parallel data search and analysis
- This is where databases can help
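As a rough check on these rules of thumb, the short sketch below works out the average throughput each one implies. It assumes decimal units (1 MB = 10^6 bytes) and a 365-day year; neither is stated on the slide. All four figures land at roughly 1 to 20 MB/s, which is the point of the slide: at a fixed sequential rate, scan time grows linearly with data volume, so only indexing or parallelism changes the picture.

```python
# Implied sustained throughput behind each rule of thumb on the slide.
# Assumption: decimal units (1 MB = 10**6 bytes) and 365-day years.
SECONDS_PER_DAY = 24 * 3600

rules_of_thumb = {
    "1 MB in a second": (10**6, 1),
    "1 GB in a minute": (10**9, 60),
    "1 TB in 2 days": (10**12, 2 * SECONDS_PER_DAY),
    "1 PB in 3 years": (10**15, 3 * 365 * SECONDS_PER_DAY),
}


def implied_rate_mb_per_s(size_bytes, seconds):
    """Average rate in MB/s needed to read size_bytes in the given time."""
    return size_bytes / seconds / 10**6


rates = {
    label: implied_rate_mb_per_s(size, seconds)
    for label, (size, seconds) in rules_of_thumb.items()
}

for label, rate in rates.items():
    print(f"{label}: about {rate:.1f} MB/s")
```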
Analysis and Databases (courtesy Alex Szalay)
- Much statistical analysis deals with
  - Creating uniform samples: data filtering
  - Assembling relevant subsets
  - Estimating completeness
  - Censoring bad data
  - Counting and building histograms
  - Generating Monte-Carlo subsets
  - Likelihood calculations
  - Hypothesis testing
- Traditionally these are performed on files
- Most of these tasks are much better done inside a database
Advantages of relational databases
- Encapsulation of data in terms of logical structure, no need to know about internals of data storage
- Standard query language for finding information
- Advanced query optimizers (indexes, clustering)
- Transparent internal parallelization
- Authenticated remote access for multiple users at the same time
- Forces one to think carefully about data structure
- Speeds up the path from science question to answer
- Facilitates communication (query code is cleaner)
- Facilitates adaptation to IVOA standards (ADQL)
Millennium Simulation Phenomenology
- Density field on 256^3 mesh
  - CIC
  - Gaussian smoothed: 1.25, 2.5, 5, 10 Mpc/h
- Friends-of-Friends (FOF) groups
- SUBFIND subhalos
- Galaxies from 2 semi-analytical models (SAMs)
  - MPA (L-Galaxies, DeLucia & Blaizot, 2006; Bertone et al., 2007)
  - Durham (GalForm, Bower et al., 2006)
- Subhalo and galaxy formation histories: merger trees
- Mock catalogues on light-cone
  - Pencil beams (Kitzbichler & White, 2006)
  - All-sky (depth of SDSS spectral sample) (Blaizot et al., 2005)
- In preparation: spectra for light cone galaxies
FOF groups, (sub)halos and galaxies
Time evolution: merger trees
Mock catalogues
Synthetic spectra (not yet available)
Hierarchy of Data Products
[Diagram] Objects: Density Field, Mesh Cell, FOF Group, Subhalo, Subhalo Merger Tree, SAM Galaxy, SAM Galaxy Merger Tree, Light Cone Galaxy, Spectrum.
Relation labels: original, tree relationships, parent halo, SUBFIND result, parent FOF group, located in (twice).
Designing the Database
- Need a model for the data, including relations between different objects
- Model needs to support science: "20 questions" (following Gray & Szalay)
  1. Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
  2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
  3. Return the complete halo merger tree for a halo identified at z=0.
  4. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8, B/T>0.5).
  5. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR>10 Msun/yr) at z=3.
  6. Find all the z=2 galaxies which were within 1 Mpc of an LBG (i.e. SFR>10 Msun/yr) at some previous redshift.
  7. Find the multiplicity function of halos depending on their environment (overdensity of the density field smoothed on a certain scale).
  8. Find the dependency of halo properties on environment.
Data model features
- Each object type has its own table
  - properties are columns
  - each row has a unique identifier
- Relations are implemented through foreign keys, i.e. pointers to the unique identifier column
  - FOF group to the mesh cell it lies in
  - sub-halo to its FOF group
  - galaxy to its sub-halo
  - etc.
- Special design needed for
  - Hierarchical relations: merger trees
  - Spatial relations: multi-dimensional indexes required
  - Support for random sample selection
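To make the foreign-key design concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than the SQL Server setup the talk describes. The table names, column names and toy values (`fof`, `subhalo`, `galaxy`, `mass` in solar masses) are hypothetical and are not the actual Millennium Database schema. The query follows the galaxy to sub-halo to FOF group pointers to answer a question of the form "return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per object type; each row has a unique identifier.
    create table fof     (fofId integer primary key, mass real);
    create table subhalo (subhaloId integer primary key,
                          fofId integer references fof(fofId));
    create table galaxy  (galaxyId integer primary key,
                          subhaloId integer references subhalo(subhaloId),
                          stellarMass real);

    -- Hypothetical toy data: two FOF groups, three sub-halos, three galaxies.
    insert into fof values (1, 5e13), (2, 2e12);
    insert into subhalo values (10, 1), (11, 1), (20, 2);
    insert into galaxy values (100, 10, 3.0), (101, 11, 0.4), (200, 20, 0.9);
""")

# Follow the foreign keys galaxy -> subhalo -> fof and filter on group mass.
rows = conn.execute("""
    select g.galaxyId
      from galaxy g
      join subhalo s on s.subhaloId = g.subhaloId
      join fof f     on f.fofId = s.fofId
     where f.mass between 1e13 and 1e14
     order by g.galaxyId
""").fetchall()

galaxy_ids = [row[0] for row in rows]
print(galaxy_ids)  # [100, 101]
```

The analysis code never touches file layouts: the relations between objects are expressed once, in the schema, and every query reuses them.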
Formation histories: Subhalo and Galaxy merger trees
- Tree structure
  - halos have a single descendant
  - halos have a main progenitor
- Hierarchical structures are usually handled using recursive code
  - inefficient for data access
  - not (well) supported in RDBs
- Tree indexes
  - depth-first ordering of nodes defines the identifier
  - pointer to the last progenitor in the subtree
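The tree index can be sketched in a few lines of Python. This is an illustration of the idea only, not the code used to load the database; the node names and the `progenitors` mapping are hypothetical. Nodes are numbered in depth-first order, and each node also stores the largest identifier in its subtree, so "everything that merged into this object" becomes a plain range test on the identifier.

```python
def assign_tree_ids(root, progenitors):
    """Label a merger tree in depth-first order.

    progenitors maps a node to the list of its direct progenitors
    (main progenitor first). Returns {node: (tree_id, last_progenitor_id)}.
    """
    labels = {}
    next_id = 0

    def visit(node):
        nonlocal next_id
        my_id = next_id
        next_id += 1
        for prog in progenitors.get(node, []):
            visit(prog)
        # Every id handed out since my_id belongs to this node's subtree,
        # so the most recent id closes the subtree's id range.
        labels[node] = (my_id, next_id - 1)

    visit(root)
    return labels


# Hypothetical toy tree: A (at z=0) formed from B and C; B formed from D and E.
toy = {"A": ["B", "C"], "B": ["D", "E"]}
labels = assign_tree_ids("A", toy)


def subtree(node):
    """All nodes whose id lies in [id, last_progenitor_id] of the given node."""
    lo, hi = labels[node]
    return sorted(n for n, (i, _) in labels.items() if lo <= i <= hi)


print(labels["A"], labels["B"])  # (0, 4) (1, 3)
print(subtree("B"))              # ['B', 'D', 'E']
```

Recursion is still used here, but only once, when the identifiers are assigned. After that, retrieving a subtree needs no recursion at all, which is what makes the scheme a good fit for a relational database.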
Merger trees:

    select prog.*
      from galaxies des, galaxies prog
     where des.galaxyId = 0
       and prog.galaxyId between des.galaxyId and des.lastProgenitorId

Branching points:

    select descendantId
      from galaxies des
     where descendantId != -1
     group by descendantId
    having count(*) > 1
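Both queries can be tried on a toy table. The sketch below uses Python's `sqlite3` in place of SQL Server, and a hypothetical five-node tree whose identifiers follow the depth-first convention (0 is the root; 1 and 4 merge into 0; 2 and 3 merge into 1). Only the three columns the queries need are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table galaxies (
        galaxyId         integer primary key,
        lastProgenitorId integer,
        descendantId     integer
    )
""")
# Hypothetical toy tree; the root has descendantId = -1.
conn.executemany("insert into galaxies values (?, ?, ?)", [
    (0, 4, -1),
    (1, 3, 0),
    (2, 2, 1),
    (3, 3, 1),
    (4, 4, 0),
])

# Merger tree rooted at galaxy 1: a range test on the depth-first id.
tree_of_1 = [row[0] for row in conn.execute("""
    select prog.galaxyId
      from galaxies des, galaxies prog
     where des.galaxyId = 1
       and prog.galaxyId between des.galaxyId and des.lastProgenitorId
     order by prog.galaxyId
""")]

# Branching points: descendants that have more than one progenitor.
branching = [row[0] for row in conn.execute("""
    select descendantId
      from galaxies
     where descendantId != -1
     group by descendantId
    having count(*) > 1
     order by descendantId
""")]

print(tree_of_1)  # [1, 2, 3]
print(branching)  # [0, 1]
```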
Spatial queries, random samples
- Spatial queries require multi-dimensional indexes
- An index on (x,y,z) does not work: discretisation is needed
  - index on (ix,iy,iz) with ix=floor(x/10) etc.
- More sophisticated: space-filling curves
  - bit-interleaving/oct-tree/Z-index
  - Peano-Hilbert curve
  - Need custom functions for range queries (implemented in T-SQL)
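A minimal Python sketch of the two discretisation ideas follows. It shows the simple grid index and the bit-interleaved Z-index, not the Peano-Hilbert curve and not the T-SQL functions mentioned above. The function names, the choice of x as the lowest interleaved bit, and the 8 bits per axis are all assumptions made for illustration.

```python
from math import floor

CELL = 10.0  # cell size, as in ix = floor(x/10); units follow the coordinates


def grid_cell(x, y, z, cell=CELL):
    """Discretise a position to integer cell indexes (ix, iy, iz)."""
    return floor(x / cell), floor(y / cell), floor(z / cell)


def z_index(ix, iy, iz, bits=8):
    """Interleave the bits of three cell indexes into one Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def cells_in_box(lo, hi, cell=CELL):
    """Cell indexes overlapping the axis-aligned box [lo, hi] (no periodic wrap)."""
    (x0, y0, z0), (x1, y1, z1) = grid_cell(*lo, cell), grid_cell(*hi, cell)
    return [(i, j, k)
            for i in range(x0, x1 + 1)
            for j in range(y0, y1 + 1)
            for k in range(z0, z1 + 1)]


print(grid_cell(12.5, 0.3, 99.9))                     # (1, 0, 9)
print(z_index(1, 0, 0), z_index(0, 1, 0),
      z_index(0, 0, 1), z_index(3, 3, 3))             # 1 2 4 63
print(len(cells_in_box((5, 5, 5), (25, 15, 5))))      # 6
```

A range query then becomes "which cells overlap my box", followed by an indexed lookup on the cell indexes or keys and an exact test on (x,y,z) for the rows that come back. With a space-filling curve, cells that are close in space tend to have nearby keys, which is why a single one-dimensional index can serve three-dimensional searches.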
- Random sampling using a RANDOM column
  - RANDOM from [0,1000000]
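The random-column trick can be illustrated with Python's `sqlite3`. The table below is a hypothetical stand-in with 100,000 rows, each given a precomputed integer drawn uniformly from [0, 1000000]; the seed and table layout are assumptions. Selecting a range of that column returns a reproducible random subset whose expected size is set by the width of the range.

```python
import random
import sqlite3

N_ROWS = 100_000
rng = random.Random(42)  # fixed seed so the toy table is reproducible

conn = sqlite3.connect(":memory:")
conn.execute("create table galaxy (galaxyId integer primary key, random integer)")
conn.executemany(
    "insert into galaxy values (?, ?)",
    [(i, rng.randint(0, 1_000_000)) for i in range(N_ROWS)],
)
# An ordinary index on the column turns the sample into a cheap range scan.
conn.execute("create index galaxy_random on galaxy (random)")

# A window of width 10000 out of ~1000000 selects roughly 1% of the rows.
(count,) = conn.execute(
    "select count(*) from galaxy where random between 0 and 9999"
).fetchone()
fraction = count / N_ROWS
print(count, fraction)
```

Because the random numbers are stored rather than generated at query time, the same window always returns the same rows, and disjoint windows give independent subsamples.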
The Millennium Database web site
- SQLServer 2005 database
- Web application (Java in Apache Tomcat web server)
  - portal: http://www.mpa-garching.mpg.de/millennium/
  - public DB access: http://www.g-vo.org/Millennium
  - private access: http://www.g-vo.org/MyMillennium
  - MyDB
- Access methods
  - browser with plotting capabilities through VOPlot applet
  - wget + IDL, R
  - TOPCAT (3.1)
Usage statistics
- Up since August 2006 (astro-ph/0608019)
- ~225 registered users
- > 5 million queries
- > 40 billion rows
- ~130 papers, ~50% not related to the Virgo consortium (see http://www.mpa-garching.mpg.de/millennium )
Some science questions and their implementation as SQL
If time permits; in any case, a one-on-one demo is possible.
Find light cone galaxies in a slice in redshift, RA and Dec

    select ra, dec, redshift_obs
      from kitzbichler2006a_obs
     where redshift_obs between 1 and 1.1
       and dec between -.05 and .0
Color-magnitude for random sample of galaxies

    select mag_bdust, mag_bdust - mag_vdust as color, type
      from delucia2006a
     where snapnum = 63
       and random between 0 and 100
       and mag_b < 0
Get merger tree for identified galaxy

    select p.snapnum, p.x, p.y, p.z, p.stellarmass,
           p.mag_b - p.mag_v as color
      from delucia2006a d, delucia2006a p
     where d.galaxyid = 0
       and p.galaxyid between d.galaxyid and d.lastprogenitorid
Histogram of density field at redshifts 0,1,2,3; Gaussian smoothing 5 Mpc/h

    select snapnum, .01*floor(f.g5/.01) as g5, count(*) as num
      from mfield f
     where f.snapnum in (63,41,32,27)
     group by snapnum, .01*floor(f.g5/.01)
     order by 1,2
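The expression `.01*floor(g5/.01)` maps each value to the lower edge of its 0.01-wide bin, so grouping on it and counting rows builds the histogram inside the database. The same binning can be sketched in plain Python; the sample values below are hypothetical stand-ins for the `g5` column.

```python
from collections import Counter
from math import floor


def bin_edge(value, width):
    """Lower edge of the bin containing value: width * floor(value / width)."""
    return width * floor(value / width)


# Hypothetical smoothed-density values standing in for the g5 column.
g5_values = [0.004, 0.013, 0.017, 0.252, 0.255, 0.259, 1.004]

# Count by integer bin number, which mirrors
# "group by .01*floor(g5/.01)" followed by count(*).
width = 0.01
histogram = Counter(floor(v / width) for v in g5_values)

for bin_number in sorted(histogram):
    print(f"{bin_number * width:.2f}  {histogram[bin_number]}")
```

Keying the counter on the integer bin number rather than the floating-point edge avoids two edges that should be equal differing in the last digit. Values lying exactly on a bin boundary can still fall on either side because of rounding, in SQL as well as here.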
FOF multiplicity function at redshifts 0,1,2,3

    select snapnum, .1*floor(log10(np)/.1) as lognp, count(*) as num
      from fof
     where snapnum in (63,41,32,27)
     group by snapnum, .1*floor(log10(np)/.1)
     order by 1,2
FOF mass multiplicity function, conditioned on density in environment

    select .1*floor(log10(fof.np)/.1) as lognp, count(*) as num
      from mfield f, fof
     where fof.snapnum = f.snapnum
       and fof.phkey = f.phkey
       and f.snapnum = 63
       and f.g5 between 1 and 1.1
     group by .1*floor(log10(fof.np)/.1)
     order by 1
Thank you !