Latest pgSphere developments
Asterics 2020
Markus Nullmeier
Zentrum für Astronomie der Universität Heidelberg, Astronomisches Rechen-Institut
https://github.com/mnullmei
pgSphere: infrastructure for the VO
● Many VO data centres use the PostgreSQL RDBMS
– VizieR, Simbad at CDS
– ESAC at ESA
– CADC
– GAVO at ARI
– several others, plus centres that are migrating ...
● pgSphere is useful for
– Custom PostgreSQL spherical data types
– Indexing (fast queries)
pgSphere update: overview
● New features and WIP
– MOCs
– OUZO: indexing for MOCs
– BRIN indexing
– Efficient crossmatch
– pgSphere packages for Linux distributions
– Official release
● Future projects for pgSphere
– Integration with the JIT acceleration of PostgreSQL 11
– Faster indexing in 2D (now: 3D)
– Optimal read-only indexing (maybe GSOC)
MOCs: since 2017, still WIP
To describe arbitrary sky regions, such as the localisation areas of gravitational-wave events, we need something beyond pgSphere's simple geometric types (circles, ellipses, boxes, polygons):
MOC: Multi-Order Coverage
= set of HEALPix sphere elements of different orders
MOC internals
MOC: Multi-Order Coverage
= set of HEALPix sphere elements (diamond-shaped) of different orders
1 diamond element = 1 integer interval
1 MOC object = 1 list of intervals
{[2, 6) [17, 30) [33, 40) [123, 124) [332, 438), ...}
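Why one diamond is one interval: in the NESTED HEALPix numbering, cell `ipix` at order `k` covers exactly the cells from `ipix * 4^(N-k)` up to (but excluding) `(ipix + 1) * 4^(N-k)` at a finer order `N`. A minimal Python sketch, assuming NESTED numbering and a finest order of 29 (the limit used by the IVOA MOC standard); the function names are hypothetical and this is not pgSphere's C implementation:

```python
MAX_ORDER = 29  # assumption: finest HEALPix order, as in the IVOA MOC standard

def cell_to_interval(order, ipix, max_order=MAX_ORDER):
    """Map a NESTED HEALPix cell (order, ipix) to a half-open interval
    [start, end) of cell indices at max_order."""
    shift = 2 * (max_order - order)  # each order step splits a cell into 4
    return (ipix << shift, (ipix + 1) << shift)

def cells_to_moc(cells, max_order=MAX_ORDER):
    """Build a MOC as a sorted list of disjoint, non-touching intervals
    from (order, ipix) pairs of mixed orders."""
    intervals = sorted(cell_to_interval(o, p, max_order) for o, p in cells)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:  # overlaps or touches previous
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]
```

For example, with `max_order=2` the cells (order 1, index 0), (2, 4) and (2, 9) give `[(0, 5), (9, 10)]`: the first two cells touch and collapse into a single interval, which keeps a MOC a short sorted list.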
OUZO: indexing for MOCs
● “inverted index” for the constituent intervals
● Thanks to https://github.com/postgrespro/rum
sorted intervals → sets of pointers to MOCs
[17, 30) → { moc7, moc11 }
[843, 2577) → { moc2, moc108, moc109 }
[5756, 9433) → { moc108, moc, moc1103 }
... → ...
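The inverted-index idea can be illustrated in memory. This is only a sketch, not OUZO's code (which builds on the RUM access method): it assumes each MOC is given as a list of half-open intervals, splits all interval boundaries into disjoint elementary segments so that a point lookup is a single binary search, and uses hypothetical function names. The build step is quadratic and meant for illustration only.

```python
from bisect import bisect_right

def build_inverted_index(mocs):
    """mocs: dict moc_id -> list of half-open (start, end) intervals.
    Returns (starts, entries): disjoint sorted segments, each paired
    with the frozenset of MOC ids that cover it."""
    bounds = sorted({b for ivs in mocs.values() for iv in ivs for b in iv})
    entries = []
    for lo, hi in zip(bounds, bounds[1:]):
        # No boundary lies strictly inside [lo, hi), so each interval
        # either contains the whole segment or does not touch it.
        owners = frozenset(m for m, ivs in mocs.items()
                           if any(s <= lo and hi <= e for s, e in ivs))
        if owners:
            entries.append(((lo, hi), owners))
    starts = [seg[0] for seg, _ in entries]
    return starts, entries

def mocs_containing(index, cell):
    """Return the ids of all MOCs whose coverage includes the given cell index."""
    starts, entries = index
    i = bisect_right(starts, cell) - 1  # last segment starting at or before cell
    if i >= 0:
        (lo, hi), owners = entries[i]
        if lo <= cell < hi:
            return owners
    return frozenset()
```

Feeding it the rows above (for instance `"moc7": [(17, 30)]` and `"moc11": [(17, 30)]`) makes a lookup of cell 20 return `{moc7, moc11}`, and a cell between the stored intervals returns an empty set.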
BRIN indexing
● Small indexes for big tables
● Approach originally developed for PostGIS
● Thanks to Giuseppe Broccolo
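The BRIN principle, one small summary per range of consecutive rows instead of one entry per row, can be sketched as follows. This is a generic illustration, not pgSphere's BRIN operator class: the summary here is a hypothetical declination range (a spatial operator class would summarise with a bounding region instead), and the names are made up. Pruning only pays off when the physical row order correlates with position, e.g. a table ordered by declination or by HEALPix index.

```python
from dataclasses import dataclass

@dataclass
class RangeSummary:
    first_row: int   # first row number covered by this range
    end_row: int     # one past the last row number covered
    dec_min: float
    dec_max: float

def build_brin(decs, rows_per_range=128):
    """Summarise every block of rows_per_range consecutive rows by its Dec range."""
    summaries = []
    for start in range(0, len(decs), rows_per_range):
        chunk = decs[start:start + rows_per_range]
        summaries.append(RangeSummary(start, start + len(chunk), min(chunk), max(chunk)))
    return summaries

def dec_band_query(decs, summaries, lo, hi):
    """Return row numbers with lo <= dec <= hi; ranges whose summary
    cannot match are skipped without reading their rows."""
    hits = []
    for s in summaries:
        if s.dec_max < lo or s.dec_min > hi:
            continue  # the whole range is excluded by its summary
        hits.extend(i for i in range(s.first_row, s.end_row) if lo <= decs[i] <= hi)
    return hits
```

The index holds one `RangeSummary` per 128 rows, which is why BRIN indexes stay tiny even for billion-row tables.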
Efficient crossmatch
● Problems with crossmatch (spatial join)
– Everybody translates ADQL to SQL
– Then, the planner uses the index of only one table
– … and, more often than not, the wrong one
● Solutions
– Use both indexes at the same time (WIP, Alexander Korotkov)
– Custom spatial joins for pgSphere (Dmitry Ivanov, crossmatch-cnode branch)
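To see why a join strategy that exploits spatial order matters, here is a minimal crossmatch that avoids a full nested loop: it sorts one catalogue by declination, considers only candidates within ±radius in declination, and then checks the exact great-circle distance (haversine formula). This is a generic sort-based technique swapped in for illustration, not the algorithm of the crossmatch-cnode branch; coordinates are assumed to be in radians and the function names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

def ang_sep(ra1, dec1, ra2, dec2):
    """Great-circle distance in radians (haversine formula); inputs in radians."""
    s = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(s)))

def crossmatch(cat_a, cat_b, radius):
    """Return (i, j) pairs with separation <= radius.
    cat_a, cat_b: lists of (ra, dec) in radians; radius in radians."""
    order = sorted(range(len(cat_b)), key=lambda j: cat_b[j][1])
    decs = [cat_b[j][1] for j in order]
    pairs = []
    for i, (ra, dec) in enumerate(cat_a):
        # A match can differ in Dec by at most the radius, so only this
        # slice of the sorted catalogue needs the exact distance test.
        lo = bisect_left(decs, dec - radius)
        hi = bisect_right(decs, dec + radius)
        for k in range(lo, hi):
            j = order[k]
            if ang_sep(ra, dec, *cat_b[j]) <= radius:
                pairs.append((i, j))
    return pairs
```

The haversine form also handles the RA wrap-around at 0/360 degrees, so a source at RA 359.9999° still matches one at RA 0°.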
pgSphere packages for Linux distributions
● Why?
– Saves DBAs precious time
– Fewer bugs for users
– Free quality assurance!
– Prerequisite for data centre tools that everybody can use...
● Which ones?
– Debian → Ubuntu, PGDG: thanks to Ole Streicher, Christoph Berg
– Fedora → CentOS, etc., PGDG: thanks to Christian Dersch
WIP: official pgSphere release
● Thorny problem: recover from old bugs in existing installations
– Old PostgreSQL syntax (thanks: Pat Dowler, Alexander Korotkov)
– Incomplete system tables (thanks: Markus Demleitner)
– (inevitable) proliferation and use of development code
● HEALPix problems
– Rather unportable official library
– Official library has an unsuitable licence (GPL, not LGPL)
– Switching to a BSD-licensed library so that MOC support can be included
Integration with the JIT acceleration of PostgreSQL 11
● Moore’s Law is dead; everybody does JIT nowadays
– Finally, also in PostgreSQL
– pgSphere needs to adapt
– Funding??
● More options for pgSphere (read: ADQL) speedups
– Parallel queries
– Database clustering (PostgreSQL WIP)
Faster indexing in 2D (now: 3D)
● Announced at ADASS 2016
– Simple idea, but:
– Devil in the details
– Potential synergy with GIS community
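The "3D" refers to indexing on the Cartesian unit-sphere representation; as far as I can tell, pgSphere's GiST keys are 3D boxes around the unit vectors of objects (treat this as an assumption). The sketch below, with hypothetical names, shows the conversion and one reason it is convenient: two points on either side of RA = 0 get a tiny 3D box, whereas a naive (RA, Dec) box would span almost 360°. A 2D scheme has to handle such wrap-around and the poles explicitly, which is one plausible source of the devil in the details.

```python
import math

def to_unit_vector(ra, dec):
    """(ra, dec) in radians -> (x, y, z) on the unit sphere."""
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def bounding_box_3d(points):
    """Axis-aligned 3D box (mins, maxs) enclosing the unit vectors
    of the given (ra, dec) points, in radians."""
    vecs = [to_unit_vector(ra, dec) for ra, dec in points]
    mins = tuple(min(v[k] for v in vecs) for k in range(3))
    maxs = tuple(max(v[k] for v in vecs) for k in range(3))
    return mins, maxs
```

For RA 359.9° and 0.1° on the equator, every side of the resulting box is shorter than 0.004, so the box stays as compact as the pair of points itself.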
Optimal read-only indexing
● VO use case is mostly read-only
– Potential for huge ADQL speedups
– Hoping for a GSOC student, thanks to Andrey Borodin
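For read-only tables an index can be built once, from fully sorted data, with completely filled pages. One well-known way is to sort entries along a space-filling curve and pack them into pages (a packed R-tree; PostgreSQL 14 later gained a similar sorted build for GiST). The sketch below illustrates that swapped-in technique, not a pgSphere design: it assumes (RA, Dec) in degrees, quantises them onto a grid, sorts by Z-order (Morton) key and fills fixed-size leaf pages, each with a planar bounding box that ignores RA wrap-around. All names are hypothetical.

```python
def morton2d(ix, iy, bits=16):
    """Interleave the low `bits` bits of ix (even positions) and iy (odd positions)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key

def pack_leaves(points, page_size=4, bits=16):
    """Sort (ra_deg, dec_deg) points along a Z-order curve and fill leaf pages.

    Returns a list of (bbox, page) with bbox = (ra_min, dec_min, ra_max, dec_max).
    The planar bbox ignores RA wrap-around: a simplification for illustration.
    """
    scale = (1 << bits) - 1

    def zkey(p):
        ra, dec = p
        ix = int(ra / 360.0 * scale)            # quantise RA in [0, 360]
        iy = int((dec + 90.0) / 180.0 * scale)  # quantise Dec in [-90, 90]
        return morton2d(ix, iy, bits)

    ordered = sorted(points, key=zkey)
    pages = []
    for start in range(0, len(ordered), page_size):
        page = ordered[start:start + page_size]  # every page but the last is full
        ras = [p[0] for p in page]
        decs = [p[1] for p in page]
        pages.append(((min(ras), min(decs), max(ras), max(decs)), page))
    return pages
```

Because nothing is ever inserted later, no free space has to be reserved in the pages, and neighbouring points along the curve tend to share a page, which keeps the bounding boxes small.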