a in java, rust and webassembly - university of...

1
The CDS HEALPix library In Java, Rust and WebAssembly F.-X. Pineau & the CDS team [email protected] Abstract The CDS is releasing a new HEALPix library implemented in Java. Before possibly porting it in C, we are experimenting with the Rust programming lan- guage. It allows the library to be compiled into WebAssembly or native code, and thus to easily plug it into web browsers, Python codes or PostgreSQL, to name but a few. The library is distributed under the 3-clause BSD licence. It focuses on our own needs, on performances and accuracy. We are investigating the potential usage of the WebAssembly version into Aladin Lite. The objective is twofold: supporting deeper orders (up to 24, the present limit being 13), and changing the current GPL license to a less restric- tive one. Aladin desktop has already started to resort to the Java version. Unlike the “official” library, coming from the cosmology community, our Java version do not currently supports Spherical Fourier Transformations and do not support the RING scheme (also it is able to transform cells numbers from the nested scheme to the ring scheme and vice-versa). In return, it do support additional features like: distinguishing between cells partially and fully overlapped by a cone; exact cells-overlapped-by-cone solution; very-fast approximate cells-overlapped-by-cone function dedicated to cross-matches; supporting self-intersecting polygons of any size; fast ordered list of small cells surrounding a larger cell; MOC support in cells-overlapped-by-cone queries; BMOC support (MOC with an additional flag for each cell); etc. Motivations The motivations bringing the CDS to develop an HEALPix library from scratch are various. Licence control Switch from GPL (“official library”) to 3-clause BSD (“CDS li- brary”) to change the Aladin Lite licence and be compatible with, for example, the Astropy licence. Internal expertise Develop an internal expertise in a key component of both CDS ser- vices and the HiPS IVOA standard. Aladin Lite support Bring the deepest addressable resolution from order 13 to order 24 and add support for polygons. Easier evolutions Make evolutions fitting with the CDS needs easier (mainly in Al- adin, Aladin Lite and the X-match service). Languages Java Since it is widely used at CDS (Aladin, SIMBAD, Cross-match ser- vice, etc), the HEALPix library has been developed first in Java. All features mentioned in this poster were made available in the Java version. Rust Rust is a recent open-source language sponsored by Mozilla and pursuing the trifecta: safety, concurrency, and speed (Rust weekly newsletter). It aim at offering high-level ergonomics and low-level control (online Rust book). Since it is a compiled language, we use Rust to generate both WebAssembly files and static or dy- namic libraries that can be called from Python or PostgreSQL. So far, mainly basic HEALPix features meeting with the Aladin Lite needs have been implemented in Rust (cell number from coordi- nates, cell center, cell vertices, cell neighbours, approximate cells- in-cone, projection/deprojection). WebAssembly WebAssembly is a bytecode standardized by the W3C and com- patible with all recent Web browsers. It aims at complementing Javascript, with better performances, and can be generated from compiled languages like C, C++ or Rust. C? Rust pre-compiled binaries are similar to C one’s and could be dis- tributed in softwares like astropy. However, instaling Rust tools is necessary if a user want to manually compile a module from the source code. It is thus not straitforward to integrate Rust code in large projects like astropy or PostgreSQL modules. It may bring us to write a C version of the library. Features Comparing to the “standard library”, so far the CDS version do not supports : the RING scheme , but the library offers functions converting a NESTED cell number into a RING cell number (and vice-versa). For indexation purposes, the NESTED scheme (based on Z-order curves) has better locality properties than the RING scheme. spherical harmonics computations and Fast Fourier Transforms which are extensively used in the cosmology community. supports : Projection/deprojection : compute Euclidean (X, Y ) coordinates from Spherical (α, δ ) coordinates, and vice-versa. The library contains an internal projection (see Fig. 1) and a version compat- ible with the WCS HPX projection. The shifted-rotated internal projection is used to compute the cells values (each of the 12 base cells is sub-divided and indexed following a z-order curve) 0 1 2 3 4 5 6 7 4 8 9 10 11 x y i =0 i =1 i =2 i =3 i =4 j =0 j =1 j =2 j =3 j =4 0 1 2 3 4 5 6 7 4 8 9 10 11 i j Figure 1: Left: HEALPix internal projection plane, showing the 12 base cells. Right: shifted-rotated projection plane used to compute cells number. Cell partially/fully-overlapped-by-cone : this binary coverage in- formation allows to avoid useless time-consuming distances com- putations when, e.g., performing a cone-search query on a table. Exact cells-overlapped-by-cone solution : no false positive cells in cone-search queries to avoid (again) useless distances compu- tation, and possible disks accesses. Largest center-to-vertex distance upper limit : depending on both the order and the position of the cell on the sky. Used in fast but approximative cells-overlapped-by-cone queries. Figure 2: Computed upper limit (in blue) as a function of (α, δ ) for the or- der 8. Red: exact value computed for every order 8 cells. Plot made using TOPCAT. Very-fast approximate cells-overlapped-by-cone function : ded- icated to map/reduce based cross-matches, in which the number of sources in each cone is very small. Fast ordered list of small cells surrounding a larger cell : used for example to retrieve the list of sources in a large cell taking into account border effects. The ordering is important to maxi- mize the sequential access to data stored on spinning HDDs. Self-intersecting polygons of any size : (see Fig. 3), also provides the list of cells fully and partially overlapped by the polygon. Like in the “official” library, we so far use an approximation: we do as if a cell border between two vertices was on a great-circle. An exact solution is possible but would be computationally less efficient. Figure 3: Example of MOC generated by a large scale self-intersecting poly- gon in Aladin Read-only BMOCs : a BMOC is an extension of a MOC but stor- ing for each cell an additional status flags telling if the cell is partially or fully covered by the area the MOC represents. Extra axis-support : as an experiment, we added the possibility to compute a cell number based on a spherical positions plus an extra axis. One can imagine an additional z -axis to Fig. 1: the 12 base cells become 12 base cubes, each hierarchically divided and indexed according to a 3D z-order curve. We still have to implement queries returning 3D BMOCs. Technical details Projection: simplified equations HEALPix is quite extensively described in Calabretta (2004), G´ orski et al. (2005), Calabretta & Roukema (2007) and Reinecke & Hivon (2015). It is first of all an equal-area projection composed from two other projections. Internally we have chosen a projection scale such that all coordinates in the projection plane are [0, 8[ on the X -axis and [-2, 2] on the Y -axis. The internal simplified equations are: Collignon (pseudo-cylindrical equal-area) projection in the polar caps, for α [0,π/2] and sin δ> 3/2: t = p 3(1 - sin δ ) X =(α 4 π - 1)t +1 Y =2 - t α [0, π 2 ] sin δ ] 2 3 , 1] t [0, 1[ X ]0, 2[ Y ]1, 2] Cylindrical equal-area projection in the equatorial region: X = α × 4 π Y = sin(δ ) × 3 2 α [0, 2π ] X [0, 8] sin δ [- 2 3 , 2 3 ] Y [-1, 1] +2 +1 -1 -2 0 1 2 3 4 5 6 7 4 8 9 10 11 x y +1 0 -1 0 1 2 3 4 5 6 7 4 8 9 10 11 x y Figure 4: Left: Collignon projection. Right: CEA projection. Precision at poles The formula t = p 3(1 - sin δ ) causes non-negligible numerical inac- curacies near the poles due to the 1 - sin δ expression it contains: arcsin(1 - 1.0 × 10 -15 ) 89.99999919 deg; π 2 - arcsin(1 - 1.0 × 10 -15 ) 2.917 mas. We thus replaced the previous equation by the equivalent but numeri- cally stable form: t = 6 cos( δ 2 + π 4 ) . Demo: 1 - sin δ = 1 + cos(δ + π 2 )=2 1+cos ( 2( δ 2 + π 4 ) ) 2 = 2 cos 2 ( δ 2 + π 4 ). This form is also computationally less expensive since we spare a time-consuming square-root operation (the square-root applying here on a constant instead of a variable). The exact cells-in-cone solution 0 4 5 . A . B Figure 5: Tissot’s indicatrices. Let’s note Ω c , Ω p the sur- face covered by the cone and the cell (or pixel) re- spectively. The basic cell selection algorithm is: 4 vertices in the cone Ω c Ω p p 1-3 vertices in the cone Ω c Ω p 6= cell contains a special point Ω c Ω p 6= There is 4 + 1 special points: 4 points such that the slope of the tangent line to the projected cone on that point = ±1 for large cells the center of the cone is also a special point Computing the “special points” coordinates We use the Haversine formula to get an accurate cone expression at small radii: Δα = 2 arcsin s sin 2 θ 2 - sin 2 δ -δ 0 2 cos δ 0 cos δ In the equatorial region, the equation of tangent lines is: X dY = X α α dδ dδ dz dz dY = ±1 The projection formulae: z = sin δ , X =4/πα, Y =3/2z lead to X α =4/π, dδ dz = 1 cos δ , dz dY =2/3 and, finally, we find the special points latitudes by solving numerically (Newton’s method): 1 cos δ α(δ ) dδ 3π 8 =0 . In the polar caps, the equation of tangent lines is: X dY = X dδ dδ dz dz dY = ±1 with z = sin δ , t = p 3(1 - z )= 6 cos( δ 2 + π 4 ), X =( 4 π α - 1)t +1, Y =2 - t, leading to dδ dz = 1 cos δ , dz dY = 2 3 t and, finally, we find the special points latitudes by solving numerically (Newton’s method): t(δ ) cos δ d dδ ( 4 π α(δ ) - 1)t(δ ) 3 2 =0 . References Calabretta, M. R. 2004, ArXiv Astrophysics e-prints. astro-ph/0412607 Calabretta, M. R., & Roukema, B. F. 2007, MNRAS, 381, 865 orski, K. M., Hivon, E., Banday, A. J., Wandelt, B. D., Hansen, F. K., Reinecke, M., & Bartelmann, M. 2005, ApJ, 622, 759. astro-ph/0409513 Reinecke, M., & Hivon, E. 2015, A&A, 580, A132. 1505.04632 Franc ¸ois-Xavier Pineau ([email protected]) Universit´ e de Strasbourg, CNRS, Observatoire astronomique de Strasbourg, UMR 7550, F–67000 Strasbourg, France ADASS XXVIII, Astronomical Data Analysis Software & Systems 11 – 15 Nobvember 2018, College Park, Maryland, USA

Upload: others

Post on 26-Jun-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A In Java, Rust and WebAssembly - University Of Marylandadass2018.astro.umd.edu/abstracts/P13-13.pdfWebAssembly WebAssembly is abytecodestandardized by theW3Candcom-patible with all

The CDS HEALPix libraryIn Java, Rust and WebAssembly

F.-X. Pineau & the CDS [email protected]

AbstractThe CDS is releasing a new HEALPix library implemented in Java. Before

possibly porting it in C, we are experimenting with the Rust programming lan-guage. It allows the library to be compiled into WebAssembly or native code,and thus to easily plug it into web browsers, Python codes or PostgreSQL, toname but a few. The library is distributed under the 3-clause BSD licence. Itfocuses on our own needs, on performances and accuracy.

We are investigating the potential usage of the WebAssembly version intoAladin Lite. The objective is twofold: supporting deeper orders (up to 24, thepresent limit being 13), and changing the current GPL license to a less restric-tive one. Aladin desktop has already started to resort to the Java version.

Unlike the “official” library, coming from the cosmology community, ourJava version do not currently supports Spherical Fourier Transformations anddo not support the RING scheme (also it is able to transform cells numbersfrom the nested scheme to the ring scheme and vice-versa). In return, it dosupport additional features like: distinguishing between cells partially andfully overlapped by a cone; exact cells-overlapped-by-cone solution; very-fastapproximate cells-overlapped-by-cone function dedicated to cross-matches;supporting self-intersecting polygons of any size; fast ordered list of smallcells surrounding a larger cell; MOC support in cells-overlapped-by-conequeries; BMOC support (MOC with an additional flag for each cell); etc.

MotivationsThe motivations bringing the CDS to develop an HEALPix libraryfrom scratch are various.

Licence controlSwitch from GPL (“official library”) to 3-clause BSD (“CDS li-brary”) to change the Aladin Lite licence and be compatible with,for example, the Astropy licence.

Internal expertiseDevelop an internal expertise in a key component of both CDS ser-vices and the HiPS IVOA standard.

Aladin Lite supportBring the deepest addressable resolution from order 13 to order 24and add support for polygons.

Easier evolutionsMake evolutions fitting with the CDS needs easier (mainly in Al-adin, Aladin Lite and the X-match service).

LanguagesJava

Since it is widely used at CDS (Aladin, SIMBAD, Cross-match ser-vice, etc), the HEALPix library has been developed first in Java. Allfeatures mentioned in this poster were made available in the Javaversion.

RustRust is a recent open-source language sponsored by Mozilla andpursuing the trifecta: safety, concurrency, and speed (Rust weeklynewsletter). It aim at offering high-level ergonomics and low-levelcontrol (online Rust book). Since it is a compiled language, weuse Rust to generate both WebAssembly files and static or dy-namic libraries that can be called from Python or PostgreSQL. Sofar, mainly basic HEALPix features meeting with the Aladin Liteneeds have been implemented in Rust (cell number from coordi-nates, cell center, cell vertices, cell neighbours, approximate cells-in-cone, projection/deprojection).

WebAssemblyWebAssembly is a bytecode standardized by the W3C and com-patible with all recent Web browsers. It aims at complementingJavascript, with better performances, and can be generated fromcompiled languages like C, C++ or Rust.

C ?Rust pre-compiled binaries are similar to C one’s and could be dis-tributed in softwares like astropy. However, instaling Rust tools isnecessary if a user want to manually compile a module from thesource code. It is thus not straitforward to integrate Rust code inlarge projects like astropy or PostgreSQL modules. It may bring usto write a C version of the library.

FeaturesComparing to the “standard library”, so far the CDS version

do not supports :

the RING scheme , but the library offers functions converting aNESTED cell number into a RING cell number (and vice-versa).For indexation purposes, the NESTED scheme (based on Z-ordercurves) has better locality properties than the RING scheme.

spherical harmonics computations and Fast Fourier Transformswhich are extensively used in the cosmology community.

supports :

Projection/deprojection : compute Euclidean (X, Y ) coordinatesfrom Spherical (α, δ) coordinates, and vice-versa. The librarycontains an internal projection (see Fig. 1) and a version compat-ible with the WCS HPX projection. The shifted-rotated internal

projection is used to compute the cells values (each of the 12 basecells is sub-divided and indexed following a z-order curve)

0 1 2 3

4 5 6 7 4

8 9 10 11

x

y

bic = 0 bic = 1 bic = 2 bic = 3 bic = 4

bjc = 0

bjc = 1

bjc = 2

bjc = 3

bjc = 4 0

1

2

3

4

5

6

7

4

8

9

10

11

i

j

Figure 1: Left: HEALPix internal projection plane, showing the 12 base cells.Right: shifted-rotated projection plane used to compute cells number.

Cell partially/fully-overlapped-by-cone : this binary coverage in-formation allows to avoid useless time-consuming distances com-putations when, e.g., performing a cone-search query on a table.

Exact cells-overlapped-by-cone solution : no false positive cellsin cone-search queries to avoid (again) useless distances compu-tation, and possible disks accesses.

Largest center-to-vertex distance upper limit : depending onboth the order and the position of the cell on the sky. Used infast but approximative cells-overlapped-by-cone queries.

Figure 2: Computed upper limit (in blue) as a function of (α, δ) for the or-der 8. Red: exact value computed for every order 8 cells. Plot made usingTOPCAT.

Very-fast approximate cells-overlapped-by-cone function : ded-icated to map/reduce based cross-matches, in which the numberof sources in each cone is very small.

Fast ordered list of small cells surrounding a larger cell : usedfor example to retrieve the list of sources in a large cell takinginto account border effects. The ordering is important to maxi-mize the sequential access to data stored on spinning HDDs.

Self-intersecting polygons of any size : (see Fig. 3), also providesthe list of cells fully and partially overlapped by the polygon.Like in the “official” library, we so far use an approximation: wedo as if a cell border between two vertices was on a great-circle.An exact solution is possible but would be computationally lessefficient.

Figure 3: Example of MOC generated by a large scale self-intersecting poly-gon in Aladin

Read-only BMOCs : a BMOC is an extension of a MOC but stor-ing for each cell an additional status flags telling if the cell ispartially or fully covered by the area the MOC represents.

Extra axis-support : as an experiment, we added the possibilityto compute a cell number based on a spherical positions plus anextra axis. One can imagine an additional z-axis to Fig. 1: the12 base cells become 12 base cubes, each hierarchically dividedand indexed according to a 3D z-order curve. We still have toimplement queries returning 3D BMOCs.

Technical details

Projection: simplified equationsHEALPix is quite extensively described in Calabretta (2004), Gorskiet al. (2005), Calabretta & Roukema (2007) and Reinecke & Hivon(2015). It is first of all an equal-area projection composed from twoother projections. Internally we have chosen a projection scale suchthat all coordinates in the projection plane are ∈ [0, 8[ on the X-axisand ∈ [−2, 2] on the Y -axis. The internal simplified equations are:Collignon (pseudo-cylindrical equal-area) projection in the polar

caps, for α ∈ [0, π/2] and sin δ > 3/2: t =√

3(1− sin δ)

X = (α4π − 1)t + 1

Y = 2− t⇒α ∈ [0, π2 ]

sin δ ∈]23, 1]

t ∈ [0, 1[X ∈]0, 2[Y ∈]1, 2]

Cylindrical equal-area projection in the equatorial region:X = α× 4

πY = sin(δ)× 3

2⇒α ∈ [0, 2π] X ∈ [0, 8]

sin δ ∈ [−23,

23] Y ∈ [−1, 1]

+2

+1

−1

−2

0 1 2 3

4 5 6 7 4

8 9 10 11

x

y

+1

0

−1

0 1 2 3

4 5 6 7 4

8 9 10 11

x

y

Figure 4: Left: Collignon projection. Right: CEA projection.

Precision at polesThe formula t =

√3(1− sin δ) causes non-negligible numerical inac-

curacies near the poles due to the 1− sin δ expression it contains:

• arcsin(1− 1.0× 10−15) ≈ 89.99999919 deg;

• π2 − arcsin(1− 1.0× 10−15) ≈ 2.917 mas.

We thus replaced the previous equation by the equivalent but numeri-

cally stable form: t =√

6 cos(δ

2+π

4) .

Demo: 1− sin δ = 1 + cos(δ + π2) = 2

1+cos(2(δ2+π4))

2 = 2 cos2(δ2 + π4).

This form is also computationally less expensive since we spare atime-consuming square-root operation (the square-root applying hereon a constant instead of a variable).

The exact cells-in-cone solution

0 1

4 5 6

8 9

x

y

.A

.B

Figure 5: Tissot’s indicatrices.

Let’s note Ωc, Ωp the sur-face covered by the coneand the cell (or pixel) re-spectively. The basic cellselection algorithm is:

• 4 vertices in the cone⇒ Ωc ∩ Ωp = Ωp

• 1-3 vertices in thecone⇒ Ωc ∩ Ωp 6= ∅• cell contains a special

point⇒ Ωc ∩ Ωp 6= ∅There is 4 + 1 specialpoints:

• 4 points such that the slope of the tangent line to the projected coneon that point = ±1

• for large cells the center of the cone is also a special point

Computing the “special points” coordinates

We use the Haversine formula to get an accurate cone expression atsmall radii:

∆α = 2 arcsin

√sin2 θ2 − sin2 δ−δ0

2

cos δ0 cos δ

In the equatorial region, the equation of tangent lines is:

d∆X

dY=

d∆X

d∆α

d∆α

dz

dz

dY= ±1

The projection formulae: z = sin δ, X = 4/πα, Y = 3/2z lead tod∆Xd∆α = 4/π, dδ

dz = 1cos δ,

dzdY = 2/3 and, finally, we find the special

points latitudes by solving numerically (Newton’s method):

1

cos δ

d∆α(δ)

dδ∓ 3π

8= 0 .

In the polar caps, the equation of tangent lines is:

d∆X

dY=

d∆X

dz

dz

dY= ±1

with z = sin δ, t =√

3(1− z) =√

6 cos(δ2 + π4), X = (4

πα − 1)t + 1,Y = 2 − t, leading to dδ

dz = 1cos δ ,

dzdY = 2

3t and, finally, we find thespecial points latitudes by solving numerically (Newton’s method):

t(δ)

cos δ

d

[(4

πα(δ)− 1)t(δ)

]∓ 3

2= 0 .

ReferencesCalabretta, M. R. 2004, ArXiv Astrophysics e-prints. astro-ph/0412607

Calabretta, M. R., & Roukema, B. F. 2007, MNRAS, 381, 865

Gorski, K. M., Hivon, E., Banday, A. J., Wandelt, B. D., Hansen, F. K., Reinecke,M., & Bartelmann, M. 2005, ApJ, 622, 759. astro-ph/0409513

Reinecke, M., & Hivon, E. 2015, A&A, 580, A132. 1505.04632

Francois-Xavier Pineau ([email protected])Universite de Strasbourg, CNRS, Observatoire astronomique de Strasbourg, UMR 7550, F–67000 Strasbourg, France

ADASS XXVIII, Astronomical Data Analysis Software & Systems11 – 15 Nobvember 2018, College Park, Maryland, USA