metalibm - inria

Metalibm

Olga Kupriianova

Sorbonne Universites, UPMC Univ Paris 06,UMR 7606, LIP6,

F-75005 Paris, France

Rencontres Arithmetiques de l’Informatique Mathematique7 April 2015

O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 1 / 25

Outline

1 Introducing libm

2 Problem statement

3 The Metalibm Project

4 General approach to function implementation

Mathematical libraries

The libm

gives a set of maths functions (exp, log, sin, cos, pow, ...)

supports important precisions (float, double, long double)

The existing implementations

glibc libm

libmcr by Suna

crlibm by ENS-Lyon

libultim by IBM

Intel’s proprietary mathimf

aAll other trademarks and copyrights are the property of their respective owners

The glibc libm

Why glibc libm?

Math library running on all linux-powered machines

Written by different developer teams

Some codes were written 20 years ago

Strange naming conventions

The glibc libm

Why glibc libm?

The glibc libm

Why glibc libm?

/∗∗ IBM Accurate Mathemat ica l L i b r a r y∗ w r i t t e n by I n t e r n a t i o n a l Bu s i n e s s Machines Corp .∗ Copy r i gh t (C) 2001-2013 Free So f tware Foundat ion , I n c . ∗//∗ ∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗ ∗//∗ ∗//∗ MODULE NAME: halfulp.c ∗//∗ ∗//∗ FUNCTIONS : halfulp ∗//∗ FILES NEEDED: mydefs . h d l a . h end ian . h ∗//∗ u roo t . c ∗//∗ ∗//∗Rout ine h a l f u l p ( doub l e x , doub l e y ) computes xy where ∗//∗ r e s u l t does not need round ing . I f the r e s u l t i s c l o s e r ∗//∗ to 0 than can be r e p r e s e n t e d i t returns 0. ∗//∗ I n the f o l l o w i n g c a s e s the f u n c t i o n does not compute ∗//∗ any th i ng and r e t u r n s a negative number: ∗//∗ 1 . i f the r e s u l t needs round ing , ∗//∗ 2 . i f y i s o u t s i d e the i n t e r v a l [ 0 , 2ˆ20−1] , ∗//∗ 3 . i f x can be r e p r e s e n t e d by x=2∗∗n f o r some i n t e g e r n ∗//∗ ∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗ ∗/

The libm

The restrictions of libm

Limited set of functions: trigonometric, exponentiation andlogarithmic, hyperbolic, special functions

Limited set of precisions (float, double, long double)

The input range: “as far as the input and output formats allow”

The only one implementation, hard-coded long time ago

No special code for exp(x) on [−10; 10] with 42 bits of accuracy

Speed vs. accuracy compromise

Common wisdom

The more accurate you compute, the more expensive it gets

Observation

The binary64 format provides about 16 decimal digits,while physical measurements might use only 3!

Compiler flag?

Why can’t we have an option like

gcc -c code.c -fp-transcendental=15ulps

to get less accuracy when we don’t need more

Common wisdom

Observation

Compiler flag?

Common wisdom

Observation

Compiler flag?

Outline

1 Introducing libm

2 Problem statement

Flavors of functions

Flavor

A flavor is a particular specification of a function

Absolute error in ulps

|X − x | ≤ α · ulp(x)

Relative Error ∣∣∣∣Xx − 1

∣∣∣∣ ≤ α · β−p+1

Correct Rounding

|X − x | ≤ 0.5 · ulp(x)

Table Maker’s Dilemma

Sometimes huge accuracy is neededRuntime of function is not bounded or really huge

Recommendations

Precompute TMD worst cases

Avoid correct rounding wherever we do not really need it

Table Maker’s Dilemma

Sometimes huge accuracy is neededRuntime of function is not bounded or really huge

Recommendations

Precompute TMD worst cases

Avoid correct rounding wherever we do not really need it

Different flavors of the functions

Precision Ulp Error DecimalDigits

NeededAccuracy

Verdict

Single 1 ulp 7 2−24 fastSingle (CR) 0.5 ulps 7 2−50 slowDouble 1 ulp 16 2−53.5 fastDouble (CR) 0.5 ulps 16 2−150 slow

Not fast enough? Still more flavors

Single 25 ulps 5 2−17 fasterSingle 211 ulps 3 2−11 the fastestDouble 211 ulps 12 2−40 fasterDouble 242 ulps 3 2−10 the fastest

NeededAccuracy

Verdict

NeededAccuracy

Verdict

Still more flavors

We should get even more flavors!

set floating-point flags or not 2 possibilities

support all rounding modes or not: 4 possibilities

save mode, perform computations, restore modeperform computations to support all the rounding modesperform integer-based computations

⇒ Eight more flavors

Rough Computations

3 precisions

∼ 50 functions in libm

∼ 10 flavors in terms of accuracy

8 additional flavors

I have to implement 12000 functions...

Each function implementation takes ∼ 1 man-monthIt will take me 1000 years to rewrite the libm

Rough Computations

3 precisions

Rough Computations

3 precisions

Each function implementation takes ∼ 1 man-month

It will take me 1000 years to rewrite the libm

Rough Computations

3 precisions

Outline

1 Introducing libm

2 Problem statement

Solution

Metalibm

Write a tool that produces code for math functions

Analogy

assembly → compilerscode → code generators

. c f i s t a r t p r o csubq $8 , %r s p. c f i d e f c f a o f f s e t 16movl $52 , %r8dmovl $37 , %ecxmovl $15 , %edxmovl $ . LC0 , %e s imovl $1 , %e d ix o r l %eax , %eaxc a l l p r i n t f c h kx o r l %eax , %eaxaddq $8 , %r s p. c f i d e f c f a o f f s e t 8r e t. c f i e n d p r o c

i n c l u d e <s t d i o . h>

i n t main ( ) {i n t a , b , sum ;a = 1 5 ;b = 3 7 ;sum = a + b ;p r i n t f ( ”%d + %d = %d\n” , a , b ,

sum ) ;r e t u r n 0 ;

Metalibm prototype

Academic prototype

Produces flexible math functions implementations

Gives a function on a specified domain

It’s free and already available!Try it on http://metalibm.org/

State of the art

Supports almost all libm functions

The documentation has to be finished

We are working on the vectorizable implementations

We are working on support of specific functions (e.g. dickman’s, erf)

Timings for exponential function

3 4 5 6 7 8 9 10 11 12 2 4

precision=30

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=30

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=31

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=32

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=33

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=34

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=35

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=36

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=37

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=38

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=39

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=40

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=41

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=42

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=43

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=44

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=45

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=46

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=47

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=48

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=49

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=50

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=51

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=52

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=53

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=54

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=55

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=56

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=57

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=58

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=59

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=60

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=61

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=62

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=63

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=64

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=65

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=66

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=67

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=68

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=69

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=70

degree

3 4 5 6 7 8 9 10 11 12 2 4

precision=70

degree

Making this movie meant implementing 6150 functionsO. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25

Outline

1 Introducing libm

2 Problem statement

Steps in function implementation

1 Eliminating special cases and values:zeros, infinities, NaNs, etc.

2 Argument reduction: x ∈ [α, β] is asmall interval

3 Polynomial approximation: Remezalgorithm for minimaxapproximations, polynomial of lowdegree

4 Reconstruction

Example

implement f (x) = ex

ex = 2x

log 2 = 2

⌉· 2

xlog 2−⌊

xlog 2

= 2E · ex−E log 2 = 2E · er

Range reduction

Step is based on mathematical properties:na+b = na · nb, sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . .

What properties do we know for erf (x), or a function defined by ODE?

When range reduction does not work

Piecewise-polynomial implementation

Algorithm to split the domain

Reconstruction becomes an execution of if-statements

Range reduction

Domain Splitting. Different approaches

f = asin(x), I = [0, 0.85], d ≤ 8, ε = 2−52

Simple and naive: split domain into k equal parts, k is large.

splittingasin(x)

degrees

f = asin(x), I = [0, 0.85], d ≤ 8, ε = 2−52

Bisection (use of a theorem of de la Vallee Poussin)

splittingasin(x)

degrees

f = asin(x), I = [0, 0.85], d ≤ 8, ε = 2−52

Our optimized method (use of a theorem of de la Vallee Poussin)

splittingasin(x)

degrees

Results

name function f target accuracy domain I degree bound

f1 asin ε = 2−52 I = [0, 0.85] dmax = 8

f2 asin ε = 2−45 I = [−0.75, 0.75] dmax = 8

f3 erf ε = 2−51 I = [−0.75, 0.75] dmax = 9

f4 erf ε = 2−45 I = [−0.75, 0.75] dmax = 7

f5 erf ε = 2−50 I = [−0.75, 0.75] dmax = 6

Table: Flavor specifications

measure f1 f2 f3 f4 f5subdomain qty in bisection 24 15 9 12 39

subdomain qty in improved bisection 18 10 5 8 25

subdomains saved 25% 30% 44% 30% 36%

coefficients saved 42 31 27 24 79

memory saved (bytes) 336 248 216 192 632

Table: Table of measurements for several function flavorsO. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 23 / 25

Conclusion

writing code → writing code generator

gives functions for the specified domain, specified accuracies, etc.

parametrized function implementation in several minutes

new domain splitting procedure

adding argument reduction procedures

available at http://metalibm.org

work on vectorizable implementations and domain spltting

work on documentation

Thank you for your attention!Questions?

metalibm - inria

Documents

optimizationusingsurrogatemodels - inria

bigloo - méditerranée | inria

lorawan - inria

project-team reo numerical simulation of biological...

activity report 2018 - accueil | inria inria 2018 uk.pdf ·...

niklas elmqvist | purdue university pierre dragicevic |...

inria and europe

overview - inria

inria advancement report a. lécuyer, g. cirio, l....

inria - connect 3

p. del moral (inria team alea) inria & bordeaux...

inria-00274193, version 2 - 29 may 2008 the practice - hal -...

inria - strategic plan 2013-2017 "towards inria 2020"

the mobilitics inria-cnil project: privacy and...

welcome at inria

presentation of inria

shape interrogation - inria

laurent castanié (earth decision / paradigm – inria...

jet-lag - inria

introduction to the probabilistic method - sophia -...