Metalibm

Olga Kupriianova

Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6,

F-75005 Paris, France

Rencontres Arithmétiques de l’Informatique Mathématique, 7 April 2015

O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 1 / 25

Outline

1 Introducing libm

2 Problem statement

3 The Metalibm Project

4 General approach to function implementation

Mathematical libraries

The libm

gives a set of maths functions (exp, log, sin, cos, pow, ...)

supports important precisions (float, double, long double)

The existing implementations

glibc libm

libmcr by Sun (a)

crlibm by ENS-Lyon

libultim by IBM

Intel’s proprietary mathimf

(a) All other trademarks and copyrights are the property of their respective owners

The glibc libm

Why glibc libm?

Math library running on all Linux-powered machines

Written by different developer teams

Some of the code was written 20 years ago

Strange naming conventions

/*
 * IBM Accurate Mathematical Library
 * written by International Business Machines Corp.
 * Copyright (C) 2001-2013 Free Software Foundation, Inc.
 */
/************************************************************/
/*                                                          */
/* MODULE NAME: halfulp.c                                   */
/*                                                          */
/* FUNCTIONS: halfulp                                       */
/* FILES NEEDED: mydefs.h dla.h endian.h                    */
/*               uroot.c                                    */
/*                                                          */
/* Routine halfulp(double x, double y) computes x^y where   */
/* result does not need rounding. If the result is closer   */
/* to 0 than can be represented it returns 0.               */
/* In the following cases the function does not compute     */
/* anything and returns a negative number:                  */
/* 1. if the result needs rounding,                         */
/* 2. if y is outside the interval [0, 2^20-1],             */
/* 3. if x can be represented by x=2**n for some integer n  */
/************************************************************/

The libm

The restrictions of libm

Limited set of functions: trigonometric, exponentiation and logarithmic, hyperbolic, special functions

Limited set of precisions (float, double, long double)

The input range: “as far as the input and output formats allow”

Only one implementation, hard-coded a long time ago

No special code for exp(x) on [−10; 10] with 42 bits of accuracy

Speed vs. accuracy compromise

Common wisdom

The more accurate you compute, the more expensive it gets

Observation

The binary64 format provides about 16 decimal digits, while physical measurements might use only 3!
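
The two digit counts can be related to bit counts with a short calculation. It assumes only the standard 53-bit significand of binary64; the conversion of 3 digits to bits is an illustration, not a figure from the slides:

```latex
53 \cdot \log_{10} 2 \approx 53 \times 0.30103 \approx 15.95 \ \text{decimal digits},
\qquad
\frac{3}{\log_{10} 2} \approx 9.97 \ \text{bits for 3 decimal digits}.
```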

Compiler flag?

Why can’t we have an option like

gcc -c code.c -fp-transcendental=15ulps

to get less accuracy when we don’t need more?

2 Problem statement

Flavors of functions

Flavor

A flavor is a particular specification of a function

Absolute error in ulps

$|X - x| \le \alpha \cdot \mathrm{ulp}(x)$

Relative error

$\left| \frac{X}{x} - 1 \right| \le \alpha \cdot \beta^{-p+1}$

Correct rounding

$|X - x| \le 0.5 \cdot \mathrm{ulp}(x)$
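
The error measures above can be checked numerically. The following is a minimal sketch in C for binary64 (so β = 2 and p = 53), assuming X denotes the computed value and x the exact one. The helper names `ulp_of` and `error_in_ulps` are hypothetical, and ulp(x) is taken here as the spacing of doubles in the binade of x, which is one common convention among several:

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: spacing of binary64 numbers around a normal,
   nonzero x. If |x| = m * 2^e with 0.5 <= m < 1, then
   ulp(x) = 2^(e - 53). Subnormals are not handled in this sketch. */
double ulp_of(double x)
{
    int e;
    assert(x != 0.0);
    (void)frexp(x, &e);                  /* only the exponent is needed */
    return ldexp(1.0, e - DBL_MANT_DIG); /* DBL_MANT_DIG is 53 */
}

/* Hypothetical helper: |X - x| measured in units of ulp(x). */
double error_in_ulps(double computed, double exact)
{
    double diff = computed - exact;
    if (diff < 0.0)
        diff = -diff;
    return diff / ulp_of(exact);
}
```

With these helpers, a result that is one representable number away from 1.0 measures as 1 ulp: it meets the first bound with α = 1 but not the correct-rounding bound of 0.5 ulp.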

Table Maker’s Dilemma

Table Maker’s Dilemma

Sometimes huge accuracy is needed

Runtime of function is not bounded or really huge

Recommendations

Precompute TMD worst cases

Avoid correct rounding wherever we do not really need it

Different flavors of the functions

Precision     Ulp error        Decimal digits   Needed accuracy   Verdict
Single        1 ulp            7                $2^{-24}$         fast
Single (CR)   0.5 ulps         7                $2^{-50}$         slow
Double        1 ulp            16               $2^{-53.5}$       fast
Double (CR)   0.5 ulps         16               $2^{-150}$        slow

Not fast enough? Still more flavors

Precision     Ulp error        Decimal digits   Needed accuracy   Verdict
Single        $2^{5}$ ulps     5                $2^{-17}$         faster
Single        $2^{11}$ ulps    3                $2^{-11}$         the fastest
Double        $2^{11}$ ulps    12               $2^{-40}$         faster
Double        $2^{42}$ ulps    3                $2^{-10}$         the fastest

Still more flavors

We should get even more flavors!

set floating-point flags or not: 2 possibilities

support all rounding modes or not: 4 possibilities

save mode, perform computations, restore mode

perform computations to support all the rounding modes

perform integer-based computations

⇒ Eight more flavors

Task

Rough Computations

3 precisions

∼ 50 functions in libm

∼ 10 flavors in terms of accuracy

8 additional flavors

I have to implement 12000 functions...

Each function implementation takes ∼ 1 man-month

It will take me 1000 years to rewrite the libm
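
The rough computation is simply the product of the four counts listed on this slide, followed by a conversion of man-months to years:

```latex
3 \times 50 \times 10 \times 8 = 12000 \ \text{functions},
\qquad
\frac{12000 \ \text{man-months}}{12 \ \text{months per year}} = 1000 \ \text{years}.
```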

3 The Metalibm Project

Solution

Metalibm

Write a tool that produces code for math functions

Analogy

assembly → compilers

code → code generators

    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $52, %r8d
    movl    $37, %ecx
    movl    $15, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

    #include <stdio.h>

    int main() {
        int a, b, sum;
        a = 15;
        b = 37;
        sum = a + b;
        printf("%d + %d = %d\n", a, b, sum);
        return 0;
    }

Metalibm prototype

Academic prototype

Produces flexible implementations of math functions

Gives a function on a specified domain

It’s free and already available!

Try it on http://metalibm.org/

State of the art

Supports almost all libm functions

The documentation has to be finished

We are working on the vectorizable implementations

We are working on support of specific functions (e.g. Dickman’s, erf)

Demo

Timings for exponential function

[Animated figure: surface plot of time (normalized, 0 to 1) as a function of table and degree, one frame per precision from 30 to 70]

Making this movie meant implementing 6150 functions

4 General approach to function implementation

Steps in function implementation

1 Eliminating special cases and values: zeros, infinities, NaNs, etc.

2 Argument reduction: x ∈ [α, β] is a small interval

3 Polynomial approximation: Remez algorithm for minimax approximations, polynomial of low degree

4 Reconstruction

Example

implement $f(x) = e^x$

$e^x = 2^{x / \log 2} = 2^{\lfloor x / \log 2 \rceil} \cdot 2^{x / \log 2 - \lfloor x / \log 2 \rceil} = 2^E \cdot e^{x - E \log 2} = 2^E \cdot e^r$
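
The four steps can be sketched for this example as follows, with E the integer nearest to x / log 2 and r = x − E log 2. This is a minimal illustration and not code generated by Metalibm: the function name `exp_sketch` is hypothetical, a degree-13 Taylor polynomial stands in for a Remez minimax polynomial, and log 2 is held in a single double, so the result is accurate to roughly 12 or more decimal digits rather than to the last bit:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of the four implementation steps for e^x. */
double exp_sketch(double x)
{
    const double LOG2 = 0.69314718055994530942; /* log 2 */

    /* Step 1: special cases. The +-1000 cuts only keep the integer
       conversion below in range; true overflow and underflow closer
       to the thresholds are left to ldexp in step 4. */
    if (x != x)
        return x;                /* NaN in, NaN out */
    if (x > 1000.0)
        return INFINITY;         /* e^x overflows binary64 */
    if (x < -1000.0)
        return 0.0;              /* e^x underflows to zero */

    /* Step 2: argument reduction. E = nearest integer to x / log 2,
       r = x - E log 2, so |r| <= (log 2) / 2. */
    double t = x / LOG2;
    int E = (int)(t < 0.0 ? t - 0.5 : t + 0.5);
    double r = x - E * LOG2;

    /* Step 3: polynomial approximation of e^r (Taylor, Horner form). */
    double p = 1.0;
    for (int k = 13; k >= 1; --k)
        p = 1.0 + p * r / k;

    /* Step 4: reconstruction, e^x = 2^E * e^r. */
    return ldexp(p, E);
}
```

Calling `exp_sketch(1.0)` goes through E = 1 and r ≈ 0.307. A production implementation would typically split log 2 into a high and a low part to keep r accurate for large |x|.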

Range reduction

Range reduction

Step is based on mathematical properties: $n^{a+b} = n^a \cdot n^b$, $\sin(x + 2\pi) = \sin(x)$, $\log(a \cdot b) = \log(a) + \log(b)$, ...

What properties do we know for erf(x), or for a function defined by an ODE?

When range reduction does not work

Piecewise-polynomial implementation

Algorithm to split the domain

Reconstruction becomes an execution of if-statements
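
A piecewise-polynomial implementation with reconstruction by if-statements might look like the sketch below. Everything in it is illustrative: the function name `piecewise_eval`, the two-piece split of [0, 1] at 0.5, and the coefficients, which are Taylor coefficients of e^x at the subdomain midpoints, used as placeholders for the minimax coefficients a generator would compute.

```c
#include <assert.h>

#define DEG 6
#define E025 1.2840254166877414 /* e^0.25 */
#define E075 2.117000016612675  /* e^0.75 */
/* Taylor coefficients s / k! for k = 0..6, where s = e^center. */
#define TAYLOR(s) { (s), (s), (s) / 2, (s) / 6, (s) / 24, (s) / 120, (s) / 720 }

/* Placeholder coefficient tables, one per subdomain, in powers of
   (x - center). */
static const double P0[DEG + 1] = TAYLOR(E025); /* x in [0, 0.5)  */
static const double P1[DEG + 1] = TAYLOR(E075); /* x in [0.5, 1]  */

/* Hypothetical piecewise evaluation of e^x on [0, 1]. */
double piecewise_eval(double x)
{
    const double *c;
    double d;

    assert(x >= 0.0 && x <= 1.0);

    /* Reconstruction: pick the subdomain with if-statements. */
    if (x < 0.5) {
        c = P0;
        d = x - 0.25;
    } else {
        c = P1;
        d = x - 0.75;
    }

    /* Horner evaluation of the selected polynomial. */
    double p = c[DEG];
    for (int k = DEG - 1; k >= 0; --k)
        p = p * d + c[k];
    return p;
}
```

With more subdomains the if-chain is usually replaced by a binary search over the split points or by a table lookup, which keeps the number of comparisons small.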

Domain Splitting. Different approaches

$f = \mathrm{asin}(x)$, $I = [0, 0.85]$, $d \le 8$, $\varepsilon = 2^{-52}$

Simple and naive: split domain into k equal parts, k is large.

[Figure: two panels over x ∈ [0, 0.85], one showing asin(x) with the splitting, one showing the polynomial degrees per subdomain]

Bisection (use of a theorem of de la Vallée Poussin)

[Figure: two panels over x ∈ [0, 0.85], one showing asin(x) with the splitting, one showing the polynomial degrees per subdomain]

Our optimized method (use of a theorem of de la Vallée Poussin)

[Figure: two panels over x ∈ [0, 0.85], one showing asin(x) with the splitting, one showing the polynomial degrees per subdomain]
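
The plain bisection strategy can be sketched as a short recursion: accept a subdomain if a polynomial of degree at most d reaches the target error on it, otherwise split it at the midpoint. The sketch below is not Metalibm's procedure. In particular, `toy_error_bound` is a hypothetical stand-in for the estimate obtained from the de la Vallée Poussin theorem: it uses the closed-form Taylor remainder bound for f(x) = 1/(1 − x), chosen only because that bound needs no approximation machinery and, like asin, the function gets harder to approximate near 1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PIECES 512

/* Toy error oracle (assumption): Lagrange remainder bound of the
   degree-d Taylor polynomial of 1/(1-x) at the midpoint of [a, b],
   with b < 1: h^(d+1) / (1-b)^(d+2), where h = (b-a)/2. */
double toy_error_bound(double a, double b, int d)
{
    double h = 0.5 * (b - a);
    double bound = 1.0 / (1.0 - b);
    for (int k = 0; k <= d; ++k)
        bound *= h / (1.0 - b);
    return bound;
}

/* Recursive bisection. Appends the right endpoint of every accepted
   subdomain to cuts[] in increasing order and returns the new count. */
size_t split_domain(double a, double b, int d, double eps,
                    double *cuts, size_t n)
{
    if (toy_error_bound(a, b, d) <= eps) {
        assert(n < MAX_PIECES);
        cuts[n] = b;
        return n + 1;
    }
    double mid = 0.5 * (a + b);
    n = split_domain(a, mid, d, eps, cuts, n);
    return split_domain(mid, b, d, eps, cuts, n);
}
```

Calling `split_domain(0.0, 0.85, 8, 0x1p-52, cuts, 0)` with a `cuts` array of `MAX_PIECES` doubles yields subdomains that are wide near 0 and narrow near 0.85. Bisection can only produce widths of the form (b − a) / 2^k, which is the kind of waste an optimized splitting aims to reduce.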

Results

name   function f   target accuracy ε   domain I          degree bound d_max
f1     asin         $2^{-52}$           [0, 0.85]         8
f2     asin         $2^{-45}$           [−0.75, 0.75]     8
f3     erf          $2^{-51}$           [−0.75, 0.75]     9
f4     erf          $2^{-45}$           [−0.75, 0.75]     7
f5     erf          $2^{-50}$           [−0.75, 0.75]     6

Table: Flavor specifications

measure                               f1    f2    f3    f4    f5
subdomain qty in bisection            24    15    9     12    39
subdomain qty in improved bisection   18    10    5     8     25
subdomains saved                      25%   30%   44%   30%   36%
coefficients saved                    42    31    27    24    79
memory saved (bytes)                  336   248   216   192   632

Table: Measurements for several function flavors
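
The last two rows are consistent with each saved coefficient occupying 8 bytes, the size of a binary64 value. That storage format is an inference from the numbers, not something the slides state:

```latex
42 \times 8 = 336, \quad
31 \times 8 = 248, \quad
27 \times 8 = 216, \quad
24 \times 8 = 192, \quad
79 \times 8 = 632 \ \text{bytes}.
```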

Conclusion

writing code → writing code generator

gives functions for the specified domain, specified accuracies, etc.

parametrized function implementation in several minutes

new domain splitting procedure

adding argument reduction procedures

available at http://metalibm.org

work on vectorizable implementations and domain splitting

work on documentation

Q/A

Thank you for your attention!

Questions?
