metalibm - inria
TRANSCRIPT
Metalibm
Olga Kupriianova
Sorbonne Universites, UPMC Univ Paris 06,UMR 7606, LIP6,
F-75005 Paris, France
Rencontres Arithmetiques de l’Informatique Mathematique7 April 2015
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 1 / 25
Outline
1 Introducing libm
2 Problem statement
3 The Metalibm Project
4 General approach to function implementation
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 2 / 25
Mathematical libraries
The libm
gives a set of maths functions (exp, log, sin, cos, pow, ...)
supports important precisions (float, double, long double)
The existing implementations
glibc libm
libmcr by Suna
crlibm by ENS-Lyon
libultim by IBM
Intel’s proprietary mathimf
aAll other trademarks and copyrights are the property of their respective owners
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 3 / 25
The glibc libm
Why glibc libm?
Math library running on all linux-powered machines
Written by different developer teams
Some codes were written 20 years ago
Strange naming conventions
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 4 / 25
The glibc libm
Why glibc libm?
Math library running on all linux-powered machines
Written by different developer teams
Some codes were written 20 years ago
Strange naming conventions
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 4 / 25
The glibc libm
Why glibc libm?
Math library running on all linux-powered machines
Written by different developer teams
Some codes were written 20 years ago
Strange naming conventions
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 4 / 25
/∗∗ IBM Accurate Mathemat ica l L i b r a r y∗ w r i t t e n by I n t e r n a t i o n a l Bu s i n e s s Machines Corp .∗ Copy r i gh t (C) 2001-2013 Free So f tware Foundat ion , I n c . ∗//∗ ∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗ ∗//∗ ∗//∗ MODULE NAME: halfulp.c ∗//∗ ∗//∗ FUNCTIONS : halfulp ∗//∗ FILES NEEDED: mydefs . h d l a . h end ian . h ∗//∗ u roo t . c ∗//∗ ∗//∗Rout ine h a l f u l p ( doub l e x , doub l e y ) computes xy where ∗//∗ r e s u l t does not need round ing . I f the r e s u l t i s c l o s e r ∗//∗ to 0 than can be r e p r e s e n t e d i t returns 0. ∗//∗ I n the f o l l o w i n g c a s e s the f u n c t i o n does not compute ∗//∗ any th i ng and r e t u r n s a negative number: ∗//∗ 1 . i f the r e s u l t needs round ing , ∗//∗ 2 . i f y i s o u t s i d e the i n t e r v a l [ 0 , 2ˆ20−1] , ∗//∗ 3 . i f x can be r e p r e s e n t e d by x=2∗∗n f o r some i n t e g e r n ∗//∗ ∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗ ∗/
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 5 / 25
The libm
The restrictions of libm
Limited set of functions: trigonometric, exponentiation andlogarithmic, hyperbolic, special functions
Limited set of precisions (float, double, long double)
The input range: “as far as the input and output formats allow”
The only one implementation, hard-coded long time ago
No special code for exp(x) on [−10; 10] with 42 bits of accuracy
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 6 / 25
Speed vs. accuracy compromise
Common wisdom
The more accurate you compute, the more expensive it gets
Observation
The binary64 format provides about 16 decimal digits,while physical measurements might use only 3!
Compiler flag?
Why can’t we have an option like
gcc -c code.c -fp-transcendental=15ulps
to get less accuracy when we don’t need more
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 7 / 25
Speed vs. accuracy compromise
Common wisdom
The more accurate you compute, the more expensive it gets
Observation
The binary64 format provides about 16 decimal digits,while physical measurements might use only 3!
Compiler flag?
Why can’t we have an option like
gcc -c code.c -fp-transcendental=15ulps
to get less accuracy when we don’t need more
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 7 / 25
Speed vs. accuracy compromise
Common wisdom
The more accurate you compute, the more expensive it gets
Observation
The binary64 format provides about 16 decimal digits,while physical measurements might use only 3!
Compiler flag?
Why can’t we have an option like
gcc -c code.c -fp-transcendental=15ulps
to get less accuracy when we don’t need more
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 7 / 25
Outline
1 Introducing libm
2 Problem statement
3 The Metalibm Project
4 General approach to function implementation
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 8 / 25
Flavors of functions
Flavor
A flavor is a particular specification of a function
Absolute error in ulps
|X − x | ≤ α · ulp(x)
Relative Error ∣∣∣∣Xx − 1
∣∣∣∣ ≤ α · β−p+1
Correct Rounding
|X − x | ≤ 0.5 · ulp(x)
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 9 / 25
Table Maker’s Dilemma
Table Maker’s Dilemma
Sometimes huge accuracy is neededRuntime of function is not bounded or really huge
Recommendations
Precompute TMD worst cases
Avoid correct rounding wherever we do not really need it
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 10 / 25
Table Maker’s Dilemma
Table Maker’s Dilemma
Sometimes huge accuracy is neededRuntime of function is not bounded or really huge
Recommendations
Precompute TMD worst cases
Avoid correct rounding wherever we do not really need it
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 10 / 25
Different flavors of the functions
Precision Ulp Error DecimalDigits
NeededAccuracy
Verdict
Single 1 ulp 7 2−24 fastSingle (CR) 0.5 ulps 7 2−50 slowDouble 1 ulp 16 2−53.5 fastDouble (CR) 0.5 ulps 16 2−150 slow
Not fast enough? Still more flavors
Single 25 ulps 5 2−17 fasterSingle 211 ulps 3 2−11 the fastestDouble 211 ulps 12 2−40 fasterDouble 242 ulps 3 2−10 the fastest
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 11 / 25
Different flavors of the functions
Precision Ulp Error DecimalDigits
NeededAccuracy
Verdict
Single 1 ulp 7 2−24 fastSingle (CR) 0.5 ulps 7 2−50 slowDouble 1 ulp 16 2−53.5 fastDouble (CR) 0.5 ulps 16 2−150 slow
Not fast enough? Still more flavors
Single 25 ulps 5 2−17 fasterSingle 211 ulps 3 2−11 the fastestDouble 211 ulps 12 2−40 fasterDouble 242 ulps 3 2−10 the fastest
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 11 / 25
Different flavors of the functions
Precision Ulp Error DecimalDigits
NeededAccuracy
Verdict
Single 1 ulp 7 2−24 fastSingle (CR) 0.5 ulps 7 2−50 slowDouble 1 ulp 16 2−53.5 fastDouble (CR) 0.5 ulps 16 2−150 slow
Not fast enough? Still more flavors
Single 25 ulps 5 2−17 fasterSingle 211 ulps 3 2−11 the fastestDouble 211 ulps 12 2−40 fasterDouble 242 ulps 3 2−10 the fastest
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 11 / 25
Still more flavors
We should get even more flavors!
set floating-point flags or not 2 possibilities
support all rounding modes or not: 4 possibilities
save mode, perform computations, restore modeperform computations to support all the rounding modesperform integer-based computations
⇒ Eight more flavors
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 12 / 25
Task
Rough Computations
3 precisions
∼ 50 functions in libm
∼ 10 flavors in terms of accuracy
8 additional flavors
I have to implement 12000 functions...
Each function implementation takes ∼ 1 man-monthIt will take me 1000 years to rewrite the libm
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 13 / 25
Task
Rough Computations
3 precisions
∼ 50 functions in libm
∼ 10 flavors in terms of accuracy
8 additional flavors
I have to implement 12000 functions...
Each function implementation takes ∼ 1 man-monthIt will take me 1000 years to rewrite the libm
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 13 / 25
Task
Rough Computations
3 precisions
∼ 50 functions in libm
∼ 10 flavors in terms of accuracy
8 additional flavors
I have to implement 12000 functions...
Each function implementation takes ∼ 1 man-month
It will take me 1000 years to rewrite the libm
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 13 / 25
Task
Rough Computations
3 precisions
∼ 50 functions in libm
∼ 10 flavors in terms of accuracy
8 additional flavors
I have to implement 12000 functions...
Each function implementation takes ∼ 1 man-monthIt will take me 1000 years to rewrite the libm
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 13 / 25
Outline
1 Introducing libm
2 Problem statement
3 The Metalibm Project
4 General approach to function implementation
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 14 / 25
Solution
Metalibm
Write a tool that produces code for math functions
Analogy
assembly → compilerscode → code generators
. c f i s t a r t p r o csubq $8 , %r s p. c f i d e f c f a o f f s e t 16movl $52 , %r8dmovl $37 , %ecxmovl $15 , %edxmovl $ . LC0 , %e s imovl $1 , %e d ix o r l %eax , %eaxc a l l p r i n t f c h kx o r l %eax , %eaxaddq $8 , %r s p. c f i d e f c f a o f f s e t 8r e t. c f i e n d p r o c
i n c l u d e <s t d i o . h>
i n t main ( ) {i n t a , b , sum ;a = 1 5 ;b = 3 7 ;sum = a + b ;p r i n t f ( ”%d + %d = %d\n” , a , b ,
sum ) ;r e t u r n 0 ;
}
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 15 / 25
Metalibm prototype
Academic prototype
Produces flexible math functions implementations
Gives a function on a specified domain
It’s free and already available!Try it on http://metalibm.org/
State of the art
Supports almost all libm functions
The documentation has to be finished
We are working on the vectorizable implementations
We are working on support of specific functions (e.g. dickman’s, erf)
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 16 / 25
Demo
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 17 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=30
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=30
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=31
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=32
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=33
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=34
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=35
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=36
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=37
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=38
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=39
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=40
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=41
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=42
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=43
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=44
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=45
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=46
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=47
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=48
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=49
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=50
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=51
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=52
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=53
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=54
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=55
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=56
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=57
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=58
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=59
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=60
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=61
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=62
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=63
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=64
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=65
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=66
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=67
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=68
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=69
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=70
table
degree
time
0
0.2
0.4
0.6
0.8
1
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Timings for exponential function
3 4 5 6 7 8 9 10 11 12 2 4
6 8
10 12
14 16
0
0.2
0.4
0.6
0.8
1
time
precision=70
table
degree
time
0
0.2
0.4
0.6
0.8
1
Making this movie meant implementing 6150 functionsO. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 18 / 25
Outline
1 Introducing libm
2 Problem statement
3 The Metalibm Project
4 General approach to function implementation
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 19 / 25
Steps in function implementation
1 Eliminating special cases and values:zeros, infinities, NaNs, etc.
2 Argument reduction: x ∈ [α, β] is asmall interval
3 Polynomial approximation: Remezalgorithm for minimaxapproximations, polynomial of lowdegree
4 Reconstruction
Example
implement f (x) = ex
ex = 2x
log 2 = 2
⌊x
log 2
⌉· 2
xlog 2−⌊
xlog 2
⌉=
= 2E · ex−E log 2 = 2E · er
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 20 / 25
Range reduction
Range reduction
Step is based on mathematical properties:na+b = na · nb, sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . .
What properties do we know for erf (x), or a function defined by ODE?
When range reduction does not work
Piecewise-polynomial implementation
Algorithm to split the domain
Reconstruction becomes an execution of if-statements
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 21 / 25
Range reduction
Range reduction
Step is based on mathematical properties:na+b = na · nb, sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . .
What properties do we know for erf (x), or a function defined by ODE?
When range reduction does not work
Piecewise-polynomial implementation
Algorithm to split the domain
Reconstruction becomes an execution of if-statements
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 21 / 25
Range reduction
Range reduction
Step is based on mathematical properties:na+b = na · nb, sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . .
What properties do we know for erf (x), or a function defined by ODE?
When range reduction does not work
Piecewise-polynomial implementation
Algorithm to split the domain
Reconstruction becomes an execution of if-statements
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 21 / 25
Range reduction
Range reduction
Step is based on mathematical properties:na+b = na · nb, sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . .
What properties do we know for erf (x), or a function defined by ODE?
When range reduction does not work
Piecewise-polynomial implementation
Algorithm to split the domain
Reconstruction becomes an execution of if-statements
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 21 / 25
Domain Splitting. Different approaches
f = asin(x), I = [0, 0.85], d ≤ 8, ε = 2−52
Simple and naive: split domain into k equal parts, k is large.
0
0.5
1
1.5
2
0 0.0
5
0.1
0.1
5
0.2
0.2
5
0.3
0.3
5
0.4
0.4
5
0.5
0.5
5
0.6
0.6
5
0.7
0.7
5
0.8
0.8
5
splittingasin(x)
0
1
2
3
4
5
6
7
8
9
0 0.0
5
0.1
0.1
5
0.2
0.2
5
0.3
0.3
5
0.4
0.4
5
0.5
0.5
5
0.6
0.6
5
0.7
0.7
5
0.8
0.8
5
degrees
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 22 / 25
Domain Splitting. Different approaches
f = asin(x), I = [0, 0.85], d ≤ 8, ε = 2−52
Bisection (use of a theorem of de la Vallee Poussin)
0
0.5
1
1.5
2
0 0.0
5
0.1
0.1
5
0.2
0.2
5
0.3
0.3
5
0.4
0.4
5
0.5
0.5
5
0.6
0.6
5
0.7
0.7
5
0.8
0.8
5
splittingasin(x)
0
1
2
3
4
5
6
7
8
9
0 0.0
5
0.1
0.1
5
0.2
0.2
5
0.3
0.3
5
0.4
0.4
5
0.5
0.5
5
0.6
0.6
5
0.7
0.7
5
0.8
0.8
5
degrees
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 22 / 25
Domain Splitting. Different approaches
f = asin(x), I = [0, 0.85], d ≤ 8, ε = 2−52
Our optimized method (use of a theorem of de la Vallee Poussin)
0
0.5
1
1.5
2
0 0.0
5
0.1
0.1
5
0.2
0.2
5
0.3
0.3
5
0.4
0.4
5
0.5
0.5
5
0.6
0.6
5
0.7
0.7
5
0.8
0.8
5
splittingasin(x)
0
1
2
3
4
5
6
7
8
9
0 0.0
5
0.1
0.1
5
0.2
0.2
5
0.3
0.3
5
0.4
0.4
5
0.5
0.5
5
0.6
0.6
5
0.7
0.7
5
0.8
0.8
5
degrees
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 22 / 25
Results
name function f target accuracy domain I degree bound
f1 asin ε = 2−52 I = [0, 0.85] dmax = 8
f2 asin ε = 2−45 I = [−0.75, 0.75] dmax = 8
f3 erf ε = 2−51 I = [−0.75, 0.75] dmax = 9
f4 erf ε = 2−45 I = [−0.75, 0.75] dmax = 7
f5 erf ε = 2−50 I = [−0.75, 0.75] dmax = 6
Table: Flavor specifications
measure f1 f2 f3 f4 f5subdomain qty in bisection 24 15 9 12 39
subdomain qty in improved bisection 18 10 5 8 25
subdomains saved 25% 30% 44% 30% 36%
coefficients saved 42 31 27 24 79
memory saved (bytes) 336 248 216 192 632
Table: Table of measurements for several function flavorsO. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 23 / 25
Conclusion
writing code → writing code generator
gives functions for the specified domain, specified accuracies, etc.
parametrized function implementation in several minutes
new domain splitting procedure
adding argument reduction procedures
available at http://metalibm.org
work on vectorizable implementations and domain spltting
work on documentation
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 24 / 25
Q/A
Thank you for your attention!Questions?
O. Kupriianova (LIP6, Paris) Metalibm RAIM, 7 April 2015 25 / 25