Upon completion of this module, you will be able to:

Define the purpose of Intel® MKL

Identify and …

Performance Features

Using the Library

Intel MKL addresses:
◦ Solvers (BLAS, LAPACK)
◦ Eigenvector/eigenvalue solvers (BLAS, LAPACK)
◦ Some quantum chemistry needs (dgemm)
◦ PDEs, signal processing, seismic, solid-state physics (FFTs)
◦ General scientific and financial codes: vector transcendental functions (VML) and vector random number generators (VSL)

Software Construction

Geometric Transformation

Don’t use Intel® Math Kernel Library (Intel® MKL) on …

Don’t use Intel® MKL on “small” counts.
Don’t call vector math functions on small n.
◦ But you could use Intel® Performance Primitives

BLAS (Basic Linear Algebra Subroutines)

Level 1 BLAS – vector-vector operations
◦ 15 function types
◦ 48 functions

Level 2 BLAS – matrix-vector operations
◦ 26 function types
◦ 66 functions

Level 3 BLAS – matrix-matrix operations
◦ 9 function types
◦ 30 functions

Extended BLAS – Level 1 BLAS for sparse vectors
◦ 8 function types
◦ 24 functions

LAPACK (Linear Algebra Package)
◦ Solvers and eigensolvers; many hundreds of routines in total
◦ More than 1000 user-callable and support routines in total

Discrete Fourier Transforms (DFT)
◦ Mixed radix, multi-dimensional transforms
◦ Multithreaded

VML (Vector Math Library)
◦ Set of vectorized transcendental functions
◦ Most libm functions, but faster

VSL (Vector Statistics Library)
◦ Set of vectorized random number generators

BLAS and LAPACK* are both Fortran
◦ Legacy of high-performance computation

VSL and VML have Fortran and C interfaces
DFTs have Fortran 95 and C interfaces
CBLAS interface: a more convenient way for a C/C++ programmer to call BLAS

Supports 32-bit and 64-bit Intel processors

Large set of examples and tests
Extensive documentation


The goal of all optimization is maximum speed.
Resource-limited optimization – exhaust one or more resources of the system:
◦ CPU: register use, FP units
◦ Cache: keep data in cache as long as possible; deal with cache interleaving
◦ TLBs: maximally use data on each page
◦ Memory bandwidth: minimally access memory
◦ Computer: use all the processors available using threading
◦ System: use all the nodes available (cluster software)

Most of Intel MKL could be threaded, but:
◦ The limited resource is memory bandwidth
◦ Threading Level 1 and Level 2 BLAS is mostly ineffective (O(n))

There are numerous opportunities for threading:
◦ Level 3 BLAS (O(n^3))
◦ LAPACK* (O(n^3))
◦ FFTs (O(n log n))
◦ VML, VSL? Depends on processor and function
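The contrast between the two lists above comes down to how much arithmetic is done per memory access. As a rough sketch (plain C, not MKL code; the function names and the simplified access counts are illustrative assumptions), compare a Level 1 style update y = a*x + y with a square Level 3 style product C = A*B + C:

```c
#include <assert.h>

/* Level 1 style update y = a*x + y on vectors of length n.
   Work: 2n flops (one multiply, one add per element).
   Traffic: read x, read y, write y = 3n accesses. */
double axpy_flops_per_access(long n) {
    assert(n > 0);
    double flops = 2.0 * (double)n;
    double accesses = 3.0 * (double)n;
    return flops / accesses;
}

/* Level 3 style product C = A*B + C on square n-by-n matrices.
   Work: 2n^3 flops.
   Traffic (best case, each matrix touched once): read A, B, C
   and write C = 4n^2 accesses. */
double gemm_flops_per_access(long n) {
    assert(n > 0);
    double flops = 2.0 * (double)n * (double)n * (double)n;
    double accesses = 4.0 * (double)n * (double)n;
    return flops / accesses;
}
```

Under these assumptions the vector update stays below one flop per access for any n, while the matrix product reaches 500 flops per access at n = 1000. That is why extra threads pay off for Level 3 BLAS but mostly compete for memory bandwidth in Level 1 and Level 2.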

All threading is via OpenMP*.
All Intel MKL is designed and compiled for thread safety.

Scenario 1: ifort, BLAS, IA-32 processor:
ifort myprog.f mkl_c.lib

Scenario 2: CVF, LAPACK, IA-32 processor:
f77 myprog.f mkl_s.lib

Scenario 3: Statically link a C program with DLL linked at runtime:
link myprog.obj mkl_c_dll.lib
Note: Optimal binary code will execute at run time based on processor.

Most important LAPACK optimizations:
◦ Threading – effectively uses multiple CPUs
◦ Recursive factorization
   ◦ Reduces scalar time (Amdahl’s law: t = t_scalar + t_parallel / p)
   ◦ Extends blocking further into the code
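To make the Amdahl's law remark concrete, here is a minimal sketch in plain C. The function names are hypothetical, and the timings in the note below are made-up illustration values, not measurements:

```c
#include <assert.h>

/* Run time on p processors: t = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p) {
    assert(p >= 1);
    return t_scalar + t_parallel / (double)p;
}

/* Speedup relative to a single processor. */
double amdahl_speedup(double t_scalar, double t_parallel, int p) {
    return amdahl_time(t_scalar, t_parallel, 1) /
           amdahl_time(t_scalar, t_parallel, p);
}
```

With t_scalar = 10 and t_parallel = 90 on p = 9 processors, the run takes 20 time units, a speedup of 5. Cutting the scalar part to 5 drops the time to 15, which is why reducing scalar time matters as much as adding processors.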

No runtime library support required

One-dimensional, two-dimensional, three-dimensional
Multithreaded
Mixed radix
User-specified scaling, transform sign
Transforms on embedded matrices
Multiple one-dimensional transforms on a single call
Strides
C and F90 interfaces

Basically a three-step process:
Create a descriptor
◦ Status = DftiCreateDescriptor(MDH, …)
Commit the descriptor (instantiates it)
◦ Status = DftiCommitDescriptor(MDH)
Perform the transform
◦ Status = DftiComputeForward(MDH, X)
Optionally free the descriptor
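For orientation, the one-dimensional transform behind the compute step can be sketched as follows, assuming the usual DFT convention. The symbols w (input), z (output), σ (scale factor), and δ (sign) are notation introduced here, not taken from the slides:

```latex
z_k = \sigma \sum_{j=0}^{n-1} w_j \, e^{\delta \, 2\pi i \, jk / n},
\qquad k = 0, \dots, n-1,
\qquad
\delta =
\begin{cases}
-1 & \text{forward transform} \\
+1 & \text{backward transform}
\end{cases}
```

The user-specified scaling and transform sign correspond to σ and δ. With σ = 1 in both directions, a forward transform followed by a backward transform returns n times the original data, so a scale of 1/n on one side is the common choice when a round trip should reproduce the input.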

Vector Math Library: vectorized transcendental functions – like libm but better (faster)
Interface: both Fortran and C interfaces
Multiple accuracies
◦ High accuracy (< 1 ulp)
◦ Lower accuracy, faster (< 4 ulps)
Special value handling: √(-a), sin(0), and so on
Error handling – cannot duplicate libm here
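The accuracy figures above are stated in ulps (units in the last place). As a rough illustration of what such a bound means, the sketch below counts how many representable doubles lie between two positive values. It is plain C rather than VML code, the helper names are hypothetical, and it assumes IEEE 754 doubles:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of representable doubles between two finite positive values.
   For positive IEEE 754 doubles the bit patterns are ordered like the
   values themselves, so the integer difference is the ulp distance. */
int64_t ulp_distance(double a, double b) {
    int64_t ia, ib;
    assert(a > 0.0 && b > 0.0);
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ia > ib ? ia - ib : ib - ia;
}

/* Nonzero if computed is strictly within max_ulps of reference. */
int within_ulps(double computed, double reference, int64_t max_ulps) {
    return ulp_distance(computed, reference) < max_ulps;
}
```

A check such as within_ulps(result, reference, 4) mirrors the lower-accuracy bound, provided the reference value was itself computed in higher precision and rounded correctly.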

VML is important for financial codes (Monte Carlo simulations)
◦ Exponentials, logarithms
Other scientific codes depend on transcendental functions
◦ Error functions can be big time sinks in some codes

Set of random number generators (RNGs)
Numerous non-uniform distributions
VML used extensively for transformations
Parallel computation support – some functions
User can supply own BRNG or transformations
Five basic RNGs (BRNGs) – bits, integer, FP

◦ MCG31, R250, MRG32, MCG59, WH

Gaussian (two methods)
Exponential
Laplace
Weibull
Cauchy
Rayleigh
Lognormal
Gumbel

Basically a 3-step process:
Create a stream pointer.
◦ VSLStreamStatePtr stream;
Create a stream.
◦ vslNewStream(&stream, VSL_BRNG_MCG31, seed);
Generate a set of random numbers.
◦ vsRngUniform(0, stream, size, out, start, end);
Delete a stream (optional).
◦ vslDeleteStream(&stream);
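The same create / generate / delete pattern can be sketched without the library. The block below is a plain C stand-in, not the VSL API: every demo_ name is hypothetical, and the multiplier is the value recalled from Intel's VSL notes for the 31-bit multiplicative congruential generator, so treat it as an assumption and check it against the notes before relying on it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MCG_A 1132489760ULL /* multiplier (assumed, see lead-in) */
#define DEMO_MCG_M 2147483647ULL /* modulus 2^31 - 1 */

/* Hypothetical stand-in for a stream: just the generator state. */
typedef struct { uint64_t state; } demo_stream;

/* Step 1: create a stream from a seed (state must be nonzero). */
demo_stream *demo_new_stream(uint32_t seed) {
    demo_stream *s = (demo_stream *)malloc(sizeof *s);
    assert(s != NULL);
    s->state = seed % DEMO_MCG_M;
    if (s->state == 0) s->state = 1;
    return s;
}

/* Step 2: fill out[0..n-1] with uniform values between a and b.
   Recurrence: x_k = A * x_(k-1) mod M, then u_k = x_k / M. */
void demo_uniform(demo_stream *s, int n, float *out, float a, float b) {
    int i;
    assert(s != NULL && out != NULL && n >= 0 && a < b);
    for (i = 0; i < n; ++i) {
        double u;
        s->state = (DEMO_MCG_A * s->state) % DEMO_MCG_M;
        u = (double)s->state / (double)DEMO_MCG_M;
        out[i] = (float)(a + (b - a) * u);
    }
}

/* Step 3: delete the stream and clear the caller's pointer. */
void demo_delete_stream(demo_stream **s) {
    free(*s);
    *s = NULL;
}
```

Generating a whole vector per call, rather than one number at a time as with rand(), is what gives a library room to vectorize the recurrence and the transformation to the target distribution.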

Compare the performance of C source code (RAND function) and VSL.
Exercise control of the threading capabilities in MKL/VSL.

Intel® Math Kernel Library is a broad scientific/engineering math library.
It is optimized for Intel® processors.
It is threaded for effective use on SMP machines.
