Distinguishing the signal from noise in an SVD of simulation data


Upload: david-gleich

Posted on 15-Jan-2015

Category: Technology

DESCRIPTION

My talk at the massive data and signal processing workshop at ICASSP 2012 in Kyoto, Japan.

TRANSCRIPT

Page 1: Distinguishing the signal from noise in an SVD of simulation data

Distinguishing signal from noise in an SVD of simulation data

DAVID F. GLEICH, PURDUE UNIVERSITY, COMPUTER SCIENCE DEPARTMENT

PAUL G. CONSTANTINE, STANFORD UNIVERSITY

Page 2: Distinguishing the signal from noise in an SVD of simulation data

Large-scale, nonlinear, time-dependent heat transfer problem

10^5 nodes, 10^3 time steps; 30 minutes on 16 cores; ~1 GB

Questions: What is the probability of failure? Which input values cause failure?

Page 3: Distinguishing the signal from noise in an SVD of simulation data

The problem: A simulation run is time-consuming! Insight and confidence require multiple runs and hit the curse of dimensionality.

Our solution: Use “big data” techniques and platforms.

Page 4: Distinguishing the signal from noise in an SVD of simulation data

We store a few runs … and build an interpolant from the data for computational steering.

Supercomputer → data computing cluster → engineer:

Run 100–1000 simulations.

Store them on the MapReduce cluster.

Run 10,000–100,000 interpolated simulations for approximate statistics.

Page 5: Distinguishing the signal from noise in an SVD of simulation data

Input "Parameters

Time history"of simulation

s "

5-10 of them

f

“a few gigabytes”

The Database

s1 -> f1 s2 -> f2

sk -> fk

f(s) =

2

66666666666664

q(x1, t1, s)...

q(xn

, t1, s)q(x1, t2, s)

...q(x

n

, t2, s)...

q(xn

, t

k

, s)

3

77777777777775

A single simulation at one time step

X =⇥f(s1) f(s2) ... f(sp)

The database as a matrix. 100GB – 100TB

The simulation as a vector

5 ICASSP David Gleich · Purdue

Page 6: Distinguishing the signal from noise in an SVD of simulation data

One-dimensional test problem

$$f(x, s) = \frac{1}{8s} \log\!\left[ 1 + 4s(x^2 - x) \right], \qquad X_{i,j} = f(x_i, s_j)$$

[Figure: the columns f_1, f_2, …, f_5 of X drawn as curves over x (“plot(X)”), and X rendered as an image (“imagesc(X)”).]

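A minimal NumPy sketch of building this snapshot matrix (the grids for x and s are assumptions; s = 0 and s = 1 are avoided because the formula degenerates there):

```python
import numpy as np

# One-dimensional test problem: f(x, s) = 1/(8s) * log(1 + 4s(x^2 - x))
def f(x, s):
    return np.log(1.0 + 4.0 * s * (x**2 - x)) / (8.0 * s)

x = np.linspace(0.0, 1.0, 100)              # spatial grid (assumed)
s = np.array([-0.9, -0.5, 0.1, 0.5, 0.9])   # parameter samples (assumed)

# X[i, j] = f(x_i, s_j): each column is one "simulation"
X = f(x[:, None], s[None, :])
```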

Page 7: Distinguishing the signal from noise in an SVD of simulation data

The interpolant

Motivation: Let the data give you the basis, then find the right combination.

$$X = \begin{bmatrix} f(s_1) & f(s_2) & \cdots & f(s_p) \end{bmatrix}, \qquad f(s) \approx \sum_{j=1}^{r} u_j \alpha_j(s)$$

The u_j are the left singular vectors of X! This idea was inspired by the success of other reduced-order models like POD, and by Paul’s residual-minimizing idea.

Page 8: Distinguishing the signal from noise in an SVD of simulation data

Why the SVD? It splits “space-time” from “parameters”: split x and s.

$$f(x_i, s_j) = \sum_{\ell=1}^{r} U_{i,\ell}\,\sigma_\ell\,V_{j,\ell} = \sum_{\ell=1}^{r} u_\ell(x_i)\,\sigma_\ell\,v_\ell(s_j)$$

Treat each right singular vector as samples of an unknown basis function in s, a general parameter, and interpolate v any way you wish:

$$f(x_i, s) = \sum_{\ell=1}^{r} u_\ell(x_i)\,\sigma_\ell\,v_\ell(s), \qquad v_\ell(s) \approx \sum_{j=1}^{p} v_\ell(s_j)\,\phi_j^{(\ell)}(s)$$

Here x is the “space-time” index … and it has a “smoothness” property.
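A sketch of this split (NumPy/SciPy, continuing from the test-problem sketch above): take the SVD of X, read row ℓ of Vt as samples of v_ℓ(s), and interpolate those samples in s. The cubic spline is just one choice for the φ basis, and r is an assumption:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# X and s from the test-problem sketch; columns of X sampled at s[0..p-1]
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

r = 3  # number of retained modes (assumed)
# Interpolate each v_l(s) from its samples Vt[l, j] = v_l(s_j)
v_interp = [CubicSpline(s, Vt[l, :]) for l in range(r)]

def surrogate(s_new):
    """f(s) ~ sum_l u_l * sigma_l * v_l(s), with v_l interpolated in s."""
    coeffs = np.array([sigma[l] * v_interp[l](s_new) for l in range(r)])
    return U[:, :r] @ coeffs
```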

Page 9: Distinguishing the signal from noise in an SVD of simulation data

MapReduce and Interpolation

On the MapReduce cluster: use the SVD on the MapReduce cluster to get the singular-vector basis from the database (s1 -> f1, s2 -> f2, …, sk -> fk).

On just one machine: form the surrogate as a linear combination of singular vectors (the interpolation step).

On the MapReduce cluster: sample the surrogate to produce new entries (sa -> fa, sb -> fb, sc -> fc).

Page 10: Distinguishing the signal from noise in an SVD of simulation data

A quiz! Which section would you rather try and interpolate, A or B?

[Figure: a curve with two highlighted sections labeled A and B.]

Page 11: Distinguishing the signal from noise in an SVD of simulation data

How predictable is a singular vector?

Folk Theorem (O’Leary 2011): The singular vectors of a matrix of “smooth” data become more oscillatory as the index increases.

Implication: The gradient of the singular vectors increases as the index increases.

v1(s), v2(s), …, vt(s): predictable signal. vt+1(s), …, vr(s): unpredictable noise.

[Figure: four panels plotting v1, v2, v3, and v7 over s ∈ [−1, 1].]

Fig. 1. An example of when the functions v_ℓ become difficult to interpolate. Each plot shows a singular vector from the example in Section 3, which we interpret as a function v_ℓ(s). While we might have some confidence in an interpolation of v1(s) and v2(s), interpolating v3(s) for s near 1 is problematic, and interpolating v7(s) anywhere is dubious.

[Figure: the same four panels, v1, v2, v3, and v7, at a finer discretization in s.]

Fig. 2. For reference, we show a finer discretization of the functions above, which shows that interpolating v7(s) near 1 is difficult.

Once we have determined the predictable bases, we interpolate them using the procedures discussed above to create the α_ℓ(s). From the singular values and left singular vectors corresponding to the unpredictable bases, we can statistically characterize the noise in the surrogate function. This statistical characterization provides a time/space-varying prediction variance, which is related to the errors in the surrogate.

4. COMPUTING AN SVD WITH MAPREDUCE

Recall that X is m-by-p, where m is the product of the number of timesteps and spatial points, and p is the number of samples; the biggest computational bottleneck in this algorithm is computing the SVD of this matrix. The matrix is extremely tall and skinny because there are usually millions to billions of rows and around 1000 columns. Consequently, we can use an R-SVD procedure [6] to compute the truncated SVD of the matrix X by first doing a QR factorization of X, then an SVD on the small matrix R that results. Let

$$X = QR$$

be a QR factorization; then $R = U_R \Sigma V^T$, and

$$X = \underbrace{Q U_R}_{U}\, \Sigma V^T$$

is the SVD. In practice, we use an approach in the MapReduce paradigm [7], which first computes the R in the QR factorization, and then computes $U = X V \Sigma^{+}$. This approach, although economical, may result in low accuracy if $\Sigma$ is highly ill-conditioned; we continue to seek alternatives, although we do not seem to observe the worst-case loss of accuracy. For the QR factorization, we use a MapReduce implementation [8] of the communication-avoiding QR scheme [9].

Initially, each row of the matrix X is a record in the MapReduce paradigm, as is each record of the left singular vectors U. Thus, after the SVD, the interpolation just involves distributing the coefficients a via the distributed cache and performing the inner products. Moreover, we can compute the result for many interpolants simultaneously – a computational blocking technique that can amortize the effects of system overhead.
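For intuition, here is a serial NumPy sketch of both variants (the full R-SVD, and the economical R-only variant used on MapReduce); the function names are mine, and the MapReduce version would replace np.linalg.qr with the TSQR of Page 19:

```python
import numpy as np

def rsvd(X):
    """SVD of tall-and-skinny X via QR: X = QR, R = Ur S Vt, so U = Q Ur."""
    Q, R = np.linalg.qr(X)          # Q is m-by-p, R is p-by-p
    Ur, S, Vt = np.linalg.svd(R)
    return Q @ Ur, S, Vt

def rsvd_economical(X):
    """Compute only R, then reconstruct U = X V S^+ with one more pass
    over X; may lose accuracy when S is highly ill-conditioned."""
    R = np.linalg.qr(X, mode='r')   # R factor only
    Ur, S, Vt = np.linalg.svd(R)
    U = X @ Vt.T / S                # right-multiply by V, scale by 1/sigma
    return U, S, Vt
```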

5. RESULTS

We now briefly present some results from a thermal-heating simulation of a complex geometry to illustrate the performance of this method on a real-world problem. There are three parameters s for this simulation, each of which controls a material property. The simulation is done with the Aria package in the SIERRA mechanics toolkit, both developed by Sandia National Laboratories for their simulations. An individual simulation has 240 time steps and 32768 spatial points and takes about 30 minutes to complete on a 32-core machine. Our database contained the output of 1000 simulations.

The SVD of this data took 30 minutes using the Dumbo Python wrapper [10] with Hadoop 0.21 [11]. In Figure 4, we show a singular vector as a function. Subsequently, computing the data a for a single interpolant took about 4 seconds on a laptop. To evaluate 1000 separate interpolants took 8 minutes using a C++ code to do the matrix-vector products in a Hadoop streaming code.

The Hadoop cluster had 62 nodes, with 4 cores on each node. Thus, neglecting the cost of the SVD, the model reduction procedure takes 8 minutes · (62 nodes · 4 cores/node)/1000 simulations = 1.98 core-minutes per simulation; whereas the original simulation took 32 cores · 30 minutes = 960 core-minutes, for a speedup of around 450.

Page 12: Distinguishing the signal from noise in an SVD of simulation data

A refined method with an error model

Predictable vs. unpredictable: don’t even try to interpolate the unpredictable modes. Instead, model them as noise with $\eta_j \sim N(0, 1)$:

$$f(s) \approx \sum_{j=1}^{t(s)} u_j \alpha_j(s) + \sum_{j=t(s)+1}^{r} u_j \sigma_j \eta_j$$

$$\mathrm{Variance}[f] = \mathrm{diag}\!\left( \sum_{j=t(s)+1}^{r} \sigma_j^2\, u_j u_j^{T} \right)$$

But now, how to choose t(s)?
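A sketch of evaluating this refined surrogate for a given t (NumPy, reusing U, sigma, and v_interp from the sketches above; rng and the interface are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_with_noise(s_new, t, r):
    """f(s) ~ sum_{j<=t} u_j alpha_j(s) + sum_{j>t} u_j sigma_j eta_j."""
    pred = sum(U[:, j] * sigma[j] * v_interp[j](s_new) for j in range(t))
    eta = rng.standard_normal(r - t)        # eta_j ~ N(0, 1)
    return pred + U[:, t:r] @ (sigma[t:r] * eta)

def prediction_variance(t, r):
    """diag(sum_{j>t} sigma_j^2 u_j u_j^T) without forming outer products."""
    return (U[:, t:r] ** 2) @ (sigma[t:r] ** 2)
```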

Page 13: Distinguishing the signal from noise in an SVD of simulation data

Our current approach to choosing the predictability

t(s) is the largest τ such that


$$\frac{1}{\sigma_1} \sum_{i=1}^{\tau} \sigma_i \left\| \frac{\partial v_i}{\partial s} \right\| < \text{threshold}$$

We can use more black gradients than red gradients, so the error will be higher for red. Better ideas? Come talk to me!
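One possible implementation of this rule (NumPy; a finite-difference gradient, a max over the parameter grid as the norm, and the default threshold are all my assumptions). The s-dependent t(s) would evaluate the gradients locally near s instead of over the whole grid:

```python
import numpy as np

def choose_t(s_grid, sigma, Vt, threshold=0.1):
    """Largest tau with (1/sigma_1) * sum_{i<=tau} sigma_i * ||dv_i/ds|| < threshold."""
    total, t = 0.0, 0
    for i in range(len(sigma)):
        dv = np.gradient(Vt[i, :], s_grid)       # finite-difference dv_i/ds
        total += sigma[i] * np.max(np.abs(dv))   # ||.|| taken as a grid max (assumed)
        if total / sigma[0] >= threshold:
            break
        t = i + 1
    return t
```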

Page 14: Distinguishing the signal from noise in an SVD of simulation data

An experimental test case

A heat equation problem. Two parameters control the material properties.

Page 15: Distinguishing the signal from noise in an SVD of simulation data

Our Reduced Order Model vs. The Truth

[Figure: the reduced-order model beside the truth, with a histogram of errors (roughly 10^-3 to 10^-2) and a marker where the error is the worst.]

Page 16: Distinguishing the signal from noise in an SVD of simulation data

A Large Scale Example

Nonlinear heat transfer model: 80k nodes, 300 time steps, 10^4 basis runs. SVD of a 24M × 10^4 data matrix. 500× reduction in wall-clock time (100× including the SVD).

Page 17: Distinguishing the signal from noise in an SVD of simulation data

SVD from QR: R-SVD

An old algorithm … that helps when A is tall and skinny.

Let $A = QR$ and $R = U_R \Sigma_R V_R^T$; then

$$A = (Q U_R)\, \Sigma_R V_R^T.$$

Page 18: Distinguishing the signal from noise in an SVD of simulation data

Intro to MapReduce

Originated at Google for indexing web pages and computing PageRank.

The idea: Bring the computations to the data. Express algorithms in data-local operations. Implement one type of communication: shuffle. Shuffle moves all data with the same key to the same reducer.

[Diagram: input records, stored in triplicate, flow through map tasks; map output is persisted to disk before the shuffle; reduce input/output is on disk.]

Data scalable. Fault tolerance by design.
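A toy, in-memory sketch of the map–shuffle–reduce contract in pure Python (not Hadoop code); the shuffle step is exactly the “same key to the same reducer” guarantee:

```python
from itertools import groupby
from operator import itemgetter

def run_mapreduce(records, mapper, reducer):
    # Map: each record emits (key, value) pairs.
    pairs = [kv for rec in records for kv in mapper(rec)]
    # Shuffle: move all pairs with the same key to the same reducer.
    pairs.sort(key=itemgetter(0))
    return [reducer(k, [v for _, v in grp])
            for k, grp in groupby(pairs, key=itemgetter(0))]

# Example: count words per key.
records = ["a b a", "b c"]
mapper = lambda line: [(w, 1) for w in line.split()]
reducer = lambda key, vals: (key, sum(vals))
print(run_mapreduce(records, mapper, reducer))  # [('a', 2), ('b', 2), ('c', 1)]
```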


Page 19: Distinguishing the signal from noise in an SVD of simulation data

MapReduce TSQR summary

MapReduce is great for TSQR!

Data: a tall-and-skinny (TS) matrix, stored by rows.
Map: QR factorization of local rows.
Reduce: QR factorization of local rows.

Input: 500,000,000-by-100 matrix.
Each record: one 1-by-100 row.
HDFS size: 423.3 GB.
Time to compute the norm of each column: 161 sec.
Time to compute R in qr(A): 387 sec.

On a 64-node Hadoop cluster with 4×2 TB disks, one Core i7-920, and 12 GB RAM per node.

Demmel et al. showed that this construction works to compute a QR factorization with minimal communication.
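A serial NumPy sketch of the TSQR pattern: each “map” takes the QR of its local block of rows, and the “reduce” takes the QR of the stacked local R factors; the result matches the R of a direct QR up to row signs. The block count is an arbitrary assumption:

```python
import numpy as np

def tsqr_r(A, num_blocks=4):
    """R factor of tall-and-skinny A via the TSQR pattern."""
    blocks = np.array_split(A, num_blocks, axis=0)
    # Map: local QR of each block of rows.
    local_Rs = [np.linalg.qr(B, mode='r') for B in blocks]
    # Reduce: QR of the stacked local R factors.
    return np.linalg.qr(np.vstack(local_Rs), mode='r')

A = np.random.randn(10000, 50)
R = tsqr_r(A)
# Agrees with a direct QR up to the signs of rows:
print(np.allclose(np.abs(R), np.abs(np.linalg.qr(A, mode='r'))))
```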


Page 20: Distinguishing the signal from noise in an SVD of simulation data

Key Limitations

Computes only R and not Q. We can get Q via Q = A R^+ with another MapReduce iteration (we currently use this for computing the SVD), but the result is not numerically orthogonal; iterative refinement helps. We are working on better ways to compute Q (with Austin Benson and Jim Demmel).
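A sketch of that reconstruction plus one refinement pass (NumPy; the refinement shown, re-running QR on the computed Q, is just one option, not necessarily the authors’ scheme):

```python
import numpy as np

def q_from_r(A, R):
    """Q = A R^+ (one extra pass over A), then one refinement step."""
    Q = np.linalg.solve(R.T, A.T).T   # Q = A R^{-1} without inverting R
    # The computed Q need not be numerically orthogonal; a QR of Q
    # restores orthogonality, and A = Q2 @ (R2 @ R) still holds.
    Q2, R2 = np.linalg.qr(Q)
    return Q2, R2 @ R
```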


Page 21: Distinguishing the signal from noise in an SVD of simulation data

Our vision: To enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations.

Paul G. Constantine