Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov
Department of Computer Science, Johns Hopkins University

Post on 21-Dec-2015

Streaming Evaluation Method

• Linear data requirements of the computation allow for:
  – Incremental evaluation
  – Streaming over the data
  – Concurrent evaluation of batch queries
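The bullets above can be illustrated with a one-dimensional sketch. Each interpolated value is a weighted sum of stored grid values, so a query's result can be built up from partial sums as data atoms are read. Every query in a batch that needs a given atom can then be served from a single read of that atom. The sketch makes several assumptions: a periodic 1D grid cut into atoms of `ATOM` points, and Lagrange weights on an N-point stencil. `ATOM`, `stencil` and `stream_evaluate` are hypothetical names, and this is not the authors' implementation.

```python
import math
from collections import defaultdict

ATOM = 4  # hypothetical atom width: grid points per atom (1D stand-in for 4^3)


def lagrange_weights(t, n_pts):
    """Lagrange weights for nodes at offsets 0..n_pts-1, evaluated at offset t."""
    weights = []
    for i in range(n_pts):
        w = 1.0
        for k in range(n_pts):
            if k != i:
                w *= (t - k) / (i - k)
        weights.append(w)
    return weights


def stencil(x, n_pts, size):
    """Periodic grid indices and weights for interpolating at position x."""
    first = math.floor(x) - n_pts // 2 + 1  # stencil spans n-N/2+1 .. n+N/2
    weights = lagrange_weights(x - first, n_pts)
    return [(first + i) % size for i in range(n_pts)], weights


def stream_evaluate(grid, queries, n_pts):
    """Evaluate a batch of 1D interpolation queries in one pass over the atoms."""
    size = len(grid)  # assumed to be a multiple of ATOM
    # Plan: record, per atom, every (query, grid index, weight) term it feeds.
    needs = defaultdict(list)
    for q, x in enumerate(queries):
        indices, weights = stencil(x, n_pts, size)
        for g, w in zip(indices, weights):
            needs[g // ATOM].append((q, g, w))
    # Stream: visit atoms in storage order, read each needed atom once, and
    # add its contribution to the partial sum of every query that uses it.
    results = [0.0] * len(queries)
    for a in range(size // ATOM):
        if a not in needs:
            continue  # no query touches this atom, so it is never read
        atom = grid[a * ATOM:(a + 1) * ATOM]  # stands in for a single atom read
        for q, g, w in needs[a]:
            results[q] += w * atom[g - a * ATOM]
    return results


grid = [float(g * g) for g in range(32)]
print(stream_evaluate(grid, [10.5, 11.25, 20.0], 4))
```

For the quadratic field g², a 4-point stencil is exact. The three results should therefore come out as 10.5², 11.25² and 20², up to rounding.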

Motivation

• Heavy DB usage slows down the service by a factor of 10 to 20

• Query evaluation techniques adapted from simulation code do not access data coherently

• Substantial storage overhead incurred to localize each computation

• 95% of queries perform Lagrange Polynomial interpolation

Turbulence Database Cluster

MHD Database

• Stores velocity, magnetic field, magnetic vector potential and pressure fields
  – 10 attributes, 4 bytes each
  – 1024 time-steps over a 1024³ grid
  – 40 TB total size

• To reduce the total amount of I/O:
  – Smaller atoms (4³ voxels)
  – No replication
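Below is a quick check of the numbers above, followed by a sketch of what small atoms without replication mean for a single interpolation stencil. The dataset dimensions and the 4³ atom size come from the slide. The helper names, the unit grid spacing and the stencil indexing are assumptions: the stencil is taken to cover grid indices n − N/2 + 1 through n + N/2, with n = ⌊x⌋.

```python
import math

# Dataset dimensions from the slide.
ATTRIBUTES = 10       # velocity (3) + magnetic field (3) + vector potential (3) + pressure (1)
BYTES_PER_VALUE = 4
TIMESTEPS = 1024
GRID = 1024           # grid points per dimension
ATOM = 4              # atom edge length in voxels


def total_bytes():
    """Raw field data: every voxel stores ATTRIBUTES 4-byte values per time-step."""
    return GRID ** 3 * TIMESTEPS * ATTRIBUTES * BYTES_PER_VALUE


def atoms_per_dimension(x, n_pts, atom=ATOM):
    """Atoms spanned along one axis by an n_pts-point stencil at position x.

    Hypothetical helper: assumes unit spacing and the stencil
    n - N/2 + 1 .. n + N/2 with n = floor(x).
    """
    n = math.floor(x)
    first, last = n - n_pts // 2 + 1, n + n_pts // 2
    return last // atom - first // atom + 1


print(total_bytes() / 2 ** 40)                                  # 40.0 (TiB)
print(atoms_per_dimension(3.5, 8), atoms_per_dimension(4.5, 8))  # 2 3
```

The raw size works out to exactly 40 TiB, about 44 × 10¹² bytes, which matches the slide's 40 TB. Assume an 8-point stencil (the width is an assumption). Along each axis it spans two or three 4-voxel atoms, so one 3D query reads between 8 and 27 atoms, and no replicated boundary data has to be stored.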

Lagrange Polynomial Interpolation

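$$
f(x', y') = \sum_{j=1}^{N} l_y^{\,p-N/2+j}(y') \sum_{i=1}^{N} l_x^{\,n-N/2+i}(x')\, f\left(x_{n-N/2+i},\, y_{p-N/2+j}\right)
$$

Lagrange coefficients: $l_x^{\,n-N/2+i}(x')$ and $l_y^{\,p-N/2+j}(y')$. Data: the stored grid values $f(x_{n-N/2+i}, y_{p-N/2+j})$.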
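Here is a minimal Python sketch of the 2D formula. It assumes unit grid spacing, grid points at integer coordinates, an even stencil width N, and n = ⌊x'⌋, p = ⌊y'⌋; the slide does not define n and p. l_x and l_y are taken to be the standard Lagrange basis polynomials over the N stencil nodes. The names `lagrange_coeffs` and `interp2d` are hypothetical. A field that is a polynomial of degree below N in each direction should be reproduced exactly, which gives a convenient check.

```python
import math


def lagrange_coeffs(t, n_pts):
    """Coefficients for the n_pts nodes at offsets 0..n_pts-1 (unit spacing),
    evaluated at offset t from the first node."""
    coeffs = []
    for i in range(n_pts):
        c = 1.0
        for k in range(n_pts):
            if k != i:
                c *= (t - k) / (i - k)
        coeffs.append(c)
    return coeffs


def interp2d(f, x, y, n_pts):
    """f(x', y') via the double sum above on a unit-spaced grid.

    f is indexed as f[ix][iy]; n = floor(x'), p = floor(y'); n_pts is even and
    the stencil is assumed to lie inside the array (no boundary handling).
    """
    n, p = math.floor(x), math.floor(y)
    x0 = n - n_pts // 2 + 1  # grid index of node i = 1 (code index 0)
    y0 = p - n_pts // 2 + 1  # grid index of node j = 1 (code index 0)
    lx = lagrange_coeffs(x - x0, n_pts)
    ly = lagrange_coeffs(y - y0, n_pts)
    total = 0.0
    for j in range(n_pts):          # outer sum over j (y direction)
        inner = 0.0
        for i in range(n_pts):      # inner sum over i (x direction)
            inner += lx[i] * f[x0 + i][y0 + j]
        total += ly[j] * inner
    return total


# A field that is cubic in each direction is reproduced exactly by N = 4.
field = [[ix ** 2 + 3 * ix * iy + iy ** 3 for iy in range(16)] for ix in range(16)]
print(interp2d(field, 5.5, 7.25, 4))  # close to 5.5**2 + 3*5.5*7.25 + 7.25**3 = 530.953125
```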

Processing a Batch Query

Additional Optimizations

• Concurrently process the computations for values that are stored together

• Iterate in the appropriate order
• Compute the Lagrange coefficients with the procedures described by Purser and Leslie*

*R. J. Purser and L. M. Leslie. An Efficient Interpolation Procedure for High-Order Three-Dimensional Semi-Lagrangian Models. Monthly Weather Review, 119:2492ff., 1991.
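The slide does not describe Purser and Leslie's procedure. What follows is therefore a generic sketch of a standard way to make Lagrange coefficients cheaper on a uniform grid; it is an assumption, not necessarily their exact formulation. The denominators depend only on N and follow from factorials. The numerators come from prefix and suffix products, so each axis needs O(N) work instead of O(N²). A direct O(N²) version is included for comparison. Both function names are hypothetical.

```python
import math


def lagrange_coeffs_naive(t, n_pts):
    """Direct O(N^2) coefficients for nodes at offsets 0..n_pts-1 (reference)."""
    coeffs = []
    for i in range(n_pts):
        c = 1.0
        for k in range(n_pts):
            if k != i:
                c *= (t - k) / (i - k)
        coeffs.append(c)
    return coeffs


def lagrange_coeffs_fast(t, n_pts):
    """O(N) coefficients for nodes at offsets 0..n_pts-1 (unit spacing).

    Denominator prod_{k != i} (i - k) = (-1)^(N-1-i) * i! * (N-1-i)!, which
    depends only on N; numerators are prefix/suffix products of (t - k).
    """
    prefix = [1.0] * (n_pts + 1)  # prefix[i] = prod_{k < i} (t - k)
    suffix = [1.0] * (n_pts + 1)  # suffix[i] = prod_{k >= i} (t - k)
    for k in range(n_pts):
        prefix[k + 1] = prefix[k] * (t - k)
    for k in range(n_pts - 1, -1, -1):
        suffix[k] = suffix[k + 1] * (t - k)
    coeffs = []
    for i in range(n_pts):
        den = (-1) ** (n_pts - 1 - i) * math.factorial(i) * math.factorial(n_pts - 1 - i)
        coeffs.append(prefix[i] * suffix[i + 1] / den)
    return coeffs


print(lagrange_coeffs_fast(1.5, 4))   # [-0.0625, 0.5625, 0.5625, -0.0625]
print(lagrange_coeffs_naive(1.5, 4))  # [-0.0625, 0.5625, 0.5625, -0.0625]
```

Because the denominators depend only on N, they could also be tabulated once per batch instead of being recomputed for every query.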

Experimental Evaluation

• Random workloads:
  – across the entire cube space
  – a 128³ subset of the entire space

• Workload derived from the usage log of the Turbulence Database cluster

• Compare with:
  – Direct methods of evaluation
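One way the two random workloads might be generated is sketched below, together with a count that suggests why they can behave differently under batch evaluation. The full 1024³ cube holds 256³ (about 16.8 million) atoms of 4³ voxels, so uniformly scattered queries rarely share an atom. A 128³ subset contains only 32³ = 32,768 atoms, so a large batch shares atoms heavily. The query count, the seed, the helper names and the use of the containing atom (ignoring stencil neighbours) are all assumptions; the slides do not give workload sizes.

```python
import random

GRID = 1024  # grid points per dimension (MHD database)
ATOM = 4     # atom edge length in voxels


def random_workload(n_queries, lo, extent, seed=0):
    """Uniformly random query positions in the cube [lo, lo + extent)^3."""
    rng = random.Random(seed)
    return [tuple(lo + rng.uniform(0, extent) for _ in range(3))
            for _ in range(n_queries)]


def distinct_atoms(points):
    """Atoms that contain the query points (stencil neighbours are ignored)."""
    return {tuple(int(c) // ATOM for c in p) for p in points}


full = distinct_atoms(random_workload(100_000, 0, GRID))
subset = distinct_atoms(random_workload(100_000, 0, 128))
print(len(full), len(subset))  # the 128^3 subset has at most 32^3 = 32,768 atoms
```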

Setup

• Experimental version of the MHD database
  – ~300 timesteps of the velocity fields of the MHD DNS
  – Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8 GB of memory
  – Data tables striped across 7 disks


Questions/Comments