
Lehrstuhl Informatik III: Datenbanksysteme

Grid-based Data Stream Processing in e-Science

Richard Kuntschke¹, Tobias Scholl¹, Sebastian Huber¹, Alfons Kemper¹, Angelika Reiser¹, Hans-Martin Adorf², Gerard Lemson³, and Wolfgang Voges³

¹ Lehrstuhl Informatik III: Datenbanksysteme, Fakultät für Informatik, Technische Universität München

² Max-Planck-Institut für Astrophysik

³ Max-Planck-Institut für extraterrestrische Physik


Important Challenges in e-Science

In general:
- Large and exponentially growing amounts of data
- Distributed data archives
- No unique identifiers
- Uncertainty

In astrophysics:
- Spectral Energy Distributions (SEDs)
  - Used to classify celestial objects (active galactic nuclei, brown dwarfs, neutron stars, ...)
  - Generation requires spatial (astrometric) matching


Spatial (Astrometric) Matching

Current solutions ...
- ... load all data into main memory
  - Uses a lot of memory
  - Infeasible if memory size is insufficient
- ... process all data at once and deliver the complete result at the end
  - Inefficient
  - No results until all processing has completed
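The two drawbacks above can be contrasted with a streaming formulation. The following is a minimal sketch, not StarGlobe code: the hypothetical generator `match_stream` consumes a catalog one tuple at a time and yields each match as soon as it is found, so only the (small) reference list is held in memory. The haversine formula for angular separation is standard; the function names, the tuple layout, the sample coordinates, and the 5-arcsecond radius are assumptions made for illustration.

```python
import math
from typing import Iterable, Iterator, List, Tuple

# Assumed tuple layout: (id, right ascension in degrees, declination in degrees)
Source = Tuple[str, float, float]

def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two sky positions in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def match_stream(stream: Iterable[Source],
                 reference: List[Source],
                 radius_deg: float) -> Iterator[Tuple[str, str]]:
    """Yield (stream id, reference id) pairs incrementally, one input tuple at a time."""
    for sid, ra, dec in stream:
        for rid, rra, rdec in reference:
            if angular_separation_deg(ra, dec, rra, rdec) <= radius_deg:
                yield sid, rid

# Hypothetical data: one reference object and a catalog delivered as an iterator.
reference = [("ref-1", 10.0, 20.0)]
catalog = iter([("a", 10.0005, 20.0), ("b", 50.0, -5.0), ("c", 10.0, 20.001)])

matches = match_stream(catalog, reference, radius_deg=5 / 3600)
first = next(matches)   # available before the rest of the catalog has been read
rest = list(matches)
print(first, rest)      # ('a', 'ref-1') [('c', 'ref-1')]
```

The first match is returned after a single catalog tuple has been read, which is the behaviour the batch solutions above cannot offer.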


Our Contributions

StarGlobe
- Grid-based P2P Data Stream Management System implemented on top of Globus
- In-network processing
  - Early filtering
  - Parallelization
  - Pipelining
  - Load-balancing
- Mobile user-defined operators

Astrophysical Example Workflow
- Astrometric matching
- Performance evaluation


The StarGlobe Architecture

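[Figure: StarGlobe architecture. Peers are connected through a super-peer backbone. Data sources publish streams (Stream 0, Stream 1) and queries (Query 1, Query 2) subscribe to them. filter and transform operators run inside the network, and mobile operators are loaded from a function provider (Fct-Provider).]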


Traditional Approach: Bring Data to Code

[Figure: data providers A, B, C and D each ship their complete table T to a single site, where the union and NN_10 operators are evaluated.]


New Approach: Bring Code to Data

[Figure: each data provider (A, B, C and D) scans its own table T and applies NN_10 locally, using operator code loaded from the function provider (Fct-Provider). Only the local results are shipped; they are combined by union and a final NN_10.]
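The difference between the traditional and the new plan can be illustrated with a small sketch. It assumes that NN_10 denotes a 10-nearest-neighbour operator (the slides do not define it) and uses one-dimensional toy data; `nn_k`, the provider tables, and the query point are hypothetical. The sketch relies on the fact that the k nearest neighbours of a union are always contained in the union of the per-provider k nearest neighbours, so applying the operator at each provider first leaves the result unchanged while shipping at most k tuples per provider.

```python
import heapq
import random

def nn_k(points, query, k=10):
    """Return the k points closest to `query`, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: abs(p - query))

# Hypothetical tables held by data providers A, B, C and D.
random.seed(0)
providers = {name: [random.uniform(0, 1000) for _ in range(5000)] for name in "ABCD"}
query = 500.0

# Bring data to code: ship every tuple to one site, then union and NN_10.
shipped_all = [p for table in providers.values() for p in table]
central = nn_k(shipped_all, query)

# Bring code to data: NN_10 at each provider, then union and a final NN_10.
local_results = [nn_k(table, query) for table in providers.values()]
shipped_few = [p for part in local_results for p in part]
distributed = nn_k(shipped_few, query)

same_result = central == distributed
print(len(shipped_all), "tuples shipped vs", len(shipped_few))  # 20000 tuples shipped vs 40
```

Both plans produce the same ten neighbours; only the amount of transferred data differs.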


Mobile User-Defined Operators

- Load user-defined operators from function provider servers in the network
- Common interface for integrating external operators
- Push-based iterator
- Flexibility


StreamIterator Interface

open(Config, StreamWriter)
- Configuration parameters
- Writer for result stream

next(StreamIteratorEvent)
- Next element in input stream
- Writing output to result stream using StreamWriter.write()

close()
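The three methods above describe a push-based iterator: the runtime calls `next` for each arriving element, and the operator writes its results to the writer it received in `open`. A minimal sketch of that contract follows, written in Python for brevity. Only the method names `open`, `next`, `close` and `StreamWriter.write()` come from the slide; the dictionary-based configuration and events, the in-memory `StreamWriter`, and the `ThresholdFilter` operator are assumptions for illustration, not the StarGlobe API.

```python
from abc import ABC, abstractmethod

class StreamWriter:
    """Stand-in for the result-stream writer: collects written items in a list."""
    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)

class StreamIterator(ABC):
    """Push-based operator contract with the three methods named on the slide."""
    @abstractmethod
    def open(self, config, writer):
        """Receive configuration parameters and the writer for the result stream."""

    @abstractmethod
    def next(self, event):
        """Process the next element of the input stream."""

    @abstractmethod
    def close(self):
        """Release resources once the input stream has ended."""

class ThresholdFilter(StreamIterator):
    """Hypothetical user-defined operator: forwards events above a threshold."""
    def open(self, config, writer):
        self.min_value = config["min_value"]
        self.writer = writer

    def next(self, event):
        if event["value"] >= self.min_value:
            self.writer.write(event)

    def close(self):
        self.writer = None

writer = StreamWriter()
op = ThresholdFilter()
op.open({"min_value": 3}, writer)
for value in [1, 5, 2, 7]:
    op.next({"value": value})   # the caller pushes; the operator never pulls
op.close()
print(writer.items)             # [{'value': 5}, {'value': 7}]
```

Because the operator only reacts to pushed events, results can be written as soon as each input element arrives.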


Communication between StreamProcessor and StreamIterator

[Figure: the StreamProcessor contains StreamHandler 1 to StreamHandler n, the StreamIterator and the StreamWriter. XML input streams 1 to n deliver Item 1 to Item n; the result item is written to the XML output stream.]
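One way to picture this interplay is the sketch below. It is an illustration under simplifying assumptions, not the StarGlobe implementation: the class bodies are invented, each input is a complete XML string rather than a live stream, and the inputs are handled one after another instead of concurrently. The hypothetical `TagInputIterator` marks each item with the number of the input stream it came from.

```python
import xml.etree.ElementTree as ET

class StreamWriter:
    """Stand-in for the writer of the XML output stream: keeps result elements."""
    def __init__(self):
        self.results = []

    def write(self, element):
        self.results.append(element)

class TagInputIterator:
    """Hypothetical operator: annotates each item with its input stream number."""
    def open(self, config, writer):
        self.writer = writer

    def next(self, event):
        stream_index, item = event
        item.set("input", str(stream_index))
        self.writer.write(item)

    def close(self):
        pass

class StreamProcessor:
    """Parses the inputs and pushes every item into the iterator."""
    def __init__(self, iterator, writer):
        self.iterator = iterator
        self.writer = writer

    def run(self, xml_inputs):
        self.iterator.open({}, self.writer)
        # One loop iteration plays the role of one StreamHandler.
        for index, xml_text in enumerate(xml_inputs, start=1):
            for item in ET.fromstring(xml_text):
                self.iterator.next((index, item))
        self.iterator.close()

writer = StreamWriter()
processor = StreamProcessor(TagInputIterator(), writer)
processor.run([
    '<stream><item id="a"/><item id="b"/></stream>',
    '<stream><item id="c"/></stream>',
])
summary = [(e.get("id"), e.get("input")) for e in writer.results]
print(summary)  # [('a', '1'), ('b', '1'), ('c', '2')]
```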


Astrophysical Example Workflow

[Figure: workflow distributed over peer-0 to peer-10. Data sources: Input List, RASS-BSC, 2MASS, FIRST, USNOB1, NVSS and GSC-2. Final step: SED assembly.]


Distributed Query Evaluation Plan

[Figure: distributed query evaluation plan with one subplan per peer (plan-i at peer-i).
- plan-0 to plan-5 (peer-0 to peer-5): each consists, from top to bottom, of enrichσ-i, transform-i and stream-i.
- plan-6 at peer-6: χ²filter-0 above join-0.
- plan-9 at peer-9: χ²filter-3 above join-3.
- plan-7, plan-8 and plan-10: the remaining pairs χ²filter-1/join-1, χ²filter-2/join-2 and χ²filter-4/join-4, plus the final display operator.]


Distributed Query Evaluation Plan

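[Figure: detail of the plan. χ²filter-0 sits above join-0, which combines two input branches: enrichσ-0, transform-0, stream-0 and enrichσ-1, transform-1, stream-1 (each listed from top to bottom).]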
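How the operators of this plan fragment could fit together is sketched below as a generator pipeline. The slides name the operators but do not define them, so every definition here is an assumption: `transform` maps catalog-specific records to a common layout, `enrich_sigma` attaches a positional uncertainty σ, `join` pairs candidates inside a coarse window, and `chi2_filter` keeps pairs whose normalized squared distance d²/(σ₀² + σ₁²) stays below a threshold. The geometry is a flat approximation that ignores the cos(dec) factor, and all field names and numeric values are illustrative.

```python
def transform(stream, ra_key, dec_key):
    """Map catalog-specific field names to a common (id, ra, dec) layout."""
    for rec in stream:
        yield {"id": rec["id"], "ra": rec[ra_key], "dec": rec[dec_key]}

def enrich_sigma(stream, sigma_arcsec):
    """Attach an assumed positional uncertainty (arcseconds) to every tuple."""
    for rec in stream:
        yield {**rec, "sigma": sigma_arcsec}

def join(left, right, window_arcsec):
    """Pair tuples within a coarse window; buffers the right input for simplicity."""
    right = list(right)
    for l in left:
        for r in right:
            d_ra = (l["ra"] - r["ra"]) * 3600.0
            d_dec = (l["dec"] - r["dec"]) * 3600.0
            if abs(d_ra) <= window_arcsec and abs(d_dec) <= window_arcsec:
                yield l, r, d_ra ** 2 + d_dec ** 2

def chi2_filter(pairs, max_chi2):
    """Keep pairs whose squared distance is small relative to the uncertainties."""
    for l, r, d2 in pairs:
        chi2 = d2 / (l["sigma"] ** 2 + r["sigma"] ** 2)
        if chi2 <= max_chi2:
            yield l["id"], r["id"], round(chi2, 3)

# Hypothetical inputs standing in for stream-0 and stream-1.
stream_0 = [{"id": "x1", "RA": 10.0, "DEC": 0.0}, {"id": "x2", "RA": 11.0, "DEC": 0.0}]
stream_1 = [{"id": "r1", "ra_deg": 10.001, "dec_deg": 0.0},
            {"id": "r2", "ra_deg": 10.004, "dec_deg": 0.0}]

branch_0 = enrich_sigma(transform(stream_0, "RA", "DEC"), sigma_arcsec=3.0)
branch_1 = enrich_sigma(transform(stream_1, "ra_deg", "dec_deg"), sigma_arcsec=1.0)
matches = list(chi2_filter(join(branch_0, branch_1, window_arcsec=20.0), max_chi2=9.0))
print(matches)  # [('x1', 'r1', 1.296)]
```

The pair (x1, r2) passes the coarse join window but is discarded by the χ² test, which shows how a filter placed directly above the join limits what is forwarded to the next peer.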


Evaluation of Early Filtering


Conclusion

- Synergies between research in computer science and other scientific disciplines, e.g., astrophysics
- StarGlobe
  - Handling large data volumes efficiently
    - Early filtering, parallelization, pipelining
  - Returning first results early on
    - Pipelining
  - Flexible support of domain-specific application logic
    - Mobile user-defined operators
- Results also applicable to other domains
