high performance computing, cloud computing, and remote...

60
High Performance Computing, Cloud computing, and Remote Sensing Big Data Lizhe Wang Institute of Remote Sensing and Digital Earth (RADI) Chinese Academy of Sciences (CAS) email: [email protected] March 2016, Canberra, Australia Lizhe Wang HPC, Clouds, RS Big Data

Upload: others

Post on 17-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

High Performance Computing, Cloud computing,and Remote Sensing Big Data

Lizhe Wang

Institute of Remote Sensing and Digital Earth (RADI)Chinese Academy of Sciences (CAS)

email: [email protected]

March 2016, Canberra, Australia

Lizhe Wang HPC, Clouds, RS Big Data

Page 2: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Contents

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 3: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

ToC

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 4: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

International Development of Earth Observation

0

5

10

15

20

25

30

35

40

1964 1966 1968 1970 1972 1974 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010

卫星数

Nimbus, 1964514 EO satellites till Dec. 2011EO has played an important role for human being society

Lizhe Wang HPC, Clouds, RS Big Data

Page 5: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

International Development of Earth Observation

EO satellites in 1962-2012

Lizhe Wang HPC, Clouds, RS Big Data

Page 6: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

International Development of Earth Observation

Plan of EO satellites in 2013-2035

Lizhe Wang HPC, Clouds, RS Big Data

Page 7: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Earth Observation in China

China Remote Sensing Satellite Ground Station, a member ofthe Landsat Ground Station Operations Working Group,boasts one of the world’s highest capacities for receiving,processing, and distributing satellite data. With over 3.3million scenes of satellite data accumulated on file since 1986,it is regarded as the largest Earth observation satellite dataarchive in China.Its three stations at Miyun, Kashi, and Sanya can receive datasimultaneously from satellites covering the whole territory ofChina and 70% of Asia.It receives 12 satellites currently, will receive 50 satellite in2025, and become the LARGEST ground station for civil usein the world.

Lizhe Wang HPC, Clouds, RS Big Data

Page 8: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Earth Observation in China

Lizhe Wang HPC, Clouds, RS Big Data

Page 9: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Earth Observation in China

Lizhe Wang HPC, Clouds, RS Big Data

Page 10: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

ToC

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 11: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

EO Data Processing flow: an Overview

����

DP

S

� �1-9#

1#

��

��

2#

� �1-4#

1-3#

����

����

DP

S

����

���

� �1-9#

� �

� �

� �

� �

� �

DP

S

� �1-9#

1#

��

��

2#

����

���

� �1-9#

� �

� �

� �

� �

� �

� �

� �

� � �

��

��

�� �

��

DP

S

� �1-6#

��

��

4#

����

��

� �1-6#

� �

� �

� �

. . .. . .

. . .

���

��

��

��

��'���%���)�!�

SAR �BJ-1�QuickBird�

HJ-1A�

Earth Observation DataDEM

GCP

�%��#%"��&&�!�

��" �'%��

����" �'%��

�' "&#��%��

���(���������%"��&&�!�

��!� �"%%��'

�"&���

�(&�"!

%'�"%��'�����'�"!

!�"% �'�"! ��&'%��'�"!

��!�%��

�"%�&'

���

���&&�����'�"!

�� �

��'�%

�!"*

RS Data Processing

��'����$(�%�!���!�����"%��!�

��'����'�� �&�%)�'�"!���'*"%�

L2

Mosaic

�����'�������&�!�

�%"(��'

��"�������!���"%�&'���%� �(�&����

��""�

�!��&�

�%������ �'�������##����'�"!&�

Lizhe Wang HPC, Clouds, RS Big Data

Page 12: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Data acquisition: data volumes and data ratesSatellites Data Rates (MB/S) Data Amount (GB/Day) Data Amount (TB/Year)

ZY-02 113.98 115 40.99ZY-03 304 334 119.05HJ-1A 120 114 40.63HJ-1B 60 57 20.32HJ-1C 320 187.5 66.83

LandSat8 440.00 241.70 47.33RadaSat2 105.00 57.68 11.29RadaSat-1 105.00 57.68 11.29SPOT-5 100.00 54.93 10.76

LANDSAT5 85.00 28.02 9.99SPOT-4 50.00 10.99 3.92

RADARSAT-1 105.00 34.61 12.34ENVISAT 100.00 32.96 11.75IRS-P6 210.00 46.14 16.45

Single RS data set: GBDaily receiving RS data set: TBNational RS data archive: PBInternational RS data archive: EB

Lizhe Wang HPC, Clouds, RS Big Data

Page 13: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Data volumes of a single scene of L0, L1, and L2 Products

Satellites(Sensor) L0(MB) L1(MB) L2(MB)

BJ-1(MS) 228.3 228.3 333.32BJ-1(PAN) 98.9 98.9 150.33

SPOT-5(PAN) 572 572 45.96SPOT-5(MS) 144 144 252.21

LANDSAT7(ETM) 337 337 532LANDSAT8(UTM) 481 481 781

Lizhe Wang HPC, Clouds, RS Big Data

Page 14: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Data volume of value-added processing

Satellites Fine Ortho Regional Mosaic (GB)(Sensor) 1 Scene(MB) 1 Scene(MB) 15 Scenes 512 Scenes

LANDSAT7 511.52 515.3 7.48 255.5ZY-02C(PAN) 201.61 206.65 N/A N/AZY-02C (HRI) 467.3 475.15 N/A N/A

Lizhe Wang HPC, Clouds, RS Big Data

Page 15: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Data Volume Throughout the Entire Flow

 Processing

RC RD TR L0 L1 L2 L3 L4 MO CL AR OS

Data

Am

ount

(Log

D):L

og(M

B)

1

2

3

4

5

6

CBERS-02C(HRI) LANDSAT5

Lizhe Wang HPC, Clouds, RS Big Data

Page 16: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Comparative analysis of the data throughput rate

 Processing

RC RD TR L0 L1 L2 L3 L4 MO CL AR OS

Spee

d (M

bps)

0

20

40

60

80

100

120

CBERS-02(PAN) CBERS-02(HRI) LANDSAT5

Lizhe Wang HPC, Clouds, RS Big Data

Page 17: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Comparative Analysis of I/O Occupation

 Processing(Algorithms)

L1 L2 L3 L4 MO CL

I/O P

erce

ntag

e (%

)

10

20

30

40

50

60

500M1GB5GB10GB22GB

Yan Ma, Lizhe Wang. Towards building an empirical index for data-intensive remotesensing image processing applications. Future Generation Computer Systems.

Lizhe Wang HPC, Clouds, RS Big Data

Page 18: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Computational Complexity of Algorithms

Stage Algorithm Complexity Parallelism

Radiometirc O(n) ExcellentPreprocessing Geometirc O(n2) Excellent

Registration O(n2log(n)) ExcellentK-T Trans O(p2bn2) Good

Value-added Processing Convolution O(k2n2) GoodMosaicking O(n3) Normal

K-mean O(rbmn2) NormalInformation Extraction Bayers O(bmn2) good

BP O(n2log(n)) good

Lizhe Wang HPC, Clouds, RS Big Data

Page 19: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Computational Complexity of Algorithms

 Processing(Algorithms)

L0 L1 L2 L3 L4 MO CL AR OS

Com

puta

tiona

l Com

plex

ity: l

ogn(

C)

.5

1.0

1.5

2.0

2.5

3.0

3.5

Complexity (C)

Lizhe Wang HPC, Clouds, RS Big Data

Page 20: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

The DIRS model

�� #����)����$ "#

���(!�"����$#� ���)!������� "�$��#

*Flops/IO����" %��!%$��� %�$+

��!�"������#$���$� �� �� ������(

*��!�"�����DI��%"&�#+

� ������� ��������$ " �,F(C,D,N,T..)

� ������������� ����

�� �

Processing

RC Rd TR L0 L1 L2 L3 L4 MO CL AQ

Data

Am

ount

(Log

D):L

og(M

B)

1

2

3

4

5

6

CBERS-02C(HRI) LANDSAT5

Processing

RC Rd TR L0 L1 L2 L3 L4 MO CL AQ

Data

Am

ount

(Log

D):L

og(M

B)

1

2

3

4

5

6

CBERS-02C(HRI) LANDSAT5

�������

������� ����

������

� �!%$�$� ��������'��$�

���������������

F��� �����

�#$���$� ��� �� �

�#$���$� ��� �� �

Lizhe Wang HPC, Clouds, RS Big Data

Page 21: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

The DIRS model

DIRS =

0 if

10DC

60T < 1

log10DC

60T if10DC

60T > 1

(1)

Lizhe Wang HPC, Clouds, RS Big Data

Page 22: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Representation of data volume ‘D’

D = max {Din, Dm1, Dm2, . . . , Dout} (2)

��� ����� ��� ��� ���

Lizhe Wang HPC, Clouds, RS Big Data

Page 23: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Representation of time requirement ‘T’

Table: Quantization of computational complexity ‘C’

Computational Complexity ‘C’

O(n) 1O(n2) 2

O(n2log(n)) 2.5O(k2n2) 2.2O(n3) 3

O(n3) ∼ O(n4) 4

Lizhe Wang HPC, Clouds, RS Big Data

Page 24: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Representation of computational complexity ‘C’

Table: Quantization of Time Requirement ‘T’

Time Requirement ‘T’

Second 1Minute 2Hour 3Day 3.5

Week 4>Month 5

Lizhe Wang HPC, Clouds, RS Big Data

Page 25: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Empirical Estimation of DIRS Through the Entire RS DataProcessing Flow

Algorithms ‘C’ ‘D’ ‘T’ DIRS

L1 1 3 2 0L2 2 3 2 2.4L3 2 3 2 2.4L4 2 3 2 2.4

MO 5.5 5.5 3 11.2CL 5.5 3 3 8.4AR 5.5 3 3 11.2OP 4.5 3 3 3.7

Lizhe Wang HPC, Clouds, RS Big Data

Page 26: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Empirical Estimation of the DIRS Indicator

 Proessing(Algorithms)

L0 L1 L2 L3 L4 MO CL AR OS

DI I

ndic

ator

(1to

10)

0

2

4

6

8

10

12

DI Indicator (Empirical Estimation)

Lizhe Wang HPC, Clouds, RS Big Data

Page 27: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Some Findings for RS Big Data

Big DataCompute-intensive and data-intensiveUnbalanced Data FlowComplex Algorithm Flow

Lizhe Wang HPC, Clouds, RS Big Data

Page 28: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

ToC

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 29: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Philosophy of RS Big Data Computing

Data processing models and algorithmsApproximate computationReduced ComputationPartial Computation

System and SoftwarePlatforms: Clusters, Data Centers, Clouds, High-speednetworkingSoftware system: Parallel file systems, High Performance IO,Programming models, Memory models, Runtime System

Paradigm of RS Data Product ServicePrecise Prediction, On-Demand Computing, Active ServiceStandalization, Serialisation and Automation of RS Dataproduction

Lizhe Wang HPC, Clouds, RS Big Data

Page 30: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Philosophy of RS Big Data Computing

Data processing models and algorithmsApproximate computationReduced ComputationPartial Computation

System and SoftwarePlatforms: Clusters, Data Centers, Clouds, High-speednetworkingSoftware system: Parallel file systems, High Performance IO,Programming models, Memory models, Runtime Systems

Paradigm of RS Data Product ServicePrecise Prediction, On-Demand Computing, Active ServiceStandalization, Serialisation and Automation of RS Dataproduction

Lizhe Wang HPC, Clouds, RS Big Data

Page 31: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

High Performance Processing for EO Big Data

Where:EO Big Data and itsApplications

What:Quantitative analysis of EO BigData and its processingalgorithmsStructed, semi-structed andunstructured EO Big DataHigh performance processingalgorithms and software models

Systems and PlatformsClustersSatellite data centresClouds, and Multiple datacenters

Data ModelMemory model: GDDMStorage model: HPGFSIO model: ADIOS,HA

Processing ModelDistributed Memory ProcessingModel(MPI)Workflow ModelBatch Processing ModelStreaming data

Production SystemParallel RS Image ProcessingSystem (PIPS, PIPS+)RS Data Production Systemacross Multiple SatelliteDatacenters

Lizhe Wang HPC, Clouds, RS Big Data

Page 32: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Contents

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 33: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

A System Viewr11 r12 r13 r14 r15

r21 r22 r23 r24 r25

r31 r32 r33 r34 r35

r41 r42 r43 r44 r45

r51 r52 r53 r54 r55

r16

r26

r36

r46

r56r61 r62 r63 r64 r65 r66

r17

r27

r37

r47

r57r67

r71 r72 r73 r74 r75 r76 r67

r18

r28

r38

r48

r58r68

r78r81 r82 r83 r84 r85 r86 r87 r88

1 2

5 6

3 4

7 8

11 12

13 14 15 16

9 10

3-� �����

6r43 r44

r34r33

r43 r44

r34r33

r43 r44

r34r33

Z-Order Diagonal Hilbert

15

1 2

5 6

3 4

7 8

11 12

13 14 16

9 10

15

1 2

5 6

3 4

7 8

11 12

13 14 16

9 10

r43 r44

r34r33

4

8

12

1615

11

7

32

6

10

1413

1

5

9

13 14 10 9 5 61 2 4 8 12 1615117 3

13

r43 r44

r34r33

r43 r44

r34r333-� �����

r43 r44r34r33r43 r44r34r33r43 r44r34r33

14 10 9 5 1 2 6 7 3 4 8 12 11 15 16

Bricks ��

(Z-order ��)

��Brick ��

����������

Yan Ma, Lizhe Wang, et. al. Distributed data structure templates for data-intensive remote sensingapplications. Concurrency and Computation: Practice and Experience, 25(12): 1784-1797 (2013)

Lizhe Wang HPC, Clouds, RS Big Data

Page 34: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Implementation

map ( ) 1

MPI_Win

1 3 1 N1 1 1 2

Multi-core Cluster

1

map_f

2

3

4

map_f

1

2

3

N

/Node

/core

T1 T2 T3 T4

(Margin)

T1

T2

T3

T4

T1

T2 T3

T4

T5

T1 T3

T2 T4

T1

T2

T3

T4

Ti Ai

Lizhe Wang HPC, Clouds, RS Big Data

Page 35: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Contents

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 36: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Generic Parallel Processing Model for RS Big Data

RS-GPPS

Generic Algorithm Skeletons

(Templates)

Distributed RS Data Templates

Job Class (RS sequential code/

function)

Data Partition Strategy Function

Skeletons Instantiation Data Templates Instantiation

Parameter Specifying

Parameter Specifying

Massive Remote Sensing Image Parallel Processing Programs

Code Generation

How to generate parallel remote sensing programs with RS-GPPS skeletons

Parameters of Distributed RS Data Templates • Data type of RS image • data partition strategies

Parameter of the Skeleton • Job class

C/C++��Job�

MPI+C/C++�����

Job�

Lizhe Wang, Yan Ma, etc. Yan Ma, Lizhe Wang, et. al. Generic Parallel Programming for Massive RemoteSensing Data Processing. CLUSTER 2012: 420-428

Lizhe Wang HPC, Clouds, RS Big Data

Page 37: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Generic Parallel Processing Model for RS Big Data

RS-GPPS

Template <Class T> Class Skeleton{}

Generic Algorithm Skeletons

(Templates) Distributed RS Data

Templates

Parallel FS

3

N

1

2

1 2

3 4

Data Distribution

Info

Multicore Cluster

1 2 3 N

MPI Runtime Environment

Share Distributed Spaces

0

1 2 3 4

Data Distribution (node mapping)

Parallel I/O

data block load/write

! Implement skeleton (Perform Computations on Distributed RS Data) 1)  Divide the Task into subtasks; 2) Load the Data Blocks concurrently through parallel I/O; 3) Execute Sequential Codes in parallel

! Implement Distributed RS Data 1) Slice RS data logically into overlapped blocks; 2) Distributed blocks across nodes; 3) Converged blocks as a Global RS data via one-sided messaging of MPI

one-sided messaging �

Lizhe Wang HPC, Clouds, RS Big Data

Page 38: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Contents

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 39: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS Data Access Pattern

Image

Memory

File

Read/Write

(b) Rectangular-Block Access Pattern

MPI_IO <offset,count=2>

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

a11 a12 a13 a14

a21 a22 a23 a24

a31 a32 a33 a34

a41 a42 a43 a44

Input

Output

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

2 3

Read/Write

r43 r44r33 r34

r43 r44

r52r51 r53 r54 Read/Write

r53 r54

r33 r34

) (

r43

r53 r54

r33 r34

(a) Neighbor-Based Processing(Regional dependent)

Lizhe Wang HPC, Clouds, RS Big Data

Page 40: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS Data Access Pattern

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44Image

Memory

File

Read/Write

(b) Cross-File Access Pattern

MPI_IO <offset,count>

a11 a12 a13 a14

a21 a22 a23 a24

a31 a32 a33 a34

a41 a42 a43 a44

Output

Read/Write

r43 r44r33 r34

10

2 3

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

) (

r43 r44r33 r34

(a) Band-Related Processing(Band dependent)

Input

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

Input

Input

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

r43 r44r33 r34 r43 r44r33 r34

r43 r44r33 r34

r43 r44r33 r34

r43 r44r33 r34

r43 r44r33 r34

r43 r44r33 r34

Read/WriteRead/Write

Lizhe Wang HPC, Clouds, RS Big Data

Page 41: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS Data Access Pattern

Image

Memory

File

Read/Write

(b) Diagnal Access Pattern

1

MPI_IO <offset,count=1>

0

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

a11 a12 a13 a14

a21 a22 a23 a24

a31 a32 a33 a34

a41 a42 a43 a44

Input

Output

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

2

3 Read/Write

r33 r44

r52r51 r53 r54

Read/Write

) )

) ) (

r22r11

r33 r44

(a) Irregular Processing(Global or Irregular Dependent)

4

5Read/Write

r11 r22

r11 r22 r33 r33

Image

Memory

File

Read/Write

(c) Polygon Access Pattern

MPI_IO <offset,count>

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34

r41 r42 r43 r44

Read/Write

r31 r32

r52r51 r53 r54

Read/Write

)

) ) (

r22 r31r33

r42

r42r22 r33

r22 r31 r32 r33 r42

r32

)

Lizhe Wang HPC, Clouds, RS Big Data

Page 42: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Optimized Data Layout in HPGFS

���������� �� ����

����������������

����������

1 2

���� ������������������

Geographical Metadata Image Info Map Info Projection Para Satellite & Sensor Info ….

Image Data Image Band 1 Image Band 2 Image Band 3 …. Image Band n

Basic Operations getGeoMetadata() setGeoMetadata() projectionReform() resampling() …..

Geographical�Metadata

Image Info

Map Info Projection Para(differ with projections)

Satellite & Sensor Para(differ with sensors)

UTM

EllipsoidDatumUTM Zone…..

Transvers Mercator

EllipsoidDatumCentral meridianCoordinating orgin……

ETM+Sensor typeOrbitSemi-majorSemi-minorOrbital Inclination…..

RadarsatSensor typeOrbitWave lengthRange Resolution Azimuth……

�����������

Metadata

Geodata

5 6

r11 r12 r13 r14 r15

r21 r22 r23 r24 r25

r31 r32 r33 r34 r35

r41 r42 r43 r44 r45

r51 r52 r53 r54 r55

r16

r26

r36

r46

r56r61 r62 r63 r64 r65 r66

r17

r27

r37

r47

r57r67

r71 r72 r73 r74 r75 r76 r67

1 2

5 6

3 4

7 8

11 12

13 14 15 16

9 10

3 4 7 8 9 10 13 14 11 12 15 16

����������

Lizhe Wang HPC, Clouds, RS Big Data

Page 43: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Optimized Data Layout in HPGFS

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16

13 14 10 9 5 1 2 6 7 3 4 8 12 11 15 16 13 9 14 5 10 15 1 6 11 16 2 7 12 3 8 4

���������������������(Consecutive-Line Access Pattern)

�����������������������������(���� �� Access Pattern)

��������������(������ Access Pattern)

�������������������(��������������� Access Pattern)

1,2,3,4,9,10,11

,12

5,6,7,813,14,15,16

1,5,9,13,3,7,11,15

2,6,10,14,4,8,12,16

9,10,13,14,3,4,7,8,

1,2,5,6,12,11,15,

16

13,9,14,1,6,11,16,3,8,4

5,10,15,2,7,12

15

1 2

5 6

3 4

7 8

11 12

13 14 16

9 10

15

1 2

5 6

3 4

7 8

11 12

13 14 16

9 10

15

1 2

5 6

3 4

7 8

11 12

13 14 16

9 10

15

1 2

5 6

3 4

7 8

11 12

13 14 16

9 10

r43 r44

r34r33

r11 r12 r13 r14 r15

r21 r22 r23 r24 r25

r31 r32 r33 r34 r35

r41 r42 r43 r44 r45

r51 r52 r53 r54 r55

r16

r26

r36

r46

r56

r61 r62 r63 r64 r65 r66

r17

r27

r37

r47

r57

r67

r71 r72 r73 r74 r75 r76 r67

r18

r28

r38

r48

r58

r68

r78

r81 r82 r83 r84 r85 r86 r87 r88

1 2

5 6

3 4

7 8

11 12

13 14 15 16

9 10

����������� �����with Space-Filling Curves

3-� �����

Brick-Based

� ���� ���������

(Access-Pattern Awared)

6

4

8

12

1615

11

7

32

6

10

1413

1

5

9

13 14 10 9 5 61 2 4 8 12 1615117 3

13

r43 r44

r34r33

r43 r44

r34r33

3-� �����

r43 r44

r34r33

r43 r44

r34r33

r43 r44

r34r33

r43 r44r34r33r43 r44r34r33r43 r44r34r33

14 10 9 5 1 2 6 7 3 4 8 12 11 15 16

Arranging Bricks(Example:Hilbert Curve)

Data Organizing inside Bricks

(Example:Z-order Curve)

������������������ (By Bricks)

����������

����������

����������

����������

Brick Array

Byte Array

Pixel Array

Building Data Layout with Application Aware Policies Different Space-Filling Curves for Different Access Patterns

Lizhe Wang, Yan Ma, etc. A Parallel File System with Application-aware Data Layout Policies in DigitalEarth”, IEEE Transactions on Parallel & Distributed Systems, doi:10.1109/TPDS.2014.2322362

Lizhe Wang HPC, Clouds, RS Big Data

Page 44: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Contents

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 45: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Large-scale RS images Parallel Processing

Input file;XML file;Individual Executable Program;

Output file;

Individual computing task

3

2

11

10

9

17

16

20

22

12

18

21

24

25

6

13

5

19

7

14

23

8

15

1

4

Original sequence

1 5

9

19

2

10

3

11

4

12

6

13

7

14

8

15

16 17 18

20 2221

23

24

25

Task tree based parallel computing

Lizhe Wang HPC, Clouds, RS Big Data

Page 46: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Large-scale RS images Parallel Processing

Lizhe Wang HPC, Clouds, RS Big Data

Page 47: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

DAG-based scheduling model

( A AAD

A

. A AAD

A A

,A A

D

C A

1��������

2������� 3��������

4������� 5������ 6������ 7������

AA A A

A

LRMS

, A

D

A A )

A

)

Yan Ma, Lizhe Wang, etc. Task tree based large-scale mosaicking for massive remote sensed imageries withdynamic DAG scheduling. Parallel and Distributed Systems, IEEE Transactions on, 25(8): 2126–2137, Aug2014.

Lizhe Wang HPC, Clouds, RS Big Data

Page 48: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Large scale parallel RS Image Mosaicing

Data: More than Half of China. (20◦ N–47◦ N and 84◦ E–180◦ E) 352 Scenesof ETM+, 15m resolution, PANPerformance: On a 10-node cluster, it takes 5 hours. typically it is several-daywork with a commercial software

Lizhe Wang HPC, Clouds, RS Big Data

Page 49: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

ToC

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 50: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Multi-datacenter Clouds for RS Data Processing

Lizhe Wang HPC, Clouds, RS Big Data

Page 51: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Multi-datacenter Clouds for RS Data Processing

Products provide40+ types of RS inversion products: vegetation structure andgrowth state, radiation budget, water and heat fluxes, ice-snowvariation, mineral exploration20+ types of thematic RS products: agriculture, forestry,geology, water,sea ice, environmentlarge scales area: global, China, South Americalong time series: all year round

Service providespatial metadata management for multi-level RS data;data-dependent knowledge base;replicas management for distributed RS data;automated data production;production task scheduling;workflow fault-tolerant;distributed resource monitoring

Lizhe Wang, etc. pipsCloud: High Performance Cloud Computing for Remote Sensing Big DataManagement and Processing. Submitted to FGCS.

Lizhe Wang HPC, Clouds, RS Big Data

Page 52: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Software System Architecture

�,+'1,/'+%

��1����+�%#*#+1��2 0!/'-1',+����&�/'+%�

�������'/12�)��/,!#00'+%��601#*�$,/����

� �����'/12�)������)201#/� +3'/,+*#+1�$,/����

�6-#/3'0,/

�),2"��/�*#4,/(���-#+�1�!(�

�/#��#1��5���

�'/12�)���!&'+#�����

�� �� ������ �� �� ��

�#1�,/(�1,/�%# �,*-21#�����1�

�),2"��,/1�)�������������������4# �0#/3'!#����������

��������--)'!�1',+��4�/#�

"�1��)�6,210��+"�!,-6�

������-�#"2!#

�/,%/�**'+%��(#)#1,+0���������

�'01/' 21#"������1���1/2!12/#

��0(��!&#"2)'+%�������,/.2#����

����#-),6#/�+,3��5���

����!&#"2)�/�+,3��0!&#"2)�/�

�*�%#�#-,0'1,/6��)�+!#�

�2,1���,3��.2,1��

���������� ����������

�),2"���+�%#*#+1

�1,/�%#

��'+"#/������4'$1�

�,/($),4��201,*'7�1',+

�,+'1,/'+%���+%)'��

�����1����+�%#*#+1

�������

�#1�"�1����+�%#*#+1

�����1���+"#5'+%

�����+"#5'+%��/##�

�#1�"�1���+"#5'+%�),,*��')1#/�

�#1�"�1���#/3'!#��1��

�2 0!/'-1',+

��

��������������������

�#-)#/

�/"#/���+�%#*#+1

�,/($),4��/,!#00'+%

�601#*���+�%#*#+1

�'/12�)��/%�+'7�1',+�����+�%#*#+1 ����2#2#

����,$14�/#�#-,0'1,/6

������������������0#�

��-��#/3'!#���-0#/3#/�

��1���#/3'!#

��1���#1/'#3�)

��1���,4+),�"

�#1�"�1���#1/'#3�)

�#1�"�1���,4+),�"

�2)1'�1#+�+!6!!,2+1'+% �1�1'01'!

�#0,2/!#��,+'1,/'+%�

���%',0��601#*

��,+'1,/'+%�

���+%)'��

�,%�+�)'060��+"��,+'1,/'+%�

��-)2+(�

�#!2/'16��#6�+,1#�

�0#/�21&#+1'!!�1',+

Lizhe Wang HPC, Clouds, RS Big Data

Page 53: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS file management

Lizhe Wang HPC, Clouds, RS Big Data

Page 54: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS Metadata management

��$���$���

��������$

�&�$��� ���

� �����$�"�����'����

������������

Storage

��� ����� ���� ����

��

� � �

� �

� �

� � �� � ��

������

� � �

� �

� �

� � �� � ��

������

� � �

� �

� �

� � �� � ��

������

Storage Storage Storage

������������

�" %!����

����$"(�����#�" %!���

����$"(�����#�" %!���

����$"(�����#

��� �������������� ����

��� ��� ���

�'�� �'�� )�� �'�� �'����$���$���%�"(

Lizhe Wang HPC, Clouds, RS Big Data

Page 55: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS Workflow Management

Lizhe Wang HPC, Clouds, RS Big Data

Page 56: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS Replica and Cache Management

Lizhe Wang HPC, Clouds, RS Big Data

Page 57: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

RS Replica and Cache Management

Lizhe Wang HPC, Clouds, RS Big Data

Page 58: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

ToC

1 RS Big Data: an Overview

2 How Big? a Quantitative Model

3 Software Systems for High Performance RS Big Data ProcessingGlobal Distributed RS Data Model in MemoryGeneric Parallel Processing Model for RS Big DataParallel Storage Model for RS Big DataScheduling of Large-scale RS images Parallel Processing

4 RS Big Data Processing in Clouds: Current Work

5 Concluding Remark

Lizhe Wang HPC, Clouds, RS Big Data

Page 59: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Concluding Remark for RS Big Data

FindingsBig DataCompute-intensive and data-intensiveUnbalanced Data FlowComplex Algorithm Flow

RecommendationSpecially optimized HPC software systems for data transfer,IO, storage, processing and management

Lizhe Wang HPC, Clouds, RS Big Data

Page 60: High Performance Computing, Cloud computing, and Remote ...ceos.org/document_management/Working_Groups/WGISS... · High Performance Computing, Cloud computing, and Remote Sensing

Thank you for your time !

Lizhe Wang HPC, Clouds, RS Big Data