internet of things (smart grid) storm archival storage – nosql like hbase streaming processing...


Upload: beatrice-ross

Post on 17-Jan-2016


Page 1: Internet of Things (Smart Grid) Storm Archival Storage – NOSQL like Hbase Streaming Processing (Iterative MapReduce) Batch Processing (Iterative MapReduce)

Cloud DIKW based on HPC-ABDS to integrate streaming and batch Big Data

[Figure: dataflow diagram. Elements:]
- Internet of Things (Smart Grid)
- Pub-Sub
- Storm (six instances)
- Archival Storage: NoSQL like HBase
- Streaming Processing (Iterative MapReduce), with Analytics
- Batch Processing (Iterative MapReduce), with Analytics
- Raw Data → Data → Information → Knowledge → Wisdom → Decisions
- System Orchestration / Dataflow / Workflow

Page 2

[Figure: dataflow diagram. Elements:]
- Data Ingest
- Pub-Sub
- Storm (six instances)
- Archival Storage: Accumulo
- Streaming Processing (Bolts), with Analytics
- Batch Processing (MapReduce), with Analytics
- Raw Data → Data → Information → Knowledge → Wisdom → Decisions
- System Orchestration / Dataflow / Workflow
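
The dataflow in this diagram can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions, not Storm or Accumulo code: a `queue.Queue` stands in for the pub-sub layer, the hypothetical `bolt` function plays the role of a streaming bolt, and a plain list stands in for archival storage. The meter readings and the alert threshold are made-up values.

```python
import queue
from collections import defaultdict

def bolt(event, alert_threshold=5.0):
    """Streaming step: turn one raw event into a small piece of information."""
    meter, kilowatts = event
    return {"meter": meter, "kw": kilowatts, "alert": kilowatts > alert_threshold}

def batch_average(archive):
    """Batch step: one map/reduce-style pass over everything archived so far."""
    totals = defaultdict(lambda: [0.0, 0])
    for record in archive:
        totals[record["meter"]][0] += record["kw"]
        totals[record["meter"]][1] += 1
    return {meter: total / count for meter, (total, count) in totals.items()}

def run_pipeline(raw_events):
    broker = queue.Queue()            # stand-in for the pub-sub layer
    for event in raw_events:
        broker.put(event)

    archive, alerts = [], []          # the list stands in for archival storage
    while not broker.empty():
        record = bolt(broker.get())
        archive.append(record)        # every record is kept for batch analytics
        if record["alert"]:
            alerts.append(record["meter"])   # streaming decision, made per event

    return alerts, batch_average(archive)

# Hypothetical smart-grid readings: (meter id, kilowatts)
alerts, averages = run_pipeline([("m1", 2.0), ("m2", 6.0), ("m1", 4.0)])
```

The point of the sketch is the split: the streaming path decides per event, while the batch path recomputes over the whole archive, and both read the same ingested data.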

Page 3

Integrated Software Ecosystem

Layer | Big Data | HPC
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries | MLlib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, SQL, SPARQL | Fortran, C/C++
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | MapReduce | MPI/OpenMP/OpenCL
Coordination | ZooKeeper |
Caching | Memcached |
Data Management | HBase, Neo4J | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Yarn | Slurm
File Systems | HDFS | Lustre
Formats | Thrift, Protobuf | FITS, HDF

Page 4

HPC-ABDS Integrated Software

Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries | MLlib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, SQL, SPARQL | Fortran, C/C++
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | MapReduce | MPI/OpenMP/OpenCL
Coordination | ZooKeeper |
Caching | Memcached |
Data Management | HBase, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Yarn | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
Virtualization | OpenStack | Docker, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS

Page 5

HPC-ABDS Integrated Software

Layer | Big Data ABDS | HPC, Cluster
17. Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries | MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming | Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming | Storm, Kafka, Kinesis |
13, 14A. Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
2. Coordination | ZooKeeper |
12. Caching | Memcached |
11. Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
10. Data Transfer | Sqoop | GridFTP
9. Scheduling | Yarn | Slurm
8. File Systems | HDFS, Object Stores | Lustre
1, 11A. Formats | Thrift, Protobuf | FITS, HDF
5. IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS

CUDA, Exascale Runtime

Page 6

Initial Convergence Software

Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
Libraries | MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
Coordination | ZooKeeper |
Caching | Memcached |
Data Management | HBase, Accumulo, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Mesos, Aurora, Yarn | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS

CUDA, Exascale Runtime

Page 7

HPC-ABDS Integrated Software

Layer | Big Data ABDS | HPC, Cluster
17. Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries | MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming | Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming | Storm, Kafka, Kinesis |
13, 14A. Parallel Runtime | MapReduce | MPI/OpenMP/OpenCL
2. Coordination | ZooKeeper |
12. Caching | Memcached |
11. Data Management | HBase, Neo4J, MySQL | iRODS
10. Data Transfer | Sqoop | GridFTP
9. Scheduling | Yarn | Slurm
8. File Systems | HDFS, Object Stores | Lustre
1, 11A. Formats | Thrift, Protobuf | FITS, HDF
5. IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS

CUDA, Exascale Runtime

Page 10

4 Forms of MapReduce

(1) Map Only (Pleasingly Parallel): BLAST Analysis, Local Machine Learning
(2) Classic MapReduce: High Energy Physics (HEP) Histograms, Distributed search, Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization, Clustering e.g. K-means, Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI, PDE Solvers and Particle Dynamics, Graph Problems

[Figure: one dataflow diagram per form. Labels: Input, map, reduce, Iterations, Output, Local, Graph]

MapReduce and Iterative Extensions (Spark, Twister)
MPI, Giraph
Integrated Systems such as Hadoop + Harp with Compute and Communication model separated

These correspond to the first 4 of the Identified Architectures.
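
The first three forms can be contrasted in a small, single-process Python sketch. It is an illustration under stated assumptions rather than Hadoop, Spark, Twister, or Harp code: the partitions are plain lists, the "reduce" and "collective" steps are ordinary in-process sums, and the function names `map_only`, `classic_mapreduce`, and `iterative_kmeans` are hypothetical. The histogram mirrors the HEP histogram example, and the 1-D K-means loop mirrors the clustering example.

```python
from collections import Counter
from functools import reduce

def map_only(records, fn):
    """(1) Map Only: independent tasks, no communication between them."""
    return [fn(r) for r in records]

def classic_mapreduce(partitions, bin_width):
    """(2) Classic MapReduce: each map builds a partial histogram, one reduce merges them."""
    partials = [Counter(int(x // bin_width) for x in part) for part in partitions]
    return reduce(lambda a, b: a + b, partials, Counter())

def iterative_kmeans(partitions, centers, iterations=10):
    """(3) Iterative MapReduce: repeat map + collective for a fixed number of rounds."""
    for _ in range(iterations):
        # map: each partition computes per-center sums and counts locally
        partials = []
        for part in partitions:
            sums, counts = [0.0] * len(centers), [0] * len(centers)
            for x in part:
                k = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
                sums[k] += x
                counts[k] += 1
            partials.append((sums, counts))
        # collective (allreduce-like): combine the partials and update the centers
        new_centers = []
        for k, old in enumerate(centers):
            total = sum(p[0][k] for p in partials)
            count = sum(p[1][k] for p in partials)
            new_centers.append(total / count if count else old)
        centers = new_centers
    return centers

partitions = [[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]   # made-up data
lengths = map_only(["ACGT", "GG"], len)
histogram = classic_mapreduce(partitions, bin_width=5)
centers = iterative_kmeans(partitions, centers=[0.0, 12.0])
```

The structural difference is where communication happens: none in form (1), a single merge at the end in form (2), and a merge after every round in form (3), which is why the iterative form benefits from keeping data in memory between rounds.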

Page 11

(5) Map Streaming
[Figure labels: Events, brokers, maps]

(6) Shared memory Map Communicates
[Figure labels: Map & Communicate, Shared Memory]

Page 12

6 Data Analysis Architectures

(1) Map Only (Pleasingly Parallel): BLAST Analysis, Local Machine Learning
(2) Classic MapReduce: High Energy Physics (HEP) Histograms, Web search, Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization, Clustering, Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI, PDE Solvers and Particle Dynamics, Graph
(5) Map Streaming: Streaming images from Synchrotron sources, Telescopes, IoT
(6) Shared memory Map Communicates: Difficult to parallelize asynchronous parallel Graph Algorithms

[Figure: one dataflow diagram per architecture. Labels: Input, map, reduce, Iterations, Output, Local, Graph, maps, brokers, Events, Map & Communicate, Shared Memory]

MapReduce and Iterative Extensions (Spark, Twister)
MPI, Giraph
Apache Storm
Harp – Enhanced Hadoop
Maps are Bolts
Classic Hadoop in classes 1) 2)

Page 15

[Figure: Kmeans Clustering. Axes: Time (Secs) and Efficiency versus # Cores]
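
The chart relates run time and efficiency to core count. The slide text does not say how efficiency was computed, so the sketch below assumes the common definition: speedup is the baseline time divided by the time on p cores, and efficiency is speedup divided by the increase in core count. The function name `parallel_efficiency` and the timings are hypothetical, not values read from the chart.

```python
def parallel_efficiency(timings):
    """Return {cores: efficiency} from {cores: seconds}, relative to the smallest core count."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    return {
        cores: (base_time * base_cores) / (seconds * cores)
        for cores, seconds in sorted(timings.items())
    }

# Hypothetical timings in seconds, not data from the slide.
efficiency = parallel_efficiency({1: 100.0, 2: 50.0, 4: 40.0})
```

An efficiency of 1.0 means perfect scaling; values below 1.0 indicate time lost to communication, load imbalance, or serial sections as cores are added.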

Page 17

Software-Defined Distributed System (SDDS) as a Service includes:

Software (Application or Usage), SaaS: Use HPC-ABDS; Class Usages e.g. run GPU & multicore Applications; Control Robot
Platform, PaaS: Cloud e.g. MapReduce; HPC e.g. PETSc, SAGA; Computer Science e.g. Compiler tools, Sensor nets, Monitors
Infrastructure, IaaS: Software Defined Computing (virtual Clusters); Hypervisor, Bare Metal; Operating System
Network, NaaS: Software Defined Networks; OpenFlow; GENI

SDDS-aaS Tools: Provisioning; Image Management; IaaS Interoperability; NaaS, IaaS tools; Expt management; Dynamic IaaS NaaS; DevOps

CloudMesh is an SDDSaaS tool that uses Dynamic Provisioning and Image Management to provide custom environments for general target systems. It involves (1) creating, (2) deploying, and (3) provisioning one or more images on a set of machines on demand. http://mycloudmesh.org/
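
The create, deploy, provision sequence can be pictured with a toy Python model. This is not the CloudMesh API: every name here (`Machine`, `create_image`, `deploy`, `provision`) is hypothetical, an "image" is just a label string, and the machines are in-memory objects. It only shows the order of the three steps and the state each one changes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Machine:
    name: str
    image: Optional[str] = None   # image currently deployed, if any
    ready: bool = False           # True once provisioning has finished

def create_image(os_name, packages):
    """Step 1: describe an image (here just a label built from its contents)."""
    return f"{os_name}+" + "+".join(sorted(packages))

def deploy(image, machines):
    """Step 2: place the image on each target machine; not yet usable."""
    for m in machines:
        m.image, m.ready = image, False

def provision(machines):
    """Step 3: mark every machine that holds an image as ready for use."""
    for m in machines:
        m.ready = m.image is not None
    return [m.name for m in machines if m.ready]

# Hypothetical target machines and software stack.
machines = [Machine("node1"), Machine("node2")]
image = create_image("linux", ["storm", "hadoop"])
deploy(image, machines)
ready = provision(machines)
```

Keeping the three steps separate is what allows the same image to be redeployed on a different set of machines on demand.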


Dynamic Orchestration and Dataflow