[Figure: Cloud DIKW based on HPC-ABDS to integrate streaming and batch Big Data. Labels: Internet of Things (Smart Grid); Storm (x6); Pub-Sub; Raw Data; Archival Storage (NOSQL like Hbase); Streaming Processing (Iterative MapReduce); Batch Processing (Iterative MapReduce); Analytics; Data, Information, Knowledge, Wisdom, Decisions; System Orchestration / Dataflow / Workflow]
[Figure: the same DIKW dataflow with concrete components. Labels: Data Ingest; Storm (x6); Pub-Sub; Raw Data; Archival Storage (Accumulo); Streaming Processing (Bolts); Batch Processing (MapReduce); Analytics; Data, Information, Knowledge, Wisdom, Decisions; System Orchestration / Dataflow / Workflow]
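One way to picture how a single ingest path can serve both streaming and batch analytics is the minimal sketch below. It is not taken from the slides: the `Pipeline` class, the `meter` and `kwh` event fields, and the in-memory list standing in for archival storage are all assumptions made for illustration.

```python
from collections import defaultdict


class Pipeline:
    """Hypothetical ingest path feeding a streaming stage and an archive."""

    def __init__(self):
        # Stand-in for archival storage (a real system would use a NoSQL store).
        self.archive = []
        # Streaming state: updated once per event, never recomputed.
        self.running_total = defaultdict(float)

    def ingest(self, event):
        """Archive the raw event and update the streaming result."""
        self.archive.append(event)
        self.running_total[event["meter"]] += event["kwh"]

    def batch_average(self):
        """Batch stage: recompute per-meter averages from the whole archive."""
        totals = defaultdict(float)
        counts = defaultdict(int)
        for event in self.archive:
            totals[event["meter"]] += event["kwh"]
            counts[event["meter"]] += 1
        return {meter: totals[meter] / counts[meter] for meter in totals}


pipeline = Pipeline()
for meter, kwh in [("a", 2.0), ("a", 4.0), ("b", 1.0)]:
    pipeline.ingest({"meter": meter, "kwh": kwh})
```

The streaming result is available after every event, while the batch result is computed on demand over everything archived so far.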
Integrated Software Ecosystem
Layer | Big Data | HPC
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries | MLlib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, SQL, SPARQL | Fortran, C/C++
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | MapReduce | MPI/OpenMP/OpenCL
Coordination | Zookeeper |
Caching | Memcached |
Data Management | Hbase, Neo4J | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Yarn | Slurm
File Systems | HDFS | Lustre
Formats | Thrift, Protobuf | FITS, HDF
HPC-ABDS Integrated Software
Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries | MLlib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, SQL, SPARQL | Fortran, C/C++
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | MapReduce | MPI/OpenMP/OpenCL
Coordination | Zookeeper |
Caching | Memcached |
Data Management | Hbase, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Yarn | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
Virtualization | OpenStack | Docker, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS
HPC-ABDS Integrated Software
Layer | Big Data ABDS | HPC, Cluster
17. Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
16. Libraries | MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
15A. High Level Programming | Pig, Hive, Drill | Domain-specific Languages
15B. Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
14B. Streaming | Storm, Kafka, Kinesis |
13, 14A. Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
2. Coordination | Zookeeper |
12. Caching | Memcached |
11. Data Management | Hbase, Accumulo, Neo4J, MySQL | iRODS
10. Data Transfer | Sqoop | GridFTP
9. Scheduling | Yarn | Slurm
8. File Systems | HDFS, Object Stores | Lustre
1, 11A. Formats | Thrift, Protobuf | FITS, HDF
5. IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS
CUDA, Exascale Runtime
Initial Convergence Software
Layer | Big Data ABDS | HPC, Cluster
Orchestration | Crunch, Tez, Cloud Dataflow | Kepler, Pegasus, Taverna
Libraries | MLlib/Mahout, R, Python | ScaLAPACK, PETSc, Matlab
High Level Programming | Pig, Hive, Drill | Domain-specific Languages
Platform as a Service | App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages | Java, Erlang, Scala, Clojure, SQL, SPARQL, Python | Fortran, C/C++, Python
Streaming | Storm, Kafka, Kinesis |
Parallel Runtime | Hadoop, MapReduce | MPI/OpenMP/OpenCL
Coordination | Zookeeper |
Caching | Memcached |
Data Management | Hbase, Accumulo, Neo4J, MySQL | iRODS
Data Transfer | Sqoop | GridFTP
Scheduling | Mesos, Aurora, Yarn | Slurm
File Systems | HDFS, Object Stores | Lustre
Formats | Thrift, Protobuf | FITS, HDF
IaaS | OpenStack, Docker | Linux, Bare-metal, SR-IOV
Infrastructure | CLOUDS | SUPERCOMPUTERS
CUDA, Exascale Runtime
4 Forms of MapReduce
(1) Map Only: BLAST Analysis; Local Machine Learning; Pleasingly Parallel
(2) Classic MapReduce: High Energy Physics (HEP) Histograms; Distributed search; Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization; Clustering e.g. K-means; Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI; PDE Solvers and Particle Dynamics; Graph Problems
MapReduce and Iterative Extensions (Spark, Twister)
MPI, Giraph
Integrated Systems such as Hadoop + Harp with Compute and Communication model separated
Correspond to first 4 of Identified Architectures
[Figure: dataflow diagram for each form. Labels: Input, map, reduce, Iterations, Output, Local, Graph]
(5) Map Streaming [diagram labels: maps, brokers, Events]
(6) Shared memory Map Communicates [diagram labels: Map & Communicate, Shared Memory]
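The difference between forms (2) and (3) can be sketched in a few lines of single-process Python. This is an illustration only, not the Hadoop, Twister, or Harp API: `map_reduce`, `histogram`, and `kmeans_1d` are hypothetical helper names, and the one-dimensional data is chosen to keep the example short. Classic MapReduce runs one map phase and one reduce phase; the iterative form repeats the same step and feeds each result into the next round.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Form (2): one map phase, a shuffle by key, then one reduce phase."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def histogram(values, bin_width):
    """Classic MapReduce example: count how many values fall in each bin."""
    return map_reduce(
        values,
        mapper=lambda x: [(int(x // bin_width), 1)],
        reducer=lambda _bin, ones: sum(ones),
    )


def kmeans_1d(points, centers, iterations=10):
    """Form (3): repeat the same map/reduce step, reusing the previous result."""
    centers = list(centers)
    for _ in range(iterations):
        def assign(x):
            # Map: emit (index of nearest center, point).
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            return [(nearest, x)]

        # Reduce: the new center is the mean of the points assigned to it.
        means = map_reduce(points, assign, lambda _i, xs: sum(xs) / len(xs))
        # Keep the old center when no point was assigned to it.
        centers = [means.get(i, c) for i, c in enumerate(centers)]
    return centers


bins = histogram([0.5, 1.5, 1.7, 3.2], bin_width=1.0)
final_centers = kmeans_1d([1.0, 2.0, 9.0, 10.0], centers=[0.0, 5.0])
```

In a real iterative runtime the centers would be broadcast to the map tasks each round (the "collective" step); here a plain Python variable plays that role.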
6 Data Analysis Architectures
(1) Map Only: BLAST Analysis; Local Machine Learning; Pleasingly Parallel
(2) Classic MapReduce: High Energy Physics (HEP) Histograms; Web search; Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization; Clustering; Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI; PDE Solvers and Particle Dynamics; Graph
(5) Map Streaming: Streaming images from Synchrotron sources, Telescopes, IoT
(6) Shared memory Map Communicates: Difficult to parallelize asynchronous parallel Graph Algorithms
MapReduce and Iterative Extensions (Spark, Twister)
MPI, Giraph
Apache Storm
Harp (Enhanced Hadoop)
Maps are Bolts
Classic Hadoop in classes (1) and (2)
[Figure: dataflow diagram for each architecture. Labels: Input, map, reduce, Iterations, Output, Local, Graph, maps, brokers, Events, Map & Communicate, Shared Memory]
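Architecture (5), Map Streaming, routes events through brokers to long-running maps (the bolts of Apache Storm). The sketch below imitates that pattern in a single process. It does not use the Storm API: the `Broker` class, the topic names, the `build_pipeline` helper, and the smart-meter event fields are all hypothetical.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical in-memory pub-sub broker that queues events per topic."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.queues[topic].append(event)

    def run(self):
        """Deliver queued events until every topic is drained."""
        delivered = 0
        while any(self.queues.values()):
            # Snapshot the topics: handlers may publish to new ones.
            for topic in list(self.queues):
                queue = self.queues[topic]
                while queue:
                    event = queue.popleft()
                    for handler in self.subscribers[topic]:
                        handler(event)
                    delivered += 1
        return delivered


def build_pipeline(broker, threshold_kw):
    """Wire two maps: one converts units, the next flags high readings."""
    alerts = []

    def to_kilowatts(event):
        broker.publish("kw", {"meter": event["meter"], "kw": event["watts"] / 1000.0})

    def flag_high(event):
        if event["kw"] > threshold_kw:
            alerts.append(event["meter"])

    broker.subscribe("raw", to_kilowatts)
    broker.subscribe("kw", flag_high)
    return alerts


broker = Broker()
alerts = build_pipeline(broker, threshold_kw=5.0)
broker.publish("raw", {"meter": "a", "watts": 7000})
broker.publish("raw", {"meter": "b", "watts": 1200})
delivered = broker.run()
```

Each map handles one event at a time and keeps no view of the full dataset, which is what separates this architecture from the batch forms (1) to (4).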
[Figure: Kmeans Clustering. Plotted quantities: Time (Secs) and Efficiency against # Cores]
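The slides do not say how Efficiency is computed. A common convention, assumed here, is speedup divided by the number of cores, where speedup is the one-core time divided by the parallel time. The function names and the timings below are made up for illustration and are not the measurements behind the plot.

```python
def speedup(t_one_core, t_parallel):
    """Speedup: how many times faster the parallel run is than the one-core run."""
    if t_one_core <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_one_core / t_parallel


def efficiency(t_one_core, t_parallel, cores):
    """Parallel efficiency (assumed definition): speedup divided by core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(t_one_core, t_parallel) / cores


# Made-up timings in seconds, keyed by core count (not taken from the slides).
timings = {1: 100.0, 4: 30.0, 16: 12.5}
table = {cores: efficiency(timings[1], t, cores) for cores, t in timings.items()}
```

An efficiency of 1.0 means perfect scaling; values fall below 1.0 as communication and synchronization costs grow with the core count.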
Software-Defined Distributed System (SDDS) as a Service includes:
Infrastructure (IaaS): Software Defined Computing (virtual Clusters); Hypervisor, Bare Metal; Operating System
Platform (PaaS): Cloud e.g. MapReduce; HPC e.g. PETSc, SAGA; Computer Science e.g. Compiler tools, Sensor nets, Monitors
Network (NaaS): Software Defined Networks; OpenFlow GENI
Software, Application or Usage (SaaS): Use HPC-ABDS; Class Usages e.g. run GPU & multicore Applications; Control Robot
SDDS-aaS Tools: Provisioning; Image Management; IaaS Interoperability; NaaS, IaaS tools; Experiment management; Dynamic IaaS NaaS; DevOps
CloudMesh is an SDDSaaS tool that uses Dynamic Provisioning and Image Management to provide custom environments for general target systems. It involves (1) creating, (2) deploying, and (3) provisioning one or more images on a set of machines on demand. http://mycloudmesh.org/
17
Dynamic Orchestration and Dataflow