hpc software cluster solution

LENOVO System Management Solutions

2015 Lenovo. All rights reserved.

Luigi Brochard, Lenovo HPC Distinguished Engineer

HPC Advisory Council 2016, Lugano, April 21-23

2

HPC Software Solutions through Partnerships

• Building partnerships to provide the “Best In-Class” HPC cluster solutions for our customers

• Collaborating with software vendors to provide features that optimize customer workloads

• Leveraging “Open Source” components that are production ready

• Contributing to “Open Source” (e.g. xCAT, Confluent, OpenStack) to enhance our platforms

• Providing “Services” to help customers deploy and optimize their clusters

Customer Applications

Compute, Storage, Network

OFED, UFM, OmniPath

Lenovo System x: Virtual, Physical, Desktop, Server

OS, VM

Systems Management: IBM PCM, xCAT (Extreme Cloud Admin. Toolkit)

Parallel File Systems: IBM GPFS, Lustre, NFS

Workload & Resources: IBM LSF, HPC & Symphony, Adaptive Moab, Maui/Torque, Slurm

Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI

Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..

Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA, Ganglia

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

3

xCAT

Open Source

Collaboration with IBM

Server Hardware Management

OS Deployment

IP and network service management

Virtualization Management

CLI

Holistic solution management

Weak GUI

Complex to learn

Lacking structure

Poor enablement for web development

Good for large clusters, difficult for smaller solutions/enterprise networks

4

WEB ORCHESTRATION

Initial Goals

• Provide easy cluster access to new HPC customers using Open Source HPC infrastructure

– Low cost entry into HPC

• Visual summary views to help understand cluster usage

– Admin Console – User management, Cluster Monitoring

– User Console – Job submission, Job/Cluster Monitoring

• Initial target and Proof of Concept trials – China Market

– Focus on China Market first – A lot of customers are just coming into HPC workloads

– Collaborating with customers to understand their usage models and future requirements

– Very positive feedback and market acceptance

– LiCO – Lenovo Intelligent Computing Orchestration was released to China market

• WW Market – Create English version and work with collaborators to release the English version as “Open Source” project: OSMWC

– Oxford University collaboration

5

Lenovo Intelligent Cluster Orchestrator (LiCO)

What is Web Console:

A Unified GUI

• User Portal (dashboard, submit job, monitor job)

• Admin Portal (dashboard, user/account management)

Future Work Items:

• SLURM integration

• ICINGA integration

• Intel OPA integration

• LDAP integration

Legend: Lenovo components, Open Source/3rd party, Lenovo Hardware

WEB CONSOLE GUI

xCAT/Confluent

Torque/MAUI, GOLD/Ganglia

OpenMPI, MVAPICH, MPICH, Intel Parallel Studio

CentOS/RHEL, Lustre, OFED

Server, Storage, Network

Installation guide / scripts

Admin guide / scripts

Main HPC components below the GUI would be part of the OpenHPC project

6

Open System Management Web Console (OSMWC)

What is Web Console:

A Unified GUI

• User Portal (dashboard, submit job, monitor job)

• Admin Portal (dashboard, user/account management)

Future Work Items:

• SLURM integration

• ICINGA integration

• Intel OPA integration

• LDAP integration

Legend: Lenovo components, Open Source/3rd party, Lenovo Hardware

WEB CONSOLE GUI

xCAT/Confluent

Torque/MAUI, Ganglia

OpenMPI, MVAPICH, MPICH, Intel Parallel Studio

CentOS/RHEL, Lustre, OFED

Server, Storage, Network

Installation guide / scripts

Admin guide / scripts

Main HPC components below the GUI would be part of the OpenHPC project

7

END USER PORTAL – TRANSLATED VIEW

8

ADMIN PORTAL – TRANSLATED VIEW

9

Confluent

10

Confluent Goals

Lenovo led project to improve upon xCAT heritage

Carries on strong CLI and other facets of xCAT

More structured interface

Easier to learn

Web development enabled – RESTful APIs – good GUI possible

Faster performance/lower memory usage/higher scalability for large solutions

Better equipped to work in smaller configurations without full network control

Enhanced security model

Reuse effort across HPC, OpenStack, XClarity efforts

Reuse development effort across multiple projects (Lenovo/external ecosystem)

More contributions from third parties

11

Confluent updates

• xCAT style noderanges

• Client connections persist across server restart (e.g. consoles)

• xCAT style commands:

– nodehealth (new)

– nodesensors (like rvitals)

– nodepower (like rpower)

– nodeeventlog (like reventlog)

– nodeconsole (like rcons)

– nodesetboot (like rsetboot)

– nodeidentify (like rbeacon)

– nodelist (like nodels)

• Inventory in API (nodeinventory to come, similar to rinv)

• Dynamic nodegroups (groups with a ‘noderange’ attribute get expanded)

• Enriched debugging facilities

• Rotating log support (defaults to daily)
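The noderange and dynamic nodegroup items above can be illustrated with a minimal Python sketch. This is not Confluent's parser: it handles only a simplified subset of the syntax (comma-separated items and `n1-n4` style ranges), and the `GROUPS` table and the `expand` function are names invented for this example.

```python
import re

# Hypothetical in-memory group table (an assumption for illustration).
# A group with a 'noderange' attribute is dynamic and is expanded on use;
# a group with 'nodes' lists its members statically.
GROUPS = {
    "rack1": {"nodes": ["n1", "n2"]},
    "compute": {"noderange": "n1-n4,n7"},
}

# Matches simple ranges such as "n1-n4", "n1-4" or "n08-n10".
_RANGE = re.compile(r"^([A-Za-z_]+)(\d+)-([A-Za-z_]+)?(\d+)$")


def expand(noderange, groups=GROUPS):
    """Expand a simplified noderange into an ordered list of unique node names."""
    nodes = []
    for item in noderange.split(","):
        item = item.strip()
        if not item:
            continue
        if item in groups:
            group = groups[item]
            if "noderange" in group:
                # Dynamic group: expand its noderange attribute recursively.
                # A group that refers to itself would recurse forever.
                members = expand(group["noderange"], groups)
            else:
                members = list(group.get("nodes", []))
        else:
            match = _RANGE.match(item)
            if match and match.group(3) in (None, match.group(1)):
                prefix, low, high = match.group(1), match.group(2), match.group(4)
                width = len(low)  # keep zero padding, e.g. n08
                members = [
                    f"{prefix}{i:0{width}d}"
                    for i in range(int(low), int(high) + 1)
                ]
            else:
                members = [item]  # treat anything else as a literal node name
        for name in members:
            if name not in nodes:
                nodes.append(name)
    return nodes


if __name__ == "__main__":
    print(expand("compute"))
    print(expand("rack1,n08-n10"))
```

Expanding the group at lookup time, rather than storing its member list, is what makes it dynamic in this sketch: changing the `noderange` attribute changes the membership on the next call.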

12

Confluent Web UI (consoles without plugin or java)

13

Confluent CLI – through confetti (RESTful API)

14

nodesensors (csv, and time series data)
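The slide itself is a screenshot, so the actual output format is not reproduced here. As a rough sketch of how CSV sensor readings could be turned into per-node time series, the Python below assumes a column layout (`timestamp,node,sensor,value`) and sample values that are invented for illustration and are not taken from nodesensors.

```python
import csv
import io
from collections import defaultdict

# Assumed CSV layout and sample values, invented for this example.
SAMPLE = """timestamp,node,sensor,value
2016-04-21T10:00:00,n1,Ambient Temp,21.0
2016-04-21T10:00:00,n2,Ambient Temp,22.5
2016-04-21T10:01:00,n1,Ambient Temp,21.5
"""


def to_time_series(csv_text):
    """Group readings into {(node, sensor): [(timestamp, value), ...]}."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["node"], row["sensor"])
        series[key].append((row["timestamp"], float(row["value"])))
    return dict(series)


if __name__ == "__main__":
    for key, points in to_time_series(SAMPLE).items():
        print(key, points)
```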

15

Confluent performance

16

Future High Performance Computing Open Solutions

• Partnering as a founding member of the OpenHPC initiative to establish a common Open HPC Framework

• Collaborating with Oxford University to create an Open System Management framework for small to medium clusters

• Leading Open Source system management projects: Confluent and the soon to be formed OSMWC

• Contributing to the xCAT Open Source project to enhance our platforms

• Providing “Services” to help customers deploy and optimize their clusters

Customer Applications

Parallel File Systems: Lenovo GSS, Intel Lustre, NFS

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

Systems Management: Open System Management WEB Console (OSMWC), Confluent, xCAT (Extreme Cloud Admin. Toolkit)

OS, VM

OFED, UFM, OmniPath

Compute, Storage, Network

Lenovo System x: Virtual, Physical, Desktop, Server

17

Future High Performance Computing Solutions

• Adding new features

• Power & Energy awareness

• Light weight virtual HPC

• Big Data / Spark workload

• Managing more than the servers

Customer Applications

Parallel File Systems: Lenovo GSS, Intel Lustre, NFS

Enterprise Professional Services: Installation and custom services, may not include service support for third party software

Open System Management WEB Console (OSMWC)

Integration with xCAT (Extreme Cloud Admin Toolkit), Confluent

OS, VM

OFED, UFM, OmniPath

Compute, Storage, Network

Lenovo System x: Virtual, Physical, Desktop, Server

18

Future HPC Software Solutions through Partnerships

• Building partnerships to provide the “Best In-Class” HPC cluster solutions for our customers

• Collaborating with software vendors to provide features that optimize customer workloads

• Bright Computing

• Altair

• …

Customer Applications

Compute, Storage, Network

OFED, UFM, OmniPath

Lenovo System x: Virtual, Physical, Desktop, Server

OS, VM

Systems Management: IBM PCM, xCAT (Extreme Cloud Admin. Toolkit)

BC CM

Parallel File Systems: IBM GPFS, Lustre, NFS

Workload & Resources: IBM LSF, HPC & Symphony, Adaptive Moab, Maui/Torque/Slurm/PBSPro

Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI

Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..

Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA, Ganglia

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

20

ADMIN PORTAL – TRANSLATED VIEW

21

User Job Submission views

22

User Job Submission – provide Scheduler job file
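The slide shows the portal accepting a scheduler job file. Since the stack described earlier uses Torque/Maui, a job file of that kind might look like the output of the Python sketch below. The `render_torque_job` helper, its parameters, and the example values are hypothetical and are not part of the portal; only the `#PBS` directives and `$PBS_O_WORKDIR` are standard Torque conventions.

```python
def render_torque_job(name, nodes, ppn, walltime, command):
    """Return the text of a minimal Torque/PBS job script.

    Hypothetical helper for illustration; the portal's own job file
    handling is not documented in the slides.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                   # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # node count and cores per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
        "cd $PBS_O_WORKDIR",                 # directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values are placeholders, not taken from the presentation.
    print(render_torque_job("demo", 2, 16, "01:00:00", "mpirun ./a.out"))
```

A file produced this way would normally be handed to the scheduler with `qsub`.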

23

Admin / Operator views

24

ADMIN PORTAL – TRANSLATED VIEW

25

ADMIN PORTAL – TRANSLATED VIEW

26

ADMIN PORTAL – TRANSLATED VIEW

27

nodehealth
