Feedback on Big Compute & HPC on Windows Azure


Upload: antoine-poliakov

Posted on 26-Jan-2015


Category: Technology



DESCRIPTION

Is the cloud relevant for high performance workloads? We answer by sharing our experience: HPC consultants at ANEO ported and optimized a distributed scientific application, developed at Supélec, from its Linux cluster to Microsoft's new cloud technology, Big Compute (InfiniBand node interconnect).

TRANSCRIPT

Page 1: Feedback on Big Compute & HPC on Windows Azure
Page 2: Feedback on Big Compute & HPC on Windows Azure

Innovation Recherche

Feedback on Big Compute & HPC on Windows Azure

Antoine Poliakov, HPC Consultant

ANEO

[email protected]
http://blog.aneo.eu

Page 3: Feedback on Big Compute & HPC on Windows Azure


• Cloud: on-demand access, through a telecommunications network, to shared and user-configurable IT resources

• HPC (High Performance Computing): a branch of computer science concerned with maximizing software efficiency, in particular in terms of execution speed

– Raw computing power doubles every 1.5–2 years
– Network throughput doubles every 2–3 years
– The compute/network gap doubles every ~5 years (see the quick check below)
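A quick sanity check of that last figure (our arithmetic, not the slide's): if compute doubles every $T_c$ years and network throughput every $T_n$ years, their ratio doubles every

$$ T_{gap} = \left( \frac{1}{T_c} - \frac{1}{T_n} \right)^{-1} = \frac{T_c \, T_n}{T_n - T_c} $$

With $T_c = 1.5$–$2$ and $T_n = 3$, this gives $T_{gap} = 3$–$6$ years, consistent with the roughly 5-year claim.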

• HPC in the cloud makes computing power accessible to all (SMEs, research labs, etc.) and fosters innovation

• Our question: can the cloud offer sufficient performance for HPC workloads?
– CPU: 100% of native speed
– RAM: 99% of native speed
– Network: ???

HPC: a challenge for the cloud

Introduction

Page 4: Feedback on Big Compute & HPC on Windows Azure


3 ingredients yield an answer through experimentation:

• Experiments: state of the art of HPC in the cloud

• Technology: an HPC-oriented cloud

• Use-case: HPC software

Introduction

Page 5: Feedback on Big Compute & HPC on Windows Azure


• Identify technologies and partners: an HPC software use-case + an efficient cloud computing service

• Experiment and measure performance: scaling, data transfers

Experimenting on HPC in the cloud: our approach

Introduction

Page 6: Feedback on Big Compute & HPC on Windows Azure


A collaborative project with 3 complementary actors

Introduction

Research partner — established HPC research teams: distributed software & big data; machine learning and interactive systems
Goals:
· Is the cloud ready for scientific computing?
· What is specific to deploying in the cloud?
· Performance

Cloud provider — Windows Azure provides a cloud solution aimed at HPC workloads: Azure Big Compute
Goals:
· Pre-release feedback
· Inside view of an HPC cluster-to-cloud transition

Consulting firm — organization and technologies; HPC practice: fast/massive information processing for finance and industry
Goals:
· Identify the most relevant use-cases for our clients
· Estimate the complexity of porting and deploying an app
· Evaluate whether the solution is production-ready

Page 7: Feedback on Big Compute & HPC on Windows Azure


Dedicated and competent teams: thank you all!

Introduction

Research: use-case (distributed audio segmentation); experiment analysis
Provider: created the technical solution; made notable computational power available
Consulting: ported and deployed the application in the cloud; led the benchmarks

Constantinos Makassikis, HPC Consultant
Wilfried Kirschenmann, HPC Consultant
Antoine Poliakov, HPC Consultant
Stéphane Rossignol, Assistant Professor, Signal Processing
Stéphane Vialle, Professor, Computer Science
Xavier Pillons, Principal Program Manager, Windows Azure CAT
Kévin Dehlinger, Computer Science Intern, CNAM

Page 8: Feedback on Big Compute & HPC on Windows Azure


1. Technical context

2. Feedback on porting the application

3. Optimizations

4. Results

Presentation contents

Page 9: Feedback on Big Compute & HPC on Windows Azure


1. TECHNICAL CONTEXT

a. Azure Big Compute
b. ParSon

Page 10: Feedback on Big Compute & HPC on Windows Azure


Azure Big Compute = New Azure nodes + HPC Pack

New nodes: A8 and A9

• 2×8 Sandy Bridge cores (Xeon E5-2670 @ 2.6 GHz), 112 GB DDR3 @ 1.6 GHz
• InfiniBand (NetworkDirect @ 40 Gbit/s): RDMA via MS-MPI @ 3.5 Gb/s, 3 µs latency
• IP over Ethernet @ 10 Gbit/s; 2 TB HDD @ 250 MB/s
• Azure hypervisor

HPC Pack

• Task scheduler middleware: Cluster Manager + SDK
• Tested with 50k cores in Azure
• Free Extension Pack: any Windows Server install can be a node

Azure Big Compute

Page 11: Feedback on Big Compute & HPC on Windows Azure


HPC Pack: on-premise cluster

Azure Big Compute

• Active Directory, Manager, and nodes in a privately managed infrastructure

• Cluster dimensioned w.r.t. maximal workload

• Administration: hardware + software

[Diagram: Active Directory (AD), Manager (M), and compute nodes (N), all on premise]

Page 12: Feedback on Big Compute & HPC on Windows Azure


HPC Pack: in the Azure Big Compute cloud

• Active Directory and manager in the cloud (VMs)

• Node allocation and pricing on demand

• Admin: software only

Azure Big Compute

[Diagram: AD and Manager run as IaaS VMs; compute nodes (N) are PaaS; the cluster is reached via remote desktop/CLI]

Page 13: Feedback on Big Compute & HPC on Windows Azure


HPC Pack: hybrid deployment

Azure Big Compute

• Active Directory and manager on premise

• Nodes both in the datacenter and in the cloud

• Local dimensioning w.r.t. average load; dynamic cloud dimensioning absorbs peaks

• Admin: software + hardware

[Diagram: AD and Manager on premise; compute nodes both in the local datacenter and in the cloud, joined through a VPN]

Page 14: Feedback on Big Compute & HPC on Windows Azure


• ParSon = an audio segmentation algorithm: voice / music

1. Supervised training on known audio samples to calibrate the classifier

2. Classification based on spectral analysis (FFT) on sliding windows (a sketch follows at the end of this page)

ParSon: an audio segmentation scientific software

[Diagram: digital audio → ParSon (segmentation and classification) → voice / music]
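To make step 2 concrete, here is a minimal sketch of sliding-window spectral analysis with FFTW (a dependency named later in the deck). This is our illustration, not ParSon's actual code; the 1024-sample Hann window and 512-sample hop are assumptions.

// Minimal sketch (our illustration, not ParSon's code) of step 2:
// sliding-window spectral analysis with FFTW.
#include <fftw3.h>
#include <cmath>
#include <vector>

// Returns one magnitude spectrum per analysis window of the signal.
std::vector<std::vector<double>> spectrogram(const std::vector<double>& samples,
                                             size_t win = 1024, size_t hop = 512) {
    const double PI = 3.14159265358979323846;
    std::vector<std::vector<double>> frames;
    double* in = fftw_alloc_real(win);
    fftw_complex* out = fftw_alloc_complex(win / 2 + 1);
    fftw_plan plan = fftw_plan_dft_r2c_1d((int)win, in, out, FFTW_ESTIMATE);

    for (size_t start = 0; start + win <= samples.size(); start += hop) {
        for (size_t i = 0; i < win; ++i)        // Hann-windowed frame
            in[i] = samples[start + i] * (0.5 - 0.5 * std::cos(2.0 * PI * i / (win - 1)));
        fftw_execute(plan);
        std::vector<double> mag(win / 2 + 1);
        for (size_t k = 0; k < mag.size(); ++k) // magnitude spectrum
            mag[k] = std::hypot(out[k][0], out[k][1]);
        frames.push_back(std::move(mag));       // features for the classifier
    }
    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return frames;
}

The classifier then labels each window (voice or music) from these spectra, after the supervised calibration of step 1.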

Page 15: Feedback on Big Compute & HPC on Windows Azure


ParSon is distributed with OpenMP + MPI

ParSon

[Diagram: Linux cluster workflow — 1. Upload input files to the NAS; 2. OAR reserves N computers; 3. Input deployment; 4. MPI Exec; 5. Tasks with heavy inter-communications; 6. Get outputs. Data and control flows are shown separately.]

Page 16: Feedback on Big Compute & HPC on Windows Azure


Performance is limited by data transfers

ParSon

[Plot: best runtime (s, log scale) vs number of nodes (log scale), cold cache, reading over the network (from the NAS) vs reading locally — NAS reads become IO bound.]

Page 17: Feedback on Big Compute & HPC on Windows Azure


2. PORTING THE APPLICATION

a. Porting C++ code: Linux → Windows

b. Porting the distribution strategy: cluster → HPC Cluster Manager

c. Porting and adapting deployment scripts

Page 18: Feedback on Big Compute & HPC on Windows Azure


• ParSon and Visual C++ conform to the C++ standard → few code changes

• The dependencies are the standard libraries and cross-platform scientific libraries: libsndfile, FFTW

• Thanks to MS-MPI, the inter-process communication code doesn't change

• Visual Studio natively supports OpenMP (a toy example below illustrates this portability)

• The only task left was translating the build files: Makefiles → Visual C++ projects

Standards conformance = easy Linux → Windows porting

Porting
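As a toy illustration of the conformance claim (ours, not ParSon code): the same hybrid MPI + OpenMP source builds with mpicxx/g++ on Linux and with Visual C++ plus MS-MPI on Windows.

// Toy example (ours, not ParSon): standard C++, MPI, and OpenMP
// compile unchanged on Linux (mpicxx) and Windows (Visual C++ + MS-MPI).
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 0.0;
    // OpenMP threads inside each MPI process (2x8 cores per A8/A9 node)
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < 1000000; i += size)
        local += 1.0 / (1.0 + i);

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum = %f across %d ranks\n", total, size);
    MPI_Finalize();
    return 0;
}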

Page 19: Feedback on Big Compute & HPC on Windows Azure


ParSon in the cluster

Porting

[Diagram: the same Linux cluster workflow as page 15 — upload input file to the NAS, OAR reserves N computers, input deployment, MPI exec, run with inter-communications, get output.]

Page 20: Feedback on Big Compute & HPC on Windows Azure


ParSon in the Azure cloud

Porting

[Diagram: Azure workflow — 1. Upload the input file to Azure Storage; 2. The HPC Cluster Manager (via the HPC Pack SDK) reserves N nodes; 3. Input deployment to the provisioned A9 PaaS Big Compute nodes; 4. MPI Exec; 5. Run and inter-communicate; 6. Get the output. The AD domain controller runs as an IaaS VM; compute nodes are PaaS. Data and control flows are shown separately.]

Page 21: Feedback on Big Compute & HPC on Windows Azure


At every software update: package + send to the cloud
1. Send to the manager
   – Either with Azure Storage: Set-AzureStorageBlobContent / Get-AzureStorageBlobContent, or hpcpack create ; hpcpack upload / hpcpack download
   – Or with a normal transfer: an internet-accessible fileserver, FileZilla, etc.
2. Packaging script: mkdir, copy, etc.; hpcpack create
3. Send to Azure storage: hpcpack upload

At every node provisioning: local copy
4. Remotely execute on the nodes from the manager with clusrun
5. hpcpack download
6. powershell -command "Set-ExecutionPolicy RemoteSigned"
   Invoke-Command -FilePath … -Credential …
   Start-Process powershell -Verb runAs -ArgumentList …
7. Installation: %deployedPath%\deployScript.ps1

Deployment within Azure

Porting

Page 22: Feedback on Big Compute & HPC on Windows Azure


• Transferring the input file takes longer than the sequential computation on a single thread

• On many cores, computation time is negligible compared to transfers

• WAV format headers and ParSon code limit input size to 4 GB

This first working setup has some limitations

Porting

Page 23: Feedback on Big Compute & HPC on Windows Azure


3. OPTIMIZATIONS

Page 24: Feedback on Big Compute & HPC on Windows Azure


The identified bottleneck is the input file transfer

1. Disk write throughput: 300 MB/s
   → We use a RAMFS

2. Azure Storage access: QoS 1.6 Gb/s
   → Download only once from the storage account, then broadcast through InfiniBand

3. Large input files: 60 GB
   → FLAC lossless compression (level 8) halves the size and is not limited to 4 GB
   → Declare all counters as 64-bit ints in the C++ code (see the note below)

Methodology: suppress the bottleneck

Optimizations
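A side note on the 64-bit counter fix (illustrative code, not ParSon's): a byte offset stored in a 32-bit counter wraps at 2^32 bytes = 4 GiB, which is exactly where the WAV/input limit comes from.

// Illustrative only: why 32-bit counters cap inputs at 4 GiB.
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t off32 = UINT32_MAX;           // largest 32-bit byte offset (4 GiB - 1)
    int64_t  off64 = (int64_t)off32 + 1;   // a 64-bit counter keeps going
    std::printf("32-bit wraps to %u; 64-bit reaches %lld\n",
                (unsigned)(off32 + 1u), (long long)off64);
    return 0;
}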

Page 25: Feedback on Big Compute & HPC on Windows Azure


• RAMFS = a filesystem stored in a RAM block
  – Very fast
  – Limited capacity, not persistent

• ImDisk
  – Lightweight: driver + service + command line
  – Open-source, but signed for Win64

• Scripted silent install:
  – hpcpack create …
  – rundll32 setupapi.dll,InstallHinfSection DefaultInstall 128 disk.inf
    Start-Service -inputobject $(get-service -Name imdisk)
  – imdisk.exe -a -t vm -s 30G -m F: -o rw
    format F: /fs:ntfs /x /q /Y
  – $acl = Get-Acl F:
    $acl.AddAccessRule(…FileSystemAccessRule("Everyone","Write", …))
    Set-Acl F: $acl

• Run at every node provisioning

Accelerating local data access with a RAM filesystem

Optimizations

Page 26: Feedback on Big Compute & HPC on Windows Azure


• All standard transfer systems go through the Ethernet interface
  – Azure Storage access via the Azure and HPC Pack SDKs
  – Windows share or CIFS network drive
  – Standard file transfer protocols: FTP, NFS, etc.

• The simplest way to leverage InfiniBand is through MPI
  1. On one node, download the input file: Azure → RAMFS
  2. mpiexec broadcast.exe: 1 process per node (a sketch follows below)
     • We developed a command-line utility in C++ / MPI
     • If id = 0, it reads the RAMFS in 4 MB blocks and sends them to the other nodes through InfiniBand: MPI_Bcast
     • If id ≠ 0, it receives the data blocks and saves them to the RAMFS
     • Uses the Win32 API: faster than standard library abstractions
  3. The input data is in the RAM of all nodes, accessible as a file by the application

Accelerating input file deployment

Optimizations
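A minimal sketch of such a broadcast utility (our reconstruction, not the original broadcast.exe): it uses portable C stdio rather than the Win32 calls mentioned above, and assumes the same RAMFS path exists on every node.

// Sketch of an MPI file-broadcast utility (reconstruction): rank 0 reads
// the file from its RAMFS in 4 MB blocks and MPI_Bcast's each block over
// InfiniBand; the other ranks write the blocks to their own RAMFS.
#include <mpi.h>
#include <algorithm>
#include <cstdio>
#include <filesystem>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char* path = argv[1];           // e.g. a file on the F: RAMFS (hypothetical)
    const long long BLOCK = 4LL << 20;    // 4 MB blocks, as on the slide

    long long size = 0;
    if (rank == 0) size = (long long)std::filesystem::file_size(path);
    MPI_Bcast(&size, 1, MPI_LONG_LONG, 0, MPI_COMM_WORLD);

    std::FILE* f = std::fopen(path, rank == 0 ? "rb" : "wb");
    std::vector<char> buf((size_t)BLOCK);
    for (long long done = 0; done < size; done += BLOCK) {
        int n = (int)std::min(BLOCK, size - done);
        if (rank == 0) std::fread(buf.data(), 1, (size_t)n, f);
        MPI_Bcast(buf.data(), n, MPI_BYTE, 0, MPI_COMM_WORLD);  // InfiniBand path
        if (rank != 0) std::fwrite(buf.data(), 1, (size_t)n, f);
    }
    std::fclose(f);
    MPI_Finalize();
    return 0;
}

Run as, e.g., mpiexec -n <nodes> broadcast.exe <ramfs-path>; afterwards the input is in every node's RAM, readable as an ordinary file.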

Page 27: Feedback on Big Compute & HPC on Windows Azure


4. RESULTS

Page 28: Feedback on Big Compute & HPC on Windows Azure


Computations scale well, especially for bigger files

Results

[Plots: left — computation time scaling (computation time in sec, log, vs number of cores, log-log); right — computation efficiency for different input sizes (real speedup / ideal speedup vs number of cores, log).]

Page 29: Feedback on Big Compute & HPC on Windows Azure


Input file transfers make global scaling worse

Results

[Plots: left — efficiency for compute only vs including transfers (real speedup / ideal speedup vs number of cores, log); right — time decomposition (sec, log) for an hour of input audio, raw compute vs transfers.]

Page 30: Feedback on Big Compute & HPC on Windows Azure


Consistent storage throughput (220 Mb/s); latency may be high

Broadcast constant at 700 Mb/s

Results

[Plots: left — Azure storage download performance (download time in min vs file size in GB); right — broadcast time scaling (broadcast time in sec, log, vs number of machines).]

Page 31: Feedback on Big Compute & HPC on Windows Azure


5. CONCLUSION

Page 32: Feedback on Big Compute & HPC on Windows Azure


Our feedback on the Big Compute technology

• HPC standards conformance: C++, OpenMP, MPI

– Ported in 10 work days

• Solid performance
  – Compute: CPU, RAM
  – Network: InfiniBand between nodes

• Responsive support
  – Community, Microsoft

• Intuitive user interface
  – manage.windowsazure.com
  – HPC Cluster Manager

• Everything is scriptable & programmable

• Cloud is more flexible than cluster

• Unified management of cloud and on-premise

• Data transfers
  – Azure storage latency is sometimes high
  – Azure storage QoS is limited → users must implement striping across multiple accounts
  – HDDs are slow (for HPC), even on A9

• Node administration
  – Node ↔ manager transfers must go through Azure storage: less convenient than conventional remote file systems

• Provisioning time must be taken into account (~7 min)

Page 33: Feedback on Big Compute & HPC on Windows Azure


Azure Big Compute for research and business

For research
• Access to compute without any barrier: paperwork, finance, etc.
• Start your workload in minutes
  – For squeezing in a few more runs before the (extended) deadline for that conference
• Well suited to researchers in distributed computing
  – Parametric experiments

For business
• A supercomputer for all, without upfront investment
• Elastic scaling: on-demand sizing
• Interoperable with Windows clusters
  – The cloud absorbs peaks
  – Best of both worlds
• Datacenters in the EU: Ireland + Netherlands

Predictable, pay-what-you-use cost model
Modern design, extensive documentation, efficient support
Decreased need for administration – but administration is still needed on the software side

Page 34: Feedback on Big Compute & HPC on Windows Azure


Thanks

Thank you for your attention
• Antoine Poliakov — [email protected]
• Stéphane Vialle — [email protected]
• ANEO — http://aneo.eu — http://blog.aneo.eu
• Meet us at TechDays! ANEO booth, Thursday 11:30–13:00, track "Au cœur du SI > Infrastructure moderne avec Azure"

All our thanks to Microsoft for lending us the nodes

A question? Don't hesitate!

Page 35: Feedback on Big Compute & HPC on Windows Azure

© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Digital is business