
Enabling High Performance Bulk Data Transfers With SSH

Chris Rapier, Benjamin Bennett

Pittsburgh Supercomputing Center

HPN-SSH, TIP ‘08

Moving Data

• Still crazy after all these years
  – Multiple solutions exist
• Protocols
  – UDT, SABUL, etc.
• Implementations
  – GridFTP, kFTP, bbFTP, hand rolled, and more
• Not to mention
  – Advanced congestion control, autotuning, jumbograms, etc.

Many Solutions No Answers

• All developed as a solution to the same problem
  – Moving lots of data very fast can be very difficult
• Unfortunately, no single solution meets all needs.
  – Fast, easy to use, inexpensive to maintain, flexible, secure

What About SSH?

• Easy to use.
• Cheap to maintain.
• Installed everywhere.
• Flexible.
• Strong cryptography.


Why not SSH?

• It can be really really slow.

How slow?

[Bar chart, throughput in Mb/s (axis 0 to 800). Iperf: 703; OpenSSH 4.6: 4.6]

A little better

[Bar chart, throughput in Mb/s (axis 0 to 800). Iperf: 703; OpenSSH 4.7: 128]

What changed?

• Why the improvement in OpenSSH 4.7?
  – SSH is a multiplexed application
    • Each channel requires its own flow control, which is implemented as a receive window
  – In 4.7 the maximum window size was increased to ~1MiB, up from 64KiB

Windows

• Receive windows advertise the amount of data a system or application is willing to accept per round trip time.
• Effective window size is the minimum of all windows, protocol and application.
• Each window must be tuned and in sync to maximize throughput.
  – If any one is out of tune the entire connection will suffer.
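As a rough illustration of the points above: at most one effective window of data can be in flight per round trip, so throughput is bounded by window / RTT, and the effective window is the smallest one in the stack. The Python sketch below uses an assumed 100 ms round trip time and an assumed 4MiB TCP window; both are illustrative values, not measurements from the talk.

```python
KIB = 1024
MIB = 1024 * KIB


def effective_window(*windows_bytes):
    """The smallest window in the stack limits the whole connection."""
    return min(windows_bytes)


def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


RTT = 0.100           # assumed round trip time, in seconds
TCP_WINDOW = 4 * MIB  # assumed, well-tuned TCP receive window

for label, ssh_window in (("64KiB SSH window", 64 * KIB),
                          ("1MiB SSH window", 1 * MIB)):
    window = effective_window(TCP_WINDOW, ssh_window)
    print(f"{label}: at most {max_throughput_mbps(window, RTT):.1f} Mb/s")
```

With these assumptions the SSH window, not the TCP window, sets the limit in both cases, which is the "out of tune" situation the slide describes.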

[Diagram slides (six-slide sequence), labels: TCP; SSH, TCP]

Windows in HPN-SSH

• Dynamically defined receive window size grows to match the TCP window.
  – Set to TCP RWIN on start.
  – Grows with RWIN on an autotuning system.
  – Dynamic sizing reduces over-buffering problems.
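A minimal sketch of the idea, assuming the application polls the socket's receive buffer size and never lets its own channel window fall below it. It uses Python's standard `socket` module and treats `SO_RCVBUF` as a stand-in for the TCP receive window. The function name `next_channel_window` is hypothetical, and this is not HPN-SSH's actual code.

```python
import socket


def tcp_receive_buffer(sock):
    """Ask the kernel for the socket's current receive buffer size in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def next_channel_window(current_window, sock):
    """Grow the application window to match the TCP buffer; never shrink it."""
    return max(current_window, tcp_receive_buffer(sock))


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # Start from a small fixed window (64KiB, the older OpenSSH limit).
    window = next_channel_window(64 * 1024, sock)
    print(f"TCP receive buffer: {tcp_receive_buffer(sock)} bytes")
    print(f"channel window:     {window} bytes")
```

Calling `next_channel_window` periodically during a transfer would let the application window follow the buffer as kernel autotuning enlarges it.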

[Diagram slides (four-slide sequence), labels: HPN-SSH, TCP]

SFTP is Special

• SFTP adds *another* layer of flow control.
  – All SFTP packets are treated as requests
  – By default no more than 16 outstanding requests
  – Results in a 512KiB window
  – Increase using -R on command line
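The 512KiB figure is consistent with 16 outstanding requests of 32KiB each. The 32KiB request size is an assumption here: it is implied by 512KiB / 16 but not stated on the slide. A small Python calculation, with an assumed 100 ms round trip time, shows how the request limit caps throughput:

```python
KIB = 1024


def sftp_window_bytes(outstanding_requests, request_bytes=32 * KIB):
    """Data that can be in flight: requests allowed times bytes per request."""
    return outstanding_requests * request_bytes


def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


RTT = 0.100  # assumed round trip time, in seconds

for requests in (16, 64, 256):
    window = sftp_window_bytes(requests)
    print(f"{requests} requests: {window // KIB}KiB window, "
          f"at most {max_throughput_mbps(window, RTT):.1f} Mb/s")
```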

[Diagram slide, labels: HPN-SSH, TCP, SFTP]

A lot better

[Bar chart, throughput in Mb/s (axis 0 to 800). Iperf: 703; HPN-SSH: 317]

But…

• As the throughput increases, crypto demands more of the processor.
  – The transfer is now processor bound

We Need More Power?

• Two solutions to processor bound transfers
  – Throw more processing power at the problem
  – Do the work more efficiently
• Define ‘work’

The None Switch

• Many people only need secure authentication. The data can pass in the clear.
  – HPN-SSH allows users to switch to a ‘None’ cipher after authentication.

Done!

[Bar chart, throughput in Mb/s (axis 0 to 800). Iperf: 703; None: 694]

As far as we can go?

• Windows are already optimized.
  – No more real improvements available there
• NONE cipher is limited to a subset of transfers.
  – Sometimes you absolutely need full encryption.
• So what now?

More Power

• Common assumption that current hardware is incapable of meeting crypto demand
  – Is it true?

What does SSH need to do?

[Diagram: processing steps on each side]

• Tx: read(disk), Packetize, Compute MAC, Encrypt, write(net)
• Rx: read(net), Decrypt, Compute MAC, Depacketize, write(disk)

Today's Hardware

• Laptop
  – Two 64bit general purpose cores
  – 1GiB to 4GiB RAM
  – 1Gbps ethernet
• Desktop/Workstation
  – Two to eight 64bit general purpose cores
  – 1GiB to 8GiB RAM
  – 1Gbps ethernet

OpenSSL Benchmarks

• Dual Intel Xeon 5345 Workstation
  – 4 cores per socket, 8 cores total @ 2.33GHz
  – Fedora 7 stock OpenSSL build

Performance of MAC & Cipher Algorithms on 8KiB Data Blocks (Mbps, chart axis 0 to 30000)

Algorithm     Single Core   Eight Cores
hmac-md5             3232         26032
aes128-cbc            960          7704
aes192-cbc            840          6736
aes256-cbc            744          5976

We have the CPU power

• hmac-md5 @ 1Gbps, ~0.3 cores
• aes256-cbc @ 1Gbps, ~1.34 cores
• Crypto total @ 1Gbps, ~1.64 cores
• We have 8!
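These core counts can be reproduced by dividing the 1Gbps target by the single-core OpenSSL rates, reading the benchmark chart values as 3232 Mbps for hmac-md5 and 744 Mbps for aes256-cbc. A short Python check of that arithmetic:

```python
TARGET_MBPS = 1000  # 1Gbps link

# Single-core rates in Mbps, as read from the OpenSSL benchmark chart.
SINGLE_CORE_MBPS = {"hmac-md5": 3232, "aes256-cbc": 744}


def cores_needed(algorithm, target_mbps=TARGET_MBPS):
    """Fraction of a core required to keep up with the target rate."""
    return target_mbps / SINGLE_CORE_MBPS[algorithm]


mac = cores_needed("hmac-md5")
cipher = cores_needed("aes256-cbc")
print(f"hmac-md5:   {mac:.2f} cores")
print(f"aes256-cbc: {cipher:.2f} cores")
print(f"total:      {mac + cipher:.2f} cores")
```

The results agree with the slide's rounded figures: a fraction of a core for the MAC and a little over one core for the cipher.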

So what's the problem?

• MAC requires a fraction of one core
• Cipher requires more than one core
• MAC, cipher, and more all within a single execution thread

[Figure: per-core utilization (util %); core labels: ssh, idle, idle, idle, idle, kernel I/O, idle, idle]

How can we fix it?

• Multi-threading on functional boundaries
  – Perform MAC and cipher on a packet concurrently
    • Possible on sender, not on receiver
  – Process multiple packets concurrently (pipeline)
  – Cipher still needs more than one core
• Multi-threading within cipher
  – Can it be parallelized?
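A sketch of the first idea in Python, assuming the MAC is computed over the sequence number and the unencrypted packet, as RFC 4253 specifies. On the sender both operations take the same plaintext as input and can be submitted at once; the receiver needs the decrypted plaintext before it can check the MAC. `toy_cipher` is a hypothetical stand-in (an XOR with a hash-derived keystream that is reused for every packet, which is insecure), not a real SSH cipher.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor


def toy_cipher(key, data):
    """Stand-in cipher: XOR with a SHA-256 derived keystream (its own inverse)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def compute_mac(key, seqno, plaintext):
    """HMAC-MD5 over the sequence number followed by the plaintext packet."""
    return hmac.new(key, seqno.to_bytes(4, "big") + plaintext, hashlib.md5).digest()


def send_packet(pool, enc_key, mac_key, seqno, plaintext):
    # Both jobs depend only on the plaintext, so they can run side by side.
    mac_job = pool.submit(compute_mac, mac_key, seqno, plaintext)
    enc_job = pool.submit(toy_cipher, enc_key, plaintext)
    return enc_job.result(), mac_job.result()


def receive_packet(enc_key, mac_key, seqno, ciphertext, mac):
    # Decryption has to finish first: the MAC covers the plaintext.
    plaintext = toy_cipher(enc_key, ciphertext)
    if not hmac.compare_digest(mac, compute_mac(mac_key, seqno, plaintext)):
        raise ValueError("MAC mismatch")
    return plaintext


with ThreadPoolExecutor(max_workers=2) as pool:
    wire, tag = send_packet(pool, b"enc-key", b"mac-key", 0, b"bulk data payload")

print(receive_packet(b"enc-key", b"mac-key", 0, wire, tag))
```

The example shows the dependency structure only; it makes no claim about speedup in Python.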

SSH Cipher Modes

• CBC
  – Most common
  – RFC 4253 “The Secure Shell (SSH) Transport Layer Protocol” specifies only CBC mode ciphers, arcfour, and none.
• CTR
  – Specified in RFC 4344 “SSH Transport Layer Encryption Modes”
  – More desirable security properties than CBC

Hello, my name is CBC

• Cipher Block Chaining Mode Encryption

[Diagram: P0 is XORed with IV and encrypted under Key to give C0; P1 is XORed with C0 and encrypted to give C1; ...]

Hello, my name is CBC (cont)

• Cipher Block Chaining Mode Decryption

[Diagram: C0 is decrypted under Key and XORed with IV to give P0; C1 is decrypted and XORed with C0 to give P1; ...]

CBC Summary

• Encrypt must be serial
• Decrypt may be parallel
• That doesn't help so much :-(
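The asymmetry can be seen in a small Python sketch. The block cipher here is a hypothetical toy (a four-round Feistel network built from SHA-256) standing in for AES, since the Python standard library has no AES; only the chaining structure is the point. Encryption needs the previous ciphertext block before it can start the next one, while every decryption input is already on hand.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16
HALF = BLOCK // 2


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, rnd, half):
    """Round function of the toy Feistel cipher."""
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]


def toy_encrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, _round(key, rnd, right))
    return left + right


def toy_decrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _round(key, rnd, left)), left
    return left + right


def cbc_encrypt(key, iv, blocks):
    """Serial by construction: block i needs ciphertext block i - 1."""
    out, prev = [], iv
    for plain in blocks:
        prev = toy_encrypt_block(key, xor(plain, prev))
        out.append(prev)
    return out


def cbc_decrypt_block(key, prev_cipher, cipher):
    """Needs only two ciphertext blocks, both known in advance."""
    return xor(toy_decrypt_block(key, cipher), prev_cipher)


def cbc_decrypt_parallel(key, iv, cipher_blocks):
    prevs = [iv] + cipher_blocks[:-1]
    keys = [key] * len(cipher_blocks)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(cbc_decrypt_block, keys, prevs, cipher_blocks))


key, iv = b"demo key", bytes(BLOCK)
blocks = [bytes([i]) * BLOCK for i in range(4)]
cipher_blocks = cbc_encrypt(key, iv, blocks)
recovered = cbc_decrypt_parallel(key, iv, cipher_blocks)
print(recovered == blocks)
```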

Hello, my name is CTR

• Counter Mode Encryption

[Diagram: CTR is encrypted under Key and XORed with P0 to give C0; CTR + 1 is encrypted and XORed with P1 to give C1; ...]

Hello, my name is CTR (cont)

• Counter Mode Decryption

[Diagram: CTR is encrypted under Key and XORed with C0 to give P0; CTR + 1 is encrypted and XORed with C1 to give P1; ...]

CTR Summary

• Encrypt may be parallel
• Decrypt may be parallel
• Keystream can be pregenerated
• Let’s get to work…
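A Python sketch of why this holds. The keystream for block i depends only on the key and the counter value, never on the data, so blocks can be handled in any order and the keystream can be computed before the data arrives. `keystream_block` is a hypothetical stand-in that uses SHA-256 in place of AES encryption of the counter.

```python
import hashlib

BLOCK = 16


def keystream_block(key, counter):
    """Stand-in for AES_Encrypt(counter): depends on key and counter only."""
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_crypt_block(key, ctr0, index, block):
    """Encrypt and decrypt are the same operation in counter mode."""
    return xor(block, keystream_block(key, ctr0 + index))


key, ctr0 = b"demo key", 1000
plain = [bytes([65 + i]) * BLOCK for i in range(4)]

# Encrypt the blocks in an arbitrary order; no block waits on another.
cipher = [None] * len(plain)
for i in (3, 1, 0, 2):
    cipher[i] = ctr_crypt_block(key, ctr0, i, plain[i])

# Pregenerate the keystream, then decrypt with nothing but XOR.
keystream = [keystream_block(key, ctr0 + i) for i in range(len(cipher))]
recovered = [xor(c, k) for c, k in zip(cipher, keystream)]
print(recovered == plain)
```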

Multi-threaded AES-CTR

• Uses arbitrary number of cipher threads (and cores) to generate a single keystream.
• Cipher threads pre-generate keystream, starting once a cipher context key and IV are known.
• Leaves only keystream dequeue & XOR for encrypt/decrypt operations in main SSH thread.

Single Cipher Thread

• Cipher Thread
  – AES_Encrypt(ctr)
  – Inc(ctr)
• Main Thread
  – read(disk)
  – Packetize
  – Compute MAC
  – XOR
  – write(net)

[Diagram label: Keystream Q]
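The division of labour can be sketched in Python with one producer thread and a bounded queue. This is an illustration of the structure, not the HPN-SSH implementation (which is C): `keystream_block` is a hypothetical SHA-256 stand-in for AES_Encrypt(ctr), and the queue size of 64 blocks is an arbitrary choice.

```python
import hashlib
import queue
import threading

BLOCK = 16


def keystream_block(key, counter):
    """Stand-in for AES_Encrypt(ctr)."""
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]


def cipher_thread(key, ctr, keystream_q, stop):
    """Generate keystream ahead of need: encrypt the counter, then increment."""
    while not stop.is_set():
        block = keystream_block(key, ctr)
        ctr += 1
        while not stop.is_set():
            try:
                keystream_q.put(block, timeout=0.05)
                break
            except queue.Full:
                continue  # queue is full; check the stop flag and retry


def main_thread_encrypt(data, keystream_q):
    """The main thread only dequeues keystream and XORs it with the data."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        chunk = data[offset:offset + BLOCK]
        ks = keystream_q.get()
        # A short final chunk uses only the first bytes of its keystream block.
        out += bytes(a ^ b for a, b in zip(chunk, ks))
    return bytes(out)


key, ctr0 = b"demo key", 0
keystream_q = queue.Queue(maxsize=64)
stop = threading.Event()
worker = threading.Thread(target=cipher_thread,
                          args=(key, ctr0, keystream_q, stop), daemon=True)
worker.start()

data = b"bulk data " * 10
ciphertext = main_thread_encrypt(data, keystream_q)

stop.set()
worker.join()
print(len(ciphertext) == len(data))
```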

Multiple Cipher Threads

• Ring of bounded queues
  – Each queue holds a portion of keystream
  – Each queue exclusively accessed
• Queue counters are offset initially and on each fill

[Diagram: ring of queues in states DRAINING, FILLING, FILLING, EMPTY; actors: Main Thread, Cipher Thread 1, Cipher Thread 2]
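One way to read the counter offsets, offered as an assumption rather than the authors' exact scheme: queue q starts q portions into the keystream, and after each fill its counter jumps ahead by the length of the whole ring. Draining the queues in ring order then yields one contiguous keystream. The Python sketch below checks that property for four queues (as in the diagram) with a hypothetical eight blocks per queue.

```python
def queue_start_counter(base_ctr, queue_index, fill_number,
                        n_queues, blocks_per_queue):
    """First counter value a queue covers on a given fill.

    The initial offset is queue_index portions; each later fill skips
    ahead by the portions held in all the queues of the ring.
    """
    return base_ctr + (fill_number * n_queues + queue_index) * blocks_per_queue


N_QUEUES = 4          # as in the diagram
BLOCKS_PER_QUEUE = 8  # hypothetical portion size
BASE_CTR = 0

# Counters seen by the main thread as it drains the ring three times around.
drained = []
for fill in range(3):
    for q in range(N_QUEUES):
        start = queue_start_counter(BASE_CTR, q, fill, N_QUEUES, BLOCKS_PER_QUEUE)
        drained.extend(range(start, start + BLOCKS_PER_QUEUE))

print(drained == list(range(3 * N_QUEUES * BLOCKS_PER_QUEUE)))
```

Because each queue's counter range is fixed by its position and fill number, a cipher thread can fill one queue without coordinating with the others.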

M-T AES-CTR Results

8-core Nodes on 1Gbps LAN

[Bar chart, throughput in Mbps (axis 0 to 1000), series Original and HPN-SSH; bars for Iperf, None, aes128-ctr, aes192-ctr, aes256-ctr; values as extracted: 938, 938, 938, 938, 417, 456, 506, 944]

Conclusion

• SSH designed for security
  – HPN-SSH is a set of performance enhancements to the most common SSH implementation, OpenSSH
• High throughput with high latency
  – Kernel auto-tuning adjusts TCP flow control
  – HPN-SSH RecvBufferPolling adjusts SSH flow control
• High throughput with any latency
  – HPN-SSH None cipher for non-private data
  – HPN-SSH Multi-threaded AES-CTR cipher

Future Work

• Approaching 10Gbps
• Continued multi-threading
  – Concurrent packet processing/pipelining
• Efficiency
• Striped data transfers
• Exotic architectures


Where to get it

http://www.psc.edu/networking/projects/hpn-ssh

Email: [email protected]