Case Studies
Class 8: Experiencing Cluster Computing

Page 1: Case Studies, Class 8: Experiencing Cluster Computing

Page 2: Description

• Download the source from http://www.sci.hkbu.edu.hk/tdgc/tutorial/ExpClusterComp/casestudy/casestudy.zip

• Unzip the package

• Follow the instructions from each example

Page 3: Hello World

Page 4: Hello World

• The sample program uses MPI and has each MPI process print

Hello world from process i of n

using the rank in MPI_COMM_WORLD for i and the size of MPI_COMM_WORLD for n. You can assume that all processes support output for this example.

• Note the order in which the output appears. Depending on your MPI implementation, characters from different lines may be intermixed. A subsequent exercise (I/O master/slaves) will show how to order the output.

• You may want to use these MPI routines in your solution: MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize
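The four routines listed above are enough for a complete program. A minimal sketch (the distributed casestudy/helloworld/helloworld.c may differ in details):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* i: this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* n: total number of processes */
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();                         /* shut down MPI */
    return 0;
}
```

Running with mpirun -np 4 produces four lines, in whatever order the processes reach the printf.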

Page 5: Hello World

Source:
casestudy/helloworld/helloworld.c
casestudy/helloworld/Makefile

Compile and run:
% mpicc -o helloworld helloworld.c
% mpirun -np 4 helloworld

Sample output:
Hello world from process 0 of 4
Hello world from process 3 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4

Page 6: Sending in a Ring

Page 7: Sending in a Ring

• The sample program takes data from process zero and sends it to all of the other processes by sending it in a ring. That is, process i receives the data and sends it to process i+1, until the last process is reached.

• Assume that the data consists of a single integer. Process zero reads the data from the user.

• You may want to use these MPI routines in your solution:

MPI_Send, MPI_Recv
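A sketch of the ring using blocking MPI_Send and MPI_Recv. Following the sample output on the next slide, a negative value is assumed to terminate the loop; the distributed ring.c may differ in details:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, value;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    do {
        if (rank == 0) {
            scanf("%d", &value);             /* process 0 reads from the user */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&value, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
            if (rank < size - 1)             /* forward it, except at the end of the ring */
                MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        }
        printf("Process %d got %d\n", rank, value);
    } while (value >= 0);                    /* a negative value stops every process */
    MPI_Finalize();
    return 0;
}
```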

Page 8: Sending in a Ring

[Diagram: the value entered at Process 0 is passed around the ring through Process 1, Process 2, ..., Process i, ..., Process n-1.]

Page 9: Sending in a Ring

Source:
casestudy/ring/ring.c
casestudy/ring/Makefile

Compile and run:
% mpicc -o ring ring.c
% mpirun -np 4 ring

Sample output:
10
Process 0 got 10
22
Process 0 got 22
-1
Process 0 got -1
Process 3 got 10
Process 3 got 22
Process 3 got -1
Process 2 got 10
Process 2 got 22
Process 2 got -1
Process 1 got 10
Process 1 got 22
Process 1 got -1

Page 10: Finding PI using MPI collective operations

Page 11: Finding PI using MPI collective operations

• The method evaluates PI using the integral of 4/(1+x*x) between 0 and 1. The integral is approximated by a sum over n intervals.

• The approximation to the integral in each interval is (1/n)*4/(1+x*x).

• The master process asks the user for the number of intervals.

• The master then broadcasts this number to all of the other processes.

• Each process then adds up every size-th interval (x = rank/n, rank/n + size/n, ...).

• Finally, the sums computed by each process are added together using a reduction.
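The steps above can be sketched as follows, with MPI_Bcast distributing n and MPI_Reduce combining the partial sums. Evaluating the integrand at interval midpoints is an assumption here; the distributed pi.c may differ in details:

```c
#include <stdio.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int n, rank, size, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    while (1) {
        if (rank == 0) {                     /* master asks for the interval count */
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d", &n);
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast n to everyone */
        if (n == 0) break;
        h = 1.0 / (double) n;                /* width of each interval */
        sum = 0.0;
        for (i = rank + 1; i <= n; i += size) {          /* every size-th interval */
            x = h * ((double) i - 0.5);                   /* interval midpoint */
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi is approximately %.16f, Error is %.16f\n",
                   pi, fabs(pi - PI25DT));
    }
    MPI_Finalize();
    return 0;
}
```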

Page 12: Finding PI using MPI collective operations

Source:
casestudy/pi/pi.c
casestudy/pi/Makefile

Sample output:
Enter the number of intervals: (0 quits) 100
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
Enter the number of intervals: (0 quits) 1000
pi is approximately 3.1415927369231262, Error is 0.0000000833333331
Enter the number of intervals: (0 quits) 10000
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
Enter the number of intervals: (0 quits) 100000
pi is approximately 3.1415926535981269, Error is 0.0000000000083338
Enter the number of intervals: (0 quits) 1000000
pi is approximately 3.1415926535898708, Error is 0.0000000000000777
Enter the number of intervals: (0 quits) 10000000
pi is approximately 3.1415926535897922, Error is 0.0000000000000009

Page 13: Implementing Fairness using Waitsome

Page 14: Implementing Fairness using Waitsome

• Write a program to provide fair reception of messages from all sending processes. Arrange the program so that all processes except process 0 send 100 messages to process 0. Have process 0 print out the messages as it receives them. Use nonblocking receives and MPI_Waitsome.

Is the MPI implementation fair?

• You may want to use these MPI routines in your solution:

MPI_Waitsome, MPI_Irecv, MPI_Cancel
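One possible shape for this exercise: the master posts one MPI_Irecv per slave, and after MPI_Waitsome reports completions it prints each message and reposts the receive. The tag is used here to carry the message number; these details are illustrative and the distributed fairness.c may differ:

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, i, ndone;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        int nslaves = size - 1, nexpected = 100 * nslaves, nrecv = 0;
        int *buf = malloc(nslaves * sizeof(int));
        int *indices = malloc(nslaves * sizeof(int));
        MPI_Request *req = malloc(nslaves * sizeof(MPI_Request));
        MPI_Status *statuses = malloc(nslaves * sizeof(MPI_Status));
        for (i = 0; i < nslaves; i++)        /* one pending receive per slave */
            MPI_Irecv(&buf[i], 1, MPI_INT, i + 1, MPI_ANY_TAG,
                      MPI_COMM_WORLD, &req[i]);
        while (nrecv < nexpected) {
            MPI_Waitsome(nslaves, req, &ndone, indices, statuses);
            for (i = 0; i < ndone; i++) {
                int idx = indices[i];
                printf("Msg from %d with tag %d\n",
                       statuses[i].MPI_SOURCE, statuses[i].MPI_TAG);
                nrecv++;
                if (statuses[i].MPI_TAG < 99)   /* repost until 100 msgs seen */
                    MPI_Irecv(&buf[idx], 1, MPI_INT, statuses[i].MPI_SOURCE,
                              MPI_ANY_TAG, MPI_COMM_WORLD, &req[idx]);
            }
        }
    } else {
        int msg = rank;
        for (i = 0; i < 100; i++)            /* the tag is the message number */
            MPI_Send(&msg, 1, MPI_INT, 0, i, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

If the output interleaves messages from all senders, the implementation is behaving fairly; long runs from a single sender suggest otherwise.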

Page 15: Implementing Fairness using Waitsome

Source:
casestudy/fairness/fairness.c
casestudy/fairness/Makefile

Sample output:
Msg from 1 with tag 0
Msg from 1 with tag 1
Msg from 1 with tag 2
Msg from 1 with tag 3
Msg from 1 with tag 4
…
Msg from 2 with tag 21
Msg from 1 with tag 55
Msg from 2 with tag 22
Msg from 1 with tag 56
…

Page 16: Master/slave

Page 17: Master/slave

• Message passing is well-suited to handling computations where a task is divided up into subtasks, with most of the processes used to compute the subtasks and a few processes (often just one process) managing the tasks. The manager is called the "master" and the others the "workers" or the "slaves".

• In this example, you will build an Input/Output master/slave system. This will allow you to relatively easily arrange for different kinds of input and output from the program, including:
– Ordered output (process 2 after process 1)
– Duplicate removal (a single instance of "Hello world" instead of one from each process)
– Input to all processes from a terminal

Page 18: Master/slave

• This will be accomplished by dividing the processes in MPI_COMM_WORLD into two sets:
– The master (who will do all of the I/O) and the slaves (who will do all of their I/O by contacting the master).
– The slaves will also do any other computation that they might desire; for example, they might implement the Jacobi iteration.

• The master should accept messages from the slaves (of type MPI_CHAR) and print them in rank order (that is, first from slave 0, then from slave 1, etc.). The slaves should each send 2 messages to the master. For simplicity, have the slaves send the messages

Hello from slave 3
Goodbye from slave 3

• You may want to use these MPI routines in your solution: MPI_Comm_split, MPI_Send, MPI_Recv
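A sketch of this pattern: following the sample output on the next slide (slaves 0 to 2 with np 4), the master is taken to be the highest rank, and MPI_Comm_split separates it from the slaves. These choices are assumptions; the distributed io.c may differ:

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MSG_LEN 100

int main(int argc, char *argv[]) {
    int rank, size, i, round;
    char msg[MSG_LEN];
    MPI_Comm slave_comm;                     /* communicator for the slaves only */
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* color 1 for the master (rank size-1), color 0 for the slaves */
    MPI_Comm_split(MPI_COMM_WORLD, rank == size - 1, 0, &slave_comm);
    if (rank == size - 1) {
        for (round = 0; round < 2; round++)  /* two messages per slave */
            for (i = 0; i < size - 1; i++) { /* receive in rank order */
                MPI_Recv(msg, MSG_LEN, MPI_CHAR, i, 0, MPI_COMM_WORLD, &status);
                fputs(msg, stdout);
            }
    } else {
        sprintf(msg, "Hello from slave %d\n", rank);
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, size - 1, 0, MPI_COMM_WORLD);
        sprintf(msg, "Goodbye from slave %d\n", rank);
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, size - 1, 0, MPI_COMM_WORLD);
        /* any real slave computation would use slave_comm here */
    }
    MPI_Comm_free(&slave_comm);
    MPI_Finalize();
    return 0;
}
```

Because the master receives from slave 0, then 1, then 2 in each round, the output is ordered regardless of when the slaves actually sent.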

Page 19: Master/slave

Source:
casestudy/io/io.c
casestudy/io/Makefile

Sample output:
% mpicc -o io io.c
% mpirun -np 4 io

Hello from slave 0

Hello from slave 1

Hello from slave 2

Goodbye from slave 0

Goodbye from slave 1

Goodbye from slave 2

Page 20: A simple output server

Page 21: A simple output server

• Modify the previous example to accept three types of messages from the slaves. These types are:
– Ordered output (just like the previous exercise)
– Unordered output (as if each slave printed directly)
– Exit notification (see below)

• The master continues to receive messages until it has received an exit message from each slave. For simplicity in programming, have each slave send the messages

Hello from slave 3 (with the ordered output mode)
Goodbye from slave 3 (with the unordered output mode)

and

I'm exiting (3)

• You may want to use these MPI routines in your solution: MPI_Comm_split, MPI_Send, MPI_Recv

Page 22: A simple output server

Source:
casestudy/io2/io2.c
casestudy/io2/Makefile

Sample output:
% mpicc -o io2 io2.c
% mpirun -np 4 io2
Hello from slave 0
Hello from slave 1
Hello from slave 2
Goodbye from slave 0
Goodbye from slave 1
Goodbye from slave 2
I'm exiting (0)
I'm exiting (2)
I'm exiting (1)

Page 23: Benchmarking collective barrier

Page 24: Benchmarking collective barrier

• The sample program measures the time it takes to perform an MPI_Barrier on MPI_COMM_WORLD.

• It prints the size of MPI_COMM_WORLD and the time for each test, and makes sure that all processes are ready when the test begins.

• How does the performance of MPI_Barrier vary with the size of MPI_COMM_WORLD?
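A sketch of such a benchmark: one warm-up barrier ensures everyone is ready, then the cost is averaged over many iterations so that timer resolution does not dominate (the iteration count is illustrative; the distributed barrier.c may differ):

```c
#include <stdio.h>
#include <mpi.h>

#define NTRIES 100

int main(int argc, char *argv[]) {
    int rank, size, i;
    double t0, t1;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Barrier(MPI_COMM_WORLD);             /* warm-up: all processes ready */
    t0 = MPI_Wtime();
    for (i = 0; i < NTRIES; i++)
        MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();
    if (rank == 0) {
        printf("Kind np time (sec)\n");
        printf("Barrier %d %f\n", size, (t1 - t0) / NTRIES);
    }
    MPI_Finalize();
    return 0;
}
```

Running this for increasing np reproduces the kind of table shown on the next slide and answers the question above empirically.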

Page 25: Benchmarking collective barrier

Source:
casestudy/barrier/barrier.c
casestudy/barrier/Makefile

Sample output:
% mpirun -np 1 barrier

Kind np time (sec)

Barrier 1 0.000000

Barrier 5 0.000212

Barrier 10 0.000258

Barrier 15 0.000327

Barrier 20 0.000401

Barrier 40 0.000442

Page 26: Determining the amount of MPI buffering

Page 27: Determining the amount of MPI buffering

• The sample program determines the amount of buffering that MPI_Send provides. That is, it determines how large a message can be sent with MPI_Send without a matching receive at the destination.

• You may want to use these MPI routines in your solution:

MPI_Wtime, MPI_Send, MPI_Recv

Page 28: Determining the amount of MPI buffering

Hint: Use MPI_Wtime to establish a delay until an MPI_Recv is called at the destination process. By timing the MPI_Send, you can detect when the MPI_Send was waiting for the MPI_Recv.
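The hint can be turned into code roughly as follows: process 1 spins on MPI_Wtime before posting its receive, and process 0 concludes the send blocked if it took a significant fraction of that delay. Sizes, delay, and threshold are illustrative; the distributed buflimit.c may differ:

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, bufsize;
    int reported = 0;
    double t0, elapsed;
    char *buf;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (bufsize = 1024; bufsize <= 1 << 20; bufsize *= 2) {
        buf = malloc(bufsize);
        if (rank == 0) {
            t0 = MPI_Wtime();
            MPI_Send(buf, bufsize, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            elapsed = MPI_Wtime() - t0;
            if (elapsed > 0.5 && !reported) {    /* send waited for the receive */
                printf("MPI_Send blocks with buffers of size %d\n", bufsize);
                reported = 1;
            }
            MPI_Recv(buf, bufsize, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
            printf("0 received %d fr 1\n", bufsize);
        } else if (rank == 1) {
            t0 = MPI_Wtime();
            while (MPI_Wtime() - t0 < 1.0)       /* delay before receiving */
                ;
            MPI_Recv(buf, bufsize, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
            printf("1 received %d fr 0\n", bufsize);
            MPI_Send(buf, bufsize, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
```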

Source:
casestudy/buflimit/buflimit.c
casestudy/buflimit/Makefile

Page 29: Determining the amount of MPI buffering

Sample output:
% mpirun -np 2 buflimit
Process 0 on tdgrocks.sci.hkbu.edu.hk
Process 1 on comp-pvfs-0-1.local
0 received 1024 fr 1
1 received 1024 fr 0
0 received 2048 fr 1
1 received 2048 fr 0
0 received 4096 fr 1
1 received 4096 fr 0
0 received 8192 fr 1
1 received 8192 fr 0
0 received 16384 fr 1
1 received 16384 fr 0
0 received 32768 fr 1
1 received 32768 fr 0
MPI_Send blocks with buffers of size 65536
0 received 65536 fr 1
1 received 65536 fr 0

Page 30: Exploring the cost of synchronization delays

Page 31: Exploring the cost of synchronization delays

• In this example, 2 processes are communicating with a third.

• Process 0 is sending a long message to process 1 and process 2 is sending a relatively short message to process 1 and then to process 0.

• The code is arranged so that process 1 has already posted an MPI_Irecv for the message from process 2 before receiving the message from process 0, but it also ensures that process 1 receives the long message from process 0 before receiving the message from process 2.

Page 32: Exploring the cost of synchronization delays

• This seemingly complex communication pattern can occur in an application due to timing variations on each processor.
– If the message sent by process 2 to process 1 is short, but long enough to require a rendezvous protocol (meeting point), there can be a significant delay before the short message from process 2 is received by process 1, even though the receive for that message is already available.
– Explore the possibilities by considering various lengths of messages.

Page 33: Exploring the cost of synchronization delays

[Diagram: timelines for Process 0, Process 1, and Process 2. Process 1 posts IRECV for Process 2's short message before receiving the long message from Process 0; Process 2 sends its short messages to Process 1 and then to Process 0.]

Page 34: Exploring the cost of synchronization delays

Source:
casestudy/bad/bad.c
casestudy/bad/Makefile

Sample output:
% mpirun -np 3 maxtime
[2] Litsize = 1, Time for first send = 0.000020, for second = 0.000009

Page 35: Graphics

Page 36: Graphics

• A simple MPI example program that uses a number of procedures in the MPE graphics library.

• The program draws lines and squares with different colors in graphic mode.

• The user can select a region and the program will report the selected coordinates.

Page 37: Graphics

Source:
casestudy/graph/mpegraph.c
casestudy/graph/Makefile

Page 38: GalaxSee

Page 39: GalaxSee

• The GalaxSee program lets the user model a number of bodies in space moving under the influence of their mutual gravitational attraction.

• It is effective for relatively small numbers of bodies (on the order of a few hundred), rather than the large numbers (over a million) currently being used by scientists to simulate galaxies.

• GalaxSee allows the user to see the effects that various initial configurations (mass, velocity, spatial distribution, rotation, dark matter, and presence of an intruder galaxy) have on the behavior of the system.

Page 40: GalaxSee

• Command line options: num_stars star_mass t_final do_display

where

num_stars : the number of stars (integer)
star_mass : star mass (decimal)
t_final : final time for the model in Myears (decimal)
do_display : enter a 1 to show a graphical display, or a 0 to not show a graphical display

Page 41: GalaxSee

Source:
casestudy/galaxsee/Gal_pack.tgz

Reference:
http://www.shodor.org/master/galaxsee/

Page 42: Cracking RSA

Page 43: Cryptanalysis

• Cryptanalysis is the study of how to compromise (defeat) cryptographic mechanisms, and cryptology is the discipline of cryptography and cryptanalysis combined.

• To most people, cryptography is concerned with keeping communications private. Indeed, the protection of sensitive communications has been the emphasis of cryptography throughout much of its history.

Page 44: Encryption and Decryption

• Encryption is the transformation of data into a form that is as close to impossible as possible to read without the appropriate knowledge (a key; see below). Its purpose is to ensure privacy by keeping information hidden from anyone for whom it is not intended, even those who have access to the encrypted data.

• Decryption is the reverse of encryption; it is the transformation of encrypted data back into an intelligible form.

Page 45: Cryptography

• Today's cryptography is more than encryption and decryption. Authentication is as fundamentally a part of our lives as privacy.

• We use authentication throughout our everyday lives - when we sign our name to some document for instance - and, as we move to a world where our decisions and agreements are communicated electronically, we need to have electronic techniques for providing authentication.

Page 46: Public-Key vs. Secret-Key Cryptography

• A cryptosystem is simply an algorithm that can convert input data into something unrecognizable (encryption), and convert the unrecognizable data back to its original form (decryption).

• To encrypt, feed input data (known as "plaintext") and an encryption key to the encryption portion of the algorithm.

• To decrypt, feed the encrypted data (known as "ciphertext") and the proper decryption key to the decryption portion of the algorithm. The key is simply a secret number or series of numbers. Depending on the algorithm, the numbers may be random or may adhere to mathematical formulae.

Page 47: Public-Key vs. Secret-Key Cryptography

• The drawback to secret-key cryptography is the necessity of sharing keys.

• For instance, suppose Alice is sending email to Bob. She wants to encrypt it first so any eavesdropper will not be able to understand the message. But if she encrypts using secret-key cryptography, she has to somehow get the key into Bob's hands. If an eavesdropper can intercept a regular message, then an eavesdropper will probably be able to intercept the message that communicates the key.

Page 48: Public-Key vs. Secret-Key Cryptography

• In contrast to secret-key is public-key cryptography. In such a system there are two keys, a public key and its inverse, the private key.

• In such a system when Alice sends email to Bob, she finds his public key (possibly in a directory of some sort) and encrypts her message using that key. Unlike secret-key cryptography, though, the key used to encrypt will not decrypt the ciphertext. Knowledge of Bob's public key will not help an eavesdropper. To decrypt, Bob uses his private key. If Bob wants to respond to Alice, he will encrypt his message using her public key.

Page 49: The One-Way Function

• The challenge of public-key cryptography is developing a system in which it is impossible (or at least intractable) to deduce the private key from the public key.

• This can be accomplished by utilizing a one-way function. With a one-way function, given some input values, it is relatively simple to compute a result. But if you start with the result, it is extremely difficult to compute the original input values. In mathematical terms, given x, computing f(x) is easy, but given f(x), it is extremely difficult to determine x.

Page 50: RSA

• The RSA cryptosystem is a public-key cryptosystem that offers both encryption and digital signatures (authentication). Ronald Rivest, Adi Shamir, and Leonard Adleman developed the RSA system in 1977 [RSA78]; RSA stands for the first letter in each of its inventors' last names.

Page 51: RSA Algorithm

The RSA algorithm works as follows:
1. Take two large primes, p and q, and compute their product n = pq; n is called the modulus.
2. Choose a number, e, less than n and relatively prime to (p-1)(q-1), which means e and (p-1)(q-1) have no common factors except 1.
3. Find another number d such that (ed - 1) is divisible by (p-1)(q-1). The values e and d are called the public and private exponents, respectively.
4. The public key is the pair (n, e); the private key is (n, d). The factors p and q may be destroyed or kept with the private key.

Page 52: RSA Algorithm

It is currently difficult to obtain the private key d from the public key (n, e). However if one could factor n into p and q, then one could obtain the private key d. Thus the security of the RSA system is based on the assumption that factoring is difficult.

Page 53: Encryption

Suppose Alice wants to send a message m to Bob.

• Alice creates the ciphertext c by exponentiating: c = m^e mod n, where e and n are Bob's public key. She sends c to Bob.

• To decrypt, Bob also exponentiates: m = c^d mod n; the relationship between e and d ensures that Bob correctly recovers m.

• Since only Bob knows d, only Bob can decrypt this message.

Page 54: Digital Signature

Suppose Alice wants to send a message m to Bob in such a way that Bob is assured the message is authentic, has not been tampered with, and came from Alice.

• Alice creates a digital signature s by exponentiating: s = m^d mod n, where d and n are Alice's private key. She sends m and s to Bob.

• To verify the signature, Bob exponentiates and checks that the message m is recovered: m = s^e mod n, where e and n are Alice's public key.

Page 55: Encryption

• Thus encryption and authentication take place without any sharing of private keys:
– each person uses only another's public key or their own private key.

• Anyone can send an encrypted message or verify a signed message, but only someone in possession of the correct private key can decrypt or sign a message.

Page 56: What would it take to break the RSA cryptosystem?

• The obvious way to do this attack is to factor the public modulus, n, into its two prime factors, p and q. From p, q, and e, the public exponent, the attacker can easily get d, the private exponent. The hard part is factoring n; the security of RSA depends on factoring being difficult.

• You can use d to factor n, as well as use the factorization of n to find d.

Page 57: What would it take to break the RSA cryptosystem?

• Another way to break the RSA cryptosystem is to find a technique to compute e-th roots mod n. Since c = m^e mod n, the e-th root of c mod n is the message m. This attack would allow someone to recover encrypted messages and forge signatures even without knowing the private key. This attack is not known to be equivalent to factoring. No general methods are currently known that attempt to break the RSA system in this way. However, in special cases where multiple related messages are encrypted with the same small exponent, it may be possible to recover the messages.

Page 58: What would it take to break the RSA cryptosystem?

• Some people have also studied whether part of the message can be recovered from an encrypted message.

• The simplest single-message attack is the guessed plaintext attack. An attacker sees a ciphertext, guesses that the message might be, for example, "Attack at dawn," and encrypts this guess with the public key of the recipient; by comparison with the actual ciphertext, the attacker knows whether or not the guess was correct. Appending some random bits to the message can thwart this attack.

Page 59: What would it take to break the RSA cryptosystem?

• Of course, there are also attacks that aim not at the cryptosystem itself but at a given insecure implementation of the system;

• These do not count as "breaking" the RSA system, because it is not any weakness in the RSA algorithm that is exploited, but rather a weakness in a specific implementation.

• For example, if someone stores a private key insecurely, an attacker may discover it. One cannot emphasize strongly enough that to be truly secure, the RSA cryptosystem requires a secure implementation; mathematical security measures, such as choosing a long key size, are not enough. In practice, most successful attacks will likely be aimed at insecure implementations and at the key management stages of an RSA system.

Page 60: How much does it cost to factor a large number?

Number length (bits)   Machines        Memory
430                    1               trivial
760                    215,000         4 Gb
1020                   342,000,000     170 Gb
1620                   1.6 x 10^15     120 Tb

Page 61: The RSA Challenge Numbers

• The current challenge number is RSA-640: 640 bits (193 digits).

• A link to each of the eight RSA challenge numbers is listed below.

• US $20,000 will be given to those who factor the number RSA-640.

Reference:http://www.rsasecurity.com/rsalabs/node.asp?id=2093

Page 62: RSA Cracker

• Serial randomized brute-force attack
• Source:
– casestudy/rsa2/rsa2.c
• Reference:
– http://www.daimi.au.dk/~aveng/projects/rsa/
• Parallelized version?
– Do it yourself!!
– The programming structure is given to you.
– rsa2/Makefile
– rsa2/popsys.c

Page 63: A Parallel Implementation of the Quadratic Sieve Algorithm

• The purpose of the project is to implement a parallel version of the quadratic sieve algorithm used for factoring large composite integers.

Source:
casestudy/mpqs/mpqs_parallel.tgz

Reference:
http://www.daimi.au.dk/~pmn/scf02/CDROM/pr2/

Page 64: END