
Page 1: MSCS6060 Parallel and Distributed Systems

MSCS6060 Parallel and Distributed Systems

Lecture 1: Introduction

Page 2: MSCS6060 Parallel and Distributed Systems

Goal of the Course

We study architectures, algorithms, and programming paradigms for parallel and distributed systems.

“A distributed system consists of a collection of autonomous computers linked to a computer network and equipped with distributed system software.”

“Distributed systems is a term used to define a wide range of computer systems from a weakly-coupled system such as wide area networks, to very strongly coupled systems such as multiprocessor systems.”

Page 3: MSCS6060 Parallel and Distributed Systems

Parallel vs. Distributed Systems

Within the context of concurrent computing of a single job that takes a long time or is infeasible on one processor:

• Strongly-coupled example: multiprocessor systems
– Multiple processors, each with its own cache
– Data exchange through shared memory among processors
– A single operating system

• Loosely-coupled example: a Google data center
– Multiple, independent workstations
– Connected by networks

Page 4: MSCS6060 Parallel and Distributed Systems

Parallel Computing

A single job is executed concurrently by multiple processors for a shorter time-to-solution. The system might be:

• A single workstation with a multi-core processor
– Each core executes parts of the job
– Data exchange through shared memory

• A cluster of computers connected by a high-speed network
– All cores in the cluster participate in execution
– Data exchange through message passing across the network

Page 5: MSCS6060 Parallel and Distributed Systems

Distributed Computing

Within contexts where concurrency is not usually required for a single job:

• Peer-to-peer (P2P) computing systems
– Participants make a portion of their resources directly available to other network participants
– Example: Napster

• Mobile computing networks

• Distributed transactions

Page 6: MSCS6060 Parallel and Distributed Systems

Administrative Issues

• Timing:
– Class: TTH 5:00-6:15 PM
– Office hours: TTH 10:30 AM-12:00 PM at CU320, or by appointment

• Course website:
– http://www.mscs.mu.edu/~rge/mscs6060

• Textbooks:
– An Introduction to Parallel Computing (2nd ed.) by Ananth Grama et al.
– Designing and Building Parallel Programs (online) by Ian Foster

• Exams:
– Midterm: Oct 18
– Take-home final: 5/13/2010, Tuesday

Page 7: MSCS6060 Parallel and Distributed Systems

Administrative Issues II

• Grading
– Reading -- 10%
– Programming assignments -- 25%
– Term projects -- 25%
– Midterm exam -- 20%
– Final exam -- 20%

• Late assignments
– Each student is allowed to submit one assignment or project within 48 hours of its due time. All other assignments and projects must be submitted on time.

• Course policies

Page 8: MSCS6060 Parallel and Distributed Systems

About Myself

• Name: Rong Ge
• PhD from Virginia Tech
• Research areas:
– Parallel and distributed systems
– High performance computing
– Energy-efficient computing
– Performance analysis and evaluation

Page 9: MSCS6060 Parallel and Distributed Systems

Today's Lecture

• Big picture
– Why do we build parallel and distributed systems?

Page 10: MSCS6060 Parallel and Distributed Systems

Non-Parallel Distributed Systems

[Diagram: a single server connected to several terminals; a sequential system, not a parallel one.]

The server machine is a high-performance host running one or more server programs that share its resources with clients. The clients do not share any of their resources, but request the server's content or service functions.

Page 11: MSCS6060 Parallel and Distributed Systems

The Free Lunch Is Over

Page 12: MSCS6060 Parallel and Distributed Systems

Parallel Systems Are Ubiquitous Today

• Processors ranging from high-end server parts to low-power mobile parts
– CMP (chip multiprocessor) technology with multiple cores

• High-end server processors: Xeon and Opteron
– Quad-core or six-core processors

• Atom processors in netbooks
– Dual-core

• Processors in cell phones
– Dual-core

Page 13: MSCS6060 Parallel and Distributed Systems


Multicore Processors

Page 14: MSCS6060 Parallel and Distributed Systems


A Multiprocessor, Tightly-Coupled Parallel System

Page 15: MSCS6060 Parallel and Distributed Systems


A Loosely-Coupled Computer Cluster

Page 16: MSCS6060 Parallel and Distributed Systems

A Distributed System with Multiple Servers

[Diagram: multiple interconnected servers.]

Page 17: MSCS6060 Parallel and Distributed Systems

Traditional Uses

• Numerical simulations of complex systems and "Grand Challenge Problems" such as:
– weather and climate
– chemical and nuclear reactions
– biological: the human genome
– geological: seismic activity
– mechanical devices, from prosthetics to spacecraft
– electronic circuits
– manufacturing processes

Page 18: MSCS6060 Parallel and Distributed Systems

Emerging Uses

• Amazon
• Google
• Airline reservation systems
• Finance: e-transactions, e-banking, stock exchanges
• Military
• Advanced graphics and virtual reality
• Networked video and multimedia technologies

Page 19: MSCS6060 Parallel and Distributed Systems

Why Parallel and Distributed Systems?

• Computational speedup
– Some tasks are too large for even the fastest single computer
• Real-time weather/climate modeling, the human genome project, fluid turbulence modeling, etc.

• Resource sharing
– Distributed systems offer access to the specialized resources of many systems
• Some nodes may have special databases/information
• Some nodes may have access to special hardware

• Reliability and availability
– Redundancy, failover

Page 20: MSCS6060 Parallel and Distributed Systems

Why Not Distributed Systems?

What are the disadvantages?
• Parallel vs. sequential?
• Distributed vs. centralized?
• Multi-server vs. client-server?

• Expensive (to obtain high computational power and redundancy)
• Complexity and maintenance cost
• Concurrency => interleaving => bugs
• Failures can lead to incorrectness

Page 21: MSCS6060 Parallel and Distributed Systems

The Scope of This Course

• System architectures
• Parallel computing paradigms
– POSIX multithreading
– OpenMP
– Message passing
– UPC, if time allows
– Heterogeneous programming
• Parallel algorithm design and analysis
• Distributed system design technology
– Google File System
– Hadoop MapReduce
– Amazon storage systems
• Peer-to-peer computing
• Cloud computing

Page 22: MSCS6060 Parallel and Distributed Systems

Summary

• What are parallel and distributed systems?
• What are they good for?

Page 23: MSCS6060 Parallel and Distributed Systems


Multiprogramming and Multithreading

Page 24: MSCS6060 Parallel and Distributed Systems

The Five Components of Any Computer

[Diagram: a computer consists of a processor (active), made up of control ("brain") and datapath ("brawn"); memory (passive; where programs and data live when running); and input/output devices (keyboard and mouse for input; display and printer for output; disk, where programs and data live when not running).]

Page 25: MSCS6060 Parallel and Distributed Systems

Levels of Abstraction for a Computer

[Diagram: layered view of a computer. Software layers: applications; compiler, assembler, and operating system. The instruction set architecture sits between software and hardware. Hardware layers: processor, memory, and the I/O system.]

The OS manages resources and schedules processes.

Page 26: MSCS6060 Parallel and Distributed Systems

History of Computing Systems

• Phase 1: hardware is expensive, humans are cheap
– User at a console: single-user systems
– Batch systems (1955-1965)
– Multiprogramming systems (1965-1980)

• Phase 2: hardware is cheap, humans are expensive
– Time sharing: users work at cheap terminals and share servers

• Phase 3: hardware is very cheap, humans are very expensive
– Personal computing: one system per user
– Distributed computing: many systems per user

Page 27: MSCS6060 Parallel and Distributed Systems

Multiprogramming

• Multiple sequential processes in the run state can be swapped in and out by the OS
– Run state: the executable file and data are held in memory
– Better CPU utilization
– Higher throughput
– Shorter timespan for a single program

• I/O interrupts
– While I/O is in progress for program P1, the computer can execute several thousand instructions of program P2 and return to process the data obtained for P1 when it is ready or when P2 is waiting on I/O

• Time sharing
– The execution of program P1 can also be interrupted by the timer

Page 28: MSCS6060 Parallel and Distributed Systems

Process

• A process is an abstraction that supports running programs
• A process is the basic unit of execution in an operating system
• Different processes may run several instances of the same program
• At a minimum, process execution requires:
– Memory to contain the program file and data
– A set of CPU registers to support execution

Page 29: MSCS6060 Parallel and Distributed Systems

Anatomy of a Process

• Executable file
– Header
– Code
– Initialized data
• The process's address space
• Process control block

Page 30: MSCS6060 Parallel and Distributed Systems


CPU Switch From Process to Process

Page 31: MSCS6060 Parallel and Distributed Systems

Costs of Concurrency Transparency

• Independent processes cannot maliciously or inadvertently affect the correctness of each other's behavior.

• The cost of this concurrency transparency:
– Creation: the OS must create a complete, independent address space
• Initialize memory segments
• Copy the associated program into a text segment
• Set up a stack for temporary data
– Switching:
• Save the CPU context (register values, program counter, stack pointer, ...)
• Modify registers of the MMU
• Invalidate address translation caches
• Swap processes between main memory and disk

Page 32: MSCS6060 Parallel and Distributed Systems


Single and Multithreaded Processes

Page 33: MSCS6060 Parallel and Distributed Systems

Threads vs. Processes

Thread:
• A thread has no data segment or heap
• A thread cannot live on its own; it must live within a process
• There can be more than one thread in a process; the first thread calls main and has the process's stack
• Inexpensive creation
• Inexpensive context switching
• If a thread dies, its stack is reclaimed by the process

Process:
• A process has code/data/heap and other segments
• There must be at least one thread in a process
• Threads within a process share code/data/heap and share I/O, but each has its own stack and registers
• Expensive creation
• Expensive context switching
• If a process dies, its resources are reclaimed and all its threads die

Page 34: MSCS6060 Parallel and Distributed Systems

Thread Implementation

• The process defines the address space
• Threads share that address space

• The Process Control Block (PCB) contains process-specific info
– PID, owner, heap pointer, active threads, and pointers to thread info

• A Thread Control Block (TCB) contains thread-specific info
– Stack pointer, PC, thread state, registers, ...

[Diagram: the process's address space contains code, initialized data, the heap, one stack per thread (stack for thread 1, stack for thread 2), DLLs, and reserved regions. Each thread's TCB holds its $pc, $sp, state, and registers.]

Page 35: MSCS6060 Parallel and Distributed Systems

Benefits of Threading

• Responsiveness
– When one thread is blocked, your browser still responds
– E.g., it downloads images while still allowing your interaction

• Resource sharing
– Threads share the same address space
– Reduced overhead (e.g., memory)

• Economy
– Creating a new process costs memory and resources
– E.g., in Solaris, creating a process is about 30 times slower than creating a thread

• Utilization of MP architectures
– Threads can be executed in parallel on multiple processors
– Increased concurrency and throughput

Page 36: MSCS6060 Parallel and Distributed Systems

Examples of Threads

• A web browser
– One thread displays images
– One thread retrieves data from the network

• A word processor
– One thread displays graphics
– One thread reads keystrokes
– One thread performs spell checking in the background

• A web server
– One thread accepts requests
– When a request comes in, a separate thread is created to service it
– Many threads support thousands of client requests

Page 37: MSCS6060 Parallel and Distributed Systems

Once Again...

[Diagram: A PROCESS holds a process ID, signal dispatch table, memory map, file descriptor table, stack pointer and stack, heap, registers, program counter, and process priority. A THREAD holds a thread ID, signal dispatch table, stack pointer and stack, registers, program counter, and thread priority. All threads share the same memory, heap, and file handles (and offsets).]

Page 38: MSCS6060 Parallel and Distributed Systems

What does the developer have to do?

• Decide how to decompose the computation into parallel parts
• Create and destroy threads to support the decomposition
• Add synchronization to make sure dependences are covered (see the sketch below)
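A minimal sketch of this three-step pattern, assuming a toy computation (the thread count, the loop bound, and the names work and counter are illustrative, not from the slides): the computation is decomposed across NTHREADS threads, the threads are created and then joined, and a mutex synchronizes the shared-counter dependence.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4                      /* illustrative thread count */

static long counter = 0;                /* shared data: a dependence */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread executes one part of the decomposed computation. */
static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);      /* synchronization covers the dependence */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++)  /* create the threads */
        pthread_create(&tid[t], NULL, work, NULL);
    for (int t = 0; t < NTHREADS; t++)  /* wait for them to finish */
        pthread_join(tid[t], NULL);
    printf("counter = %ld\n", counter); /* always 400000 with the mutex */
    return 0;
}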

Page 39: MSCS6060 Parallel and Distributed Systems

What's POSIX Got To Do With It?

• Each OS had its own thread library and style
• That made writing multithreaded programs difficult because:
– you had to learn a new API with each new OS
– you had to modify your code with each port to a new OS
• POSIX (IEEE 1003.1c-1995) provided a standard, known as Pthreads

Page 40: MSCS6060 Parallel and Distributed Systems

Pthreads: A Shared Memory Programming Model

• The POSIX standard shared-memory multithreading interface
• Not just for parallel programming, but for general multithreaded programming
• Provides primitives for thread management and synchronization
• Threads are commonly associated with shared-memory architectures and operating systems
– Necessary for unleashing the computing power of SMT and CMP processors
– Making them easy and efficient to use is very important at this time

Page 41: MSCS6060 Parallel and Distributed Systems

Pthreads: Execution Model

• A single process can have multiple, concurrent execution paths
– a.out creates a number of threads that can be scheduled and run concurrently
– Each thread has local data but also shares the entire resources (global data) of a.out
– Any thread can execute any subroutine at the same time as other threads
– Threads communicate through global memory

Page 42: MSCS6060 Parallel and Distributed Systems

Summary

• Process vs. thread
• Multiprogramming vs. multithreading

Page 43: MSCS6060 Parallel and Distributed Systems

Pthreads: POSIX Threads

• Pthreads is a standard set of C library functions for multithreaded programming
– IEEE Portable Operating System Interface (POSIX), section 1003.1 standard, 1995
• The Pthread library (60+ functions) covers:
– Thread management: create, exit, detach, join, ...
– Thread cancellation
– Mutex locks: init, destroy, lock, unlock, ...
– Condition variables: init, destroy, wait, timed wait, ...
– ...
• Programs must include the file pthread.h
• Programs must be linked with the pthread library (-lpthread; see the sketch below)
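A minimal, self-contained sketch of a Pthreads program (the file name prog.c and the message text are illustrative assumptions): it creates one thread with default attributes, waits for it with pthread_join(), and is linked with the -lpthread flag mentioned above.

#include <pthread.h>
#include <stdio.h>

/* Start routine: must take a void* and return a void*. */
static void *hello(void *arg) {
    printf("hello from thread %s\n", (char *)arg);
    return NULL;
}

int main(void) {
    pthread_t tid;
    /* Create one thread running hello("1"), with default (NULL) attributes. */
    if (pthread_create(&tid, NULL, hello, "1") != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);  /* wait for the thread to finish */
    return 0;
}

/* Build (assuming the source file is prog.c):
       gcc prog.c -o prog -lpthread              */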

Page 44: MSCS6060 Parallel and Distributed Systems

Pthreads Naming Convention

• Types: pthread[_object]_t
• Functions: pthread[_object]_action
• Constants/macros: PTHREAD_PURPOSE

• Examples (illustrated in the snippet below):
– pthread_t: the type of a thread
– pthread_create(): creates a thread
– pthread_mutex_t: the type of a mutex lock
– pthread_mutex_lock(): locks a mutex
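A short illustration of the convention using calls that exist in the library (the mutex name m and the function example are illustrative):

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* type + constant/macro */

void example(void) {
    pthread_mutex_lock(&m);    /* function: pthread[_object]_action */
    /* ... critical section ... */
    pthread_mutex_unlock(&m);
}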

Page 45: MSCS6060 Parallel and Distributed Systems

pthread_create()

• Creates a new thread

int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);

– Returns 0 to indicate success; otherwise returns an error code
– thread: output argument for the ID of the new thread
– attr: input argument that specifies the attributes of the thread to be created (NULL = default attributes)
– start_routine: the function to use as the start of the new thread
• must have prototype: void *foo(void *)
– arg: argument to pass to the new thread routine
• If the thread routine requires multiple arguments, they must be passed bundled up in an array or a structure

Page 46: MSCS6060 Parallel and Distributed Systems

pthread_create() example

• We want to create a thread to compute the sum of the elements of an array:

void *do_work(void *arg);

• It needs three arguments
– the array, its size, and where to store the sum
– we need to bundle them in a structure:

struct arguments {
    double *array;
    int size;
    double *sum;
};

Page 47: MSCS6060 Parallel and Distributed Systems

pthread_create() example

int main(int argc, char *argv[]) {
    double array[100];
    double sum;
    pthread_t worker_thread;
    struct arguments *arg;

    arg = (struct arguments *)calloc(1, sizeof(struct arguments));
    arg->array = array;
    arg->size = 100;
    arg->sum = &sum;

    if (pthread_create(&worker_thread, NULL,
                       do_work, (void *)arg)) {
        fprintf(stderr, "Error while creating thread\n");
        exit(1);
    }
    ...
}

Page 48: MSCS6060 Parallel and Distributed Systems

pthread_create() example

void *do_work(void *arg) {
    struct arguments *argument;
    int i, size;
    double *array;
    double *sum;

    argument = (struct arguments *)arg;
    size = argument->size;
    array = argument->array;
    sum = argument->sum;

    *sum = 0;
    for (i = 0; i < size; i++)
        *sum += array[i];

    return NULL;
}

Page 49: MSCS6060 Parallel and Distributed Systems

Comments about the example

• The "main thread" continues its normal execution after creating the "child thread"
• IMPORTANT: if the main thread terminates, then all threads are killed!
– We will see that there is a join() function (previewed in the sketch below)
• Of course, memory is shared by the parent and the child (the array, the location of the sum)
– nothing prevents the parent from doing something to it while the child is still executing, which may lead to a wrong computation
– we will see that Pthreads provides locking mechanisms
• The bundling and unbundling of arguments is a bit tedious
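A sketch of how this example could use join() to wait for the child before reading the sum; pthread_join() is only previewed here, and do_work() and struct arguments are assumed to be defined as on the previous slides.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct arguments { double *array; int size; double *sum; };

void *do_work(void *arg);               /* defined on the previous slide */

int main(void) {
    double array[100], sum;
    pthread_t worker_thread;
    struct arguments *arg = calloc(1, sizeof *arg);
    int i;

    for (i = 0; i < 100; i++) array[i] = 1.0;   /* illustrative data */
    arg->array = array; arg->size = 100; arg->sum = &sum;

    pthread_create(&worker_thread, NULL, do_work, (void *)arg);
    pthread_join(worker_thread, NULL);  /* the child is done: sum is valid */
    printf("sum = %f\n", sum);          /* prints 100.000000 */
    free(arg);
    return 0;
}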

Page 50: MSCS6060 Parallel and Distributed Systems

pthread_exit()

• Terminates the calling thread:

void pthread_exit(void *retval);

– The return value is made available to another thread calling pthread_join() (introduced later; see the sketch below)
– The previous example had the thread simply return from the do_work() function
• In that case, the call to pthread_exit() is implicit
• The return value of the function serves as the argument to the (implicitly called) pthread_exit()
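A minimal sketch of the explicit form (the names worker and result are illustrative): the thread passes a pointer out through pthread_exit(), and the creating thread recovers it with pthread_join().

#include <pthread.h>
#include <stdio.h>

static double result = 3.14;         /* illustrative payload */

static void *worker(void *arg) {
    (void)arg;
    pthread_exit(&result);           /* equivalent to: return &result; */
}

int main(void) {
    pthread_t tid;
    void *retval;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, &retval);      /* retval now points to result */
    printf("worker returned %f\n", *(double *)retval);
    return 0;
}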