An Enhanced Portable Thread Manager

Presentation by

Vijay Murthi

Supervisor: David Levine

Committee Members: Behrooz Shirazi and Bob Weems


Multithreading

• Creating portable and automatically scalable parallel software has been a goal for many researchers and practitioners since the advent of parallel computing.

• Threading is an effective way to use the resources of a shared-memory system.
– Similar to multiprocessing, except that threads share the same address space whereas processes do not.
– Inter-thread communication is much faster and less restrictive.
– Less overhead than processes in terms of creation, deletion and management (such as context switching).

• Even on single-processor machines, threads are sometimes useful.
– Example: an application accessing a database while waiting for input from the user.
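The point that threads share one address space can be illustrated with a minimal sketch. It assumes POSIX threads are available and is not taken from the presentation; the names `shared_counter`, `worker` and `run_shared_counter_demo` are hypothetical. Every thread updates the same variable directly, with a mutex guarding the update.

```c
#include <pthread.h>
#include <stddef.h>

enum { WORKERS = 4, INCREMENTS = 1000 };

/* One variable, visible to every thread because they share an address space. */
static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker adds INCREMENTS to the shared counter, one step at a time. */
static void *worker(void *unused) {
    (void)unused;
    for (int i = 0; i < INCREMENTS; ++i) {
        pthread_mutex_lock(&counter_lock);
        ++shared_counter;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts WORKERS threads, waits for them, and returns the final count
   (or -1 if a thread could not be created). */
long run_shared_counter_demo(void) {
    pthread_t threads[WORKERS];
    int started = 0;

    shared_counter = 0;
    for (; started < WORKERS; ++started) {
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);

    return started == WORKERS ? shared_counter : -1;
}
```

No data is copied between the workers; separate processes would need an explicit mechanism such as pipes or shared-memory segments to achieve the same effect.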


Existing Solutions and Bottlenecks

• Hand-coding threading directives (calls to thread libraries) inside the source code.
• Parallel programming languages.
• Compiler-based directives.
– These methods are unduly expensive in terms of software development time, development cost and programming expertise required.

• Programmers still need to concentrate on the ‘how’ of programming rather than the ‘what’.


Programming using PARSA™

• Makes parallel programming similar to sequential programming.

• Provides two levels of abstraction:
– Abstraction from low-level parallel programming issues
– Abstraction from deployment issues

• Two tightly coupled tools have been developed to achieve this:
– Software Development Environment (SDE): addresses programming issues
– Thread Manager: addresses deployment and portability issues


Software Development Environment

Based on an object-based programming methodology that automatically transforms a project into parallel source code that scales with the number of CPUs.

Projects consist of graphical objects and arcs.

Each object represents a project task to be performed.

Arcs indicate the dependencies between objects.


Programming using SDE

• Interfaces define the “contract” a graphical object has with other graphical objects in a project.

• Each graphical object can have an INPUT interface and an OUTPUT interface.

• Arcs are lines connecting a desired OUTPUT port to another INPUT port.

• Semantically, however, arcs represent data “being passed” from a source object to a destination object.


Projects

Two types of projects: Applications and Functions.

Function projects are like applications, except that Functions have inputs and outputs.
– The generated code is reentrant.
– Invocation is similar to calling functions in standard languages like C or C++.
– FunctionInputs: pass the input arguments to the corresponding objects to start execution.
– FunctionOutputs: deliver the output variables back to the calling program.


Execution Model

• The project implicitly defines the order of execution.

• The SDE has an in-built source code generator for automatic code generation.

• The generated source code makes calls to the thread manager for runtime management of threads.
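How a project's arcs can imply an execution order may be sketched as follows. The slides do not say how the SDE derives the order, so this is only an illustration under assumptions: objects are numbered from 0, each arc is a dependency from a source object to a destination object, and the names `struct arc` and `execution_order` are hypothetical. An object becomes ready once every object feeding it has been placed.

```c
#include <stddef.h>

enum { MAX_OBJECTS = 16 };

/* Hypothetical arc: the output of object `from` feeds the input of object `to`. */
struct arc { int from; int to; };

/* Writes a dependency-respecting order of objects 0..n-1 into order[].
   Returns the number of objects placed (n when the arcs contain no cycle),
   or -1 on invalid input. */
int execution_order(int n, const struct arc *arcs, size_t arc_count, int order[]) {
    int pending[MAX_OBJECTS] = {0};  /* unsatisfied incoming arcs per object */
    int done[MAX_OBJECTS] = {0};
    int placed = 0;
    int progress = 1;

    if (n < 0 || n > MAX_OBJECTS)
        return -1;
    for (size_t a = 0; a < arc_count; ++a) {
        if (arcs[a].from < 0 || arcs[a].from >= n ||
            arcs[a].to < 0 || arcs[a].to >= n)
            return -1;
        ++pending[arcs[a].to];
    }

    /* Repeatedly place every object whose inputs are all satisfied. */
    while (progress) {
        progress = 0;
        for (int obj = 0; obj < n; ++obj) {
            if (done[obj] || pending[obj] != 0)
                continue;
            done[obj] = 1;
            order[placed++] = obj;
            for (size_t a = 0; a < arc_count; ++a)
                if (arcs[a].from == obj)
                    --pending[arcs[a].to];
            progress = 1;
        }
    }
    return placed;
}
```

Objects that become ready in the same pass have no dependency on one another, which is where a runtime such as a thread manager could run them in parallel.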


ThreadMan™ Thread Manager

• A dynamically linkable library with a standard API.
• Eliminates the need for programmers to develop code to manage runtime execution.
• Ensures parallel software executes according to the execution model.
• Supports various forms of parallelism.
• Makes the generated code portable.
– NOTE: Programmers need to take care of their own system-specific library calls.


Contribution

• ThreadMan Enhancements.
– Motivation.
– Design.
– Runtime Analysis.

• Reentrant Code Generation in PARSA.
– Motivation.
– Design.
– Runtime Analysis.


ThreadMan Enhancements


Motivation

• Problems found in the architecture and implementation of ThreadMan v1.x.
– Poor API design.
   • Inconsistent naming conventions used.
– Redundant information stored and updated.
   • Poor run-time performance and memory utilization.
– PARSA-generated source code bloat.
   • Required more code to be generated for run-time management of PARSA projects.
– Memory leaks.
   • Unreliable code: applications could crash.
– Limited system support.
   • No support for MS Windows.


Design

• ThreadMan has been completely redesigned.
– An entirely new architecture for better runtime performance.
– A new Application Programming Interface (API).
– Consistent naming convention used.
– Efficient handling of data structures.
– Expanded system support.
   • API functions added to support MS Windows.
– Programmer accessible.
   • Provides APIs to the most commonly used threading, mutex and semaphore directives.


ThreadMan Architecture

[Figure: ThreadMan architecture diagram. Labels: threads (TH) and ThreadMan in user space; a user-level thread library and a scheduler for each of Microsoft Windows, Solaris and Linux; layers User, Kernel and Hardware (processor level).]


ThreadMan Components

[Figure: ThreadMan components diagram. Labels: threads (TH), TM, TMS and TMPI inside a user process; user-level thread libraries; Scheduler; CPU; layers User, Kernel and Hardware.]

TM - ThreadMan

TMS - ThreadMan scheduler

TMPI - ThreadMan Portable Interface


Different types of parallelism

• ThreadMan supports different types of parallelism

– Regular parallelism (Data)

– Irregular parallelism

– Repeat parallelism (loop)

– Nested parallelism (parallelism inside parallelism)

• Usually, ThreadMan is invoked as a main thread that manages the execution of other threads or thread managers.

• A child thread manager is invoked by the parent whenever there is repeat or nested parallelism to control.
– The children execute independently.
– The parent ThreadMan waits for its children and the other threads it has spawned to complete.
– After execution, control is passed back to the parent.

• ThreadMan can manage its children and other threads simultaneously.
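The parent and child relationship described above can be sketched with plain POSIX threads. This does not use the ThreadMan API, whose calls are not shown in these slides; `child_manager_main`, `loop_body` and `run_nested_demo` are hypothetical names. A parent starts one child manager for a repeat (loop) region, the child starts and joins its own worker threads, and control returns to the parent only after the child has finished.

```c
#include <pthread.h>
#include <stddef.h>

enum { LOOP_THREADS = 3 };

/* One iteration of a hypothetical repeat (loop) region. */
struct loop_task { int index; int result; };

static void *loop_body(void *arg) {
    struct loop_task *task = arg;
    task->result = task->index * task->index;
    return NULL;
}

/* State owned by one child manager. */
struct child_manager {
    struct loop_task tasks[LOOP_THREADS];
    int sum;
    int ok;
};

/* The child manager spawns the loop threads and waits for all of them. */
static void *child_manager_main(void *arg) {
    struct child_manager *cm = arg;
    pthread_t ids[LOOP_THREADS];
    int started = 0;

    for (; started < LOOP_THREADS; ++started) {
        cm->tasks[started].index = started + 1;
        cm->tasks[started].result = 0;
        if (pthread_create(&ids[started], NULL, loop_body,
                           &cm->tasks[started]) != 0)
            break;
    }
    cm->sum = 0;
    for (int i = 0; i < started; ++i) {
        pthread_join(ids[i], NULL);
        cm->sum += cm->tasks[i].result;
    }
    cm->ok = (started == LOOP_THREADS);
    return NULL;
}

/* The parent starts the child manager and waits for it to complete.
   Returns 1*1 + 2*2 + 3*3, or -1 if any thread could not be created. */
int run_nested_demo(void) {
    struct child_manager cm = {0};
    pthread_t child;

    if (pthread_create(&child, NULL, child_manager_main, &cm) != 0)
        return -1;
    pthread_join(child, NULL);  /* control passes back to the parent here */
    return cm.ok ? cm.sum : -1;
}
```

A real parent could also run its own sibling threads between creating and joining the child, which corresponds to managing children and other threads simultaneously.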

[Figure: nested parallelism inside a user process. Labels: threads t0 through t8, with sub-threads t3.1 … t3.n and t6.1 … t6.n; two TM instances; user-level thread libraries; Scheduler; CPU; layers User, Kernel and Hardware.]


Runtime Analysis

Hardware
• Dual-processor Sun Ultra Enterprise 3000 with Solaris 2.6.
• Dual-boot Pentium II machine with Windows NT Server 4.0 and Linux 6.2.
• Same compiler used to compile sequential and parallel source code.

Applications
• Two applications were developed in sequential C and in PARSA version 2.0:
– Merge sort possesses irregular and repeat parallelism.
– Matrix multiplication contains regular parallelism.
• The thread manager manages the runtime execution of the parallel versions across the Solaris, Linux and Windows NT platforms.


Dual Processor Performance

• The speedup data points were calculated by dividing the sequential execution times by the parallel execution times to normalize the performance on each system.

• Scalable applications execute at very nearly the maximum theoretical speedup of 2 on the Solaris and Linux systems, with slightly lower performance on the Windows system.

• On Linux, merge sort speedup slightly exceeds the theoretical limit of 2. This behavior is observed consistently and may be due to caching effects.
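The speedup calculation described on this slide (sequential time divided by parallel time) is small enough to write out. The helper names are hypothetical, and `fraction_of_ideal` rests on the assumption that the ideal speedup equals the processor count, which is the "theoretical speedup of 2" for the dual-processor machines here.

```c
/* Speedup as defined on this slide: sequential time / parallel time.
   Returns -1.0 for timings that cannot be valid measurements. */
double speedup(double sequential_seconds, double parallel_seconds) {
    if (sequential_seconds < 0.0 || parallel_seconds <= 0.0)
        return -1.0;
    return sequential_seconds / parallel_seconds;
}

/* Measured speedup as a fraction of the ideal, where the ideal is assumed
   to equal the number of processors. Returns -1.0 on invalid input. */
double fraction_of_ideal(double measured_speedup, int processors) {
    if (processors <= 0)
        return -1.0;
    return measured_speedup / (double)processors;
}
```

With hypothetical timings of 12.0 s sequential and 6.0 s parallel, `speedup` returns 2.0 and `fraction_of_ideal(2.0, 2)` returns 1.0; a fraction above 1.0 would correspond to the superlinear merge sort result noted for Linux.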

[Figure: Merge Sort. Speedup (0 to 3) versus list size in thousands of elements (16 to 1024) for Solaris, Linux and Windows.]


Dual Processor Performance (Contd.)

• Some applications by nature do not scale very well.
– This depends on the algorithmic design of the program.

• These applications may not be suitable for multithreading, and the same holds when they are developed with PARSA.

• NOTE: Non-scalable applications developed using multithreading may sometimes exhibit poor performance due to the overhead incurred by multithreading.

[Figure: Dhrystone Benchmark. Speedup (0 to 2) versus iterations in thousands (500 to 50000) on Solaris.]


Reentrant Code generation using PARSA


Reentrant Code

• Reentrant code: multiple invocations of the code can execute safely when running concurrently.
– Examples: functions, web-based applications and libraries.

• Lack of reentrant behavior limits how multithreaded code can be deployed.
– Unreliable results can be generated because multiple invocations of the code can corrupt each other's data during execution.

• Non-reentrant code is unacceptable for many general purpose uses.
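The difference between non-reentrant and reentrant code can be shown with a small sketch. It is an illustration only, not PARSA-generated code, and all names are hypothetical. The first version keeps its working state in one global variable, so two overlapping invocations overwrite each other; the second version keeps the state in a structure owned by each invocation.

```c
/* Non-reentrant: every invocation shares the same global state. */
static int global_total;

void nr_begin(void)      { global_total = 0; }
void nr_add(int value)   { global_total += value; }
int  nr_result(void)     { return global_total; }

/* Reentrant: each invocation supplies and owns its own state. */
struct sum_state { int total; };

void r_begin(struct sum_state *s)          { s->total = 0; }
void r_add(struct sum_state *s, int value) { s->total += value; }
int  r_result(const struct sum_state *s)   { return s->total; }
```

If invocation A has added 1 and invocation B then calls `nr_begin`, A's partial sum is lost and later additions from both land in the same total. With the `r_` functions, the same interleaving leaves each invocation's result intact, because A and B pass different `struct sum_state` objects.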


Motivation

• Originally, PARSA-generated code was not designed to be reentrant.
– Investigation revealed problems with multithreaded functions.

• The problem:
– The PARSA-generated source code shares a global data space for passing data.
   • Multiple threads spawned during execution pass data in a global space.
– This feature efficiently utilizes the shared-memory architecture of multithreaded systems (e.g., single-processor systems and symmetric multiprocessors (SMPs)).

• Multithreaded function projects developed in PARSA can generate unexpected (i.e., incorrect) results when multiple invocations of the function are executed concurrently.
– All invocations share the same data space and can produce unreliable results.

• However, application projects developed in PARSA are unaffected because they are executed with their own state at run time.
– A solution was needed to
   • eliminate the use of global shared space.
   • create individual local space for each invocation.


Design

• New API functions added to ThreadMan:
– parsa_global_new: Dynamically allocate data-passing memory.
   • Each invocation has its own local memory allocated.
– parsa_global_delete: Free allocated data-passing memory.
– parsa_global_getref: Get the pointer to the invocation's locally allocated memory.

• Reentrant code generation feature integrated with the source code generator.
– The source code generator generates additional code that calls the new API.

• Changes to UI
– Options that enable reentrant code generation for applications.
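The allocate, get-reference and free pattern behind the three new functions can be sketched as follows. The slides give only the function names and their purposes, not their signatures, so the sketch deliberately uses hypothetical stand-ins (`invocation_space_new`, `invocation_space_getref`, `invocation_space_delete`) rather than guessing at the real ThreadMan API. The function `sum_of_squares` plays the role of a hypothetical generated function.

```c
#include <stdlib.h>

/* Hypothetical per-invocation data-passing space (not the ThreadMan type). */
struct invocation_space {
    size_t size;
    void *data;
};

/* Allocates a zero-filled space for one invocation; NULL on failure. */
struct invocation_space *invocation_space_new(size_t size) {
    struct invocation_space *space = malloc(sizeof *space);
    if (space == NULL)
        return NULL;
    space->data = calloc(1, size ? size : 1);
    if (space->data == NULL) {
        free(space);
        return NULL;
    }
    space->size = size;
    return space;
}

/* Returns the pointer to this invocation's own memory. */
void *invocation_space_getref(struct invocation_space *space) {
    return space ? space->data : NULL;
}

/* Frees the space when the invocation ends. */
void invocation_space_delete(struct invocation_space *space) {
    if (space != NULL) {
        free(space->data);
        free(space);
    }
}

/* Hypothetical generated function: all intermediate data lives in the
   invocation's own space instead of a shared global. Returns -1 on
   allocation failure. */
int sum_of_squares(int n) {
    struct invocation_space *space = invocation_space_new(sizeof(int));
    int *accumulator;
    int result;

    if (space == NULL)
        return -1;
    accumulator = invocation_space_getref(space);
    for (int i = 1; i <= n; ++i)
        *accumulator += i * i;
    result = *accumulator;
    invocation_space_delete(space);
    return result;
}
```

Because each call allocates and releases its own space, concurrent invocations never touch the same accumulator, which is the property the reentrant code generation option is meant to provide.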


Runtime Analysis

• Dual-processor Sun Ultra Enterprise 3000.
• Solaris version 2.6.
• Same compiler used to compile sequential and parallel source code.
• Single- and dual-processor analyses were performed on the same machine.


Single Processor Performance

• Graphs show the speedup of the parallel software relative to the sequential versions on a single-processor Solaris machine.

• The parallel code performance is almost equal to the sequential performance on a single-processor machine.

• This shows that the thread manager adds little overhead in managing threads.

[Figure: Matrix Multiplication, single processor. Speedup (0 to 2) versus input matrix size n (400 to 1500).]

[Figure: Merge Sort, single processor. Speedup (0 to 2) versus list size in thousands of elements (16 to 1024).]


Dual Processor Performance

• Graphs show the speedup of the parallel software relative to the sequential versions on a two-processor Solaris machine.

• The parallel code performance is almost equal to twice the sequential performance for higher loads.

• This shows that the thread manager scales as the number of processors grows (here, from one to two).

[Figure: Matrix Multiplication, dual processor. Speedup (0 to 2) versus input matrix size n (400 to 1500).]

[Figure: Merge Sort, dual processor. Speedup (0 to 2) versus list size in thousands of elements (16 to 1024).]


Papers

• Published a paper titled "A Tool Based Methodology for Development of Automatically Scalable and Reusable Parallel Code" for "The Tenth IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems", Fort Worth, Texas, October 2002.

• Published a paper titled "Creating Portable and Automatically Scalable Parallel Software Using the PARSA™ Programming Methodology" for "The 5th IEEE International Conference on Algorithms and Architectures for Parallel Processing", Beijing, China, October 2002.

• Presented a paper titled "Performance Analysis and Scalability of the ThreadMan™ Thread Manager" for "The 2002 International Conference on Parallel and Distributed Processing Techniques and Applications", Las Vegas, Nevada, July 2002.

• ThreadMan API published as a white paper by PrismPTI, Inc., 2001.

[Downloadable at http://omega.uta.edu/~vxm7387/projects.html]


Conclusion

• The thesis presents ThreadMan, an integral component of PARSA that
– eliminates the need for programmers to write code that controls the run-time execution of their parallel software projects.
– supports development of multithreaded applications and functions.
– supports different forms of parallelism.
– makes PARSA-generated parallel source code portable across a wide range of platforms.

• Reentrant multithreaded functions can be developed in PARSA that will safely execute in a wide range of deployment environments:
– C functions.
– C++ methods.
– Web-based applications.