Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters
Christine MORIN
PARIS project-team, IRISA/INRIA (Rennes, France)
2
Motivation
Clusters as an alternative to multiprocessor machines for high performance computing
Workloads of scientific applications
Independent sequential processes
• Compute intensive, huge memory requirements
Parallel applications
• Shared memory (multithreaded applications, OpenMP)
• Message passing (MPI)
• Hybrid applications
3
Some Issues …
No obvious solution to support standard POSIX multithreaded applications on clusters
Memory distribution
Need for efficient placement and load-balancing strategies to take advantage of all cluster resources
Efficient process migration
Execution time of scientific applications may exceed the cluster MTBF
High availability and checkpointing
4
Single System Image Operating System
Vision of a single machine (virtual SMP)
Same interface as a traditional OS for an SMP machine
Same vision for all applications
Efficiency
Properties of a SSI OS
Resource distribution transparency
Intra- and inter-application resource sharing
High availability
Scalability
5
Kerrighed SSI OS
Combining high performance, high availability and ease of programming
Global resource management
• Processor, memory, disk
Integrated resource management
Dynamic resource management
• To deal with configuration changes
Extension of the standard OS running on each node
Small clusters < 100 nodes
6
Outline
Global process management
Global memory management
Conclusion and Perspectives
7
Global Process Management
Global scheduling policy
Load balancing
Several policies
Configurable modular global scheduler
The policy can be changed without stopping the operating system or the applications
The local scheduler on each node is not modified
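The idea of changing the policy without stopping the operating system or the applications can be illustrated with a small sketch. This is not Kerrighed code: the class `GlobalScheduler`, the two policies and the load figures are hypothetical names and values chosen for illustration, and the sketch only assumes that a policy is a function from per-node load to an optional migration decision.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types: a policy maps per-node load to an optional
# (source node, destination node) migration decision.
Loads = Dict[str, float]
Policy = Callable[[Loads], Optional[Tuple[str, str]]]


def threshold_policy(loads: Loads, gap: float = 0.5) -> Optional[Tuple[str, str]]:
    """Migrate from the busiest to the idlest node if the load gap is large."""
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    if loads[busiest] - loads[idlest] > gap:
        return busiest, idlest
    return None


def never_migrate(loads: Loads) -> Optional[Tuple[str, str]]:
    """Trivial policy: leave every process where it is."""
    return None


class GlobalScheduler:
    """Holds the current policy; the per-node local scheduler is not involved."""

    def __init__(self, policy: Policy) -> None:
        self._policy = policy

    def set_policy(self, policy: Policy) -> None:
        # Swapping the callable takes effect on the next decision,
        # without restarting the scheduler or the applications.
        self._policy = policy

    def decide(self, loads: Loads) -> Optional[Tuple[str, str]]:
        return self._policy(loads)


scheduler = GlobalScheduler(never_migrate)
loads = {"node1": 0.9, "node2": 0.1}
before = scheduler.decide(loads)        # no migration under the first policy
scheduler.set_policy(threshold_policy)  # policy replaced while running
after = scheduler.decide(loads)         # now proposes node1 -> node2
```

Keeping the policy behind a single replaceable entry point is one simple way to obtain the modularity described on the slide; the real scheduler may be structured differently.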
8
Architecture of the Global Scheduler
[Figure: each node (Node 1, Node 2) shows a standard OS, a global scheduler, monitors and local analyzers.]
9
Process Management Mechanisms
[Figure: on each of two nodes, the components shown are the global scheduler (application management); process creation, process checkpointing and process migration; process state extraction; and memory, disk and network.]
10
Checkpointing
Common mechanisms for supporting checkpointing protocols for both shared memory and message-passing applications
Efficient checkpoint creation
Several memory checkpoints between two disk checkpoints
Disk checkpoints stored on local disks
Incremental checkpoints
Combination of data replication for efficiency and for high availability for shared memory applications
• Data replication due to data sharing exploited to decrease the cost of checkpoint creation
• Recovery data can be used for the computation until the first modification
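Incremental checkpointing, mentioned above, can be sketched as follows. This is a generic illustration of the technique rather than the Kerrighed mechanism: the class `IncrementalCheckpointer`, the page numbering and the in-memory list of checkpoints are all assumptions made for the example. Each checkpoint saves only the pages modified since the previous one.

```python
from typing import Dict, List, Set


class IncrementalCheckpointer:
    """Toy process memory that saves only pages modified since the last checkpoint."""

    def __init__(self, pages: Dict[int, bytes]) -> None:
        self.pages = dict(pages)
        self.dirty: Set[int] = set(pages)  # every page is new at the start
        self.checkpoints: List[Dict[int, bytes]] = []

    def write(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data
        self.dirty.add(page_no)  # remember that this page changed

    def checkpoint(self) -> int:
        """Save the dirty pages only and return how many were saved."""
        delta = {n: self.pages[n] for n in sorted(self.dirty)}
        self.checkpoints.append(delta)
        self.dirty.clear()
        return len(delta)

    def restore(self) -> Dict[int, bytes]:
        """Rebuild the last checkpointed state by replaying deltas in order."""
        state: Dict[int, bytes] = {}
        for delta in self.checkpoints:
            state.update(delta)
        return state


cp = IncrementalCheckpointer({0: b"a", 1: b"b", 2: b"c"})
first = cp.checkpoint()    # full checkpoint: all three pages
cp.write(1, b"B")
second = cp.checkpoint()   # incremental: only page 1
cp.write(2, b"X")          # modified after the last checkpoint, so lost on restore
restored = cp.restore()
```

The second checkpoint is a third of the size of the first here, which is the saving the slide refers to; restoring costs a replay of the deltas in order.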
11
Process Migration
Communicating processes can migrate
Processes sharing memory
Processes communicating with data streams (sockets, pipes, …)
Efficiency of the process transfer
Address space transferred on demand (containers)
Efficiency of the process execution after migration
Efficient access to open files (containers)
Global management of data streams
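Transferring the address space on demand means that a migrated process starts on the destination node with no pages and pulls each page from the origin node the first time it is touched. The sketch below illustrates that idea only; `LazyAddressSpace`, the `fetch_remote` callback and the dictionary standing in for the origin node are hypothetical and do not reflect the container interface.

```python
from typing import Callable, Dict


class LazyAddressSpace:
    """Address space of a migrated process: pages are fetched on first access."""

    def __init__(self, fetch_remote: Callable[[int], bytes]) -> None:
        self._fetch_remote = fetch_remote   # stands in for a network request
        self._local: Dict[int, bytes] = {}  # pages already on the new node
        self.remote_fetches = 0

    def read(self, page_no: int) -> bytes:
        if page_no not in self._local:
            # First touch: pull the page from the origin node.
            self._local[page_no] = self._fetch_remote(page_no)
            self.remote_fetches += 1
        return self._local[page_no]


# The origin node still holds 1000 pages; nothing is copied at migration time.
origin = {n: f"page-{n}".encode() for n in range(1000)}
space = LazyAddressSpace(origin.__getitem__)

space.read(7)
space.read(7)    # second access is served locally
space.read(42)
fetched = space.remote_fetches
```

Only the two pages actually used cross the network in this run, so the cost of migration depends on the working set rather than on the size of the address space.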
12
Global Memory Management
Different services
Shared virtual memory
Remote paging
Cooperative file cache
A unique concept: the container
Software object to store and share data cluster wide (COMA-like management)
Global management of physical memory
Segments of a process address space and files are associated with containers
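A container can be pictured as a cluster-wide store of pages whose copies move to the nodes that use them. The sketch below is a deliberately simplified model under stated assumptions: the `Container` class, its methods and the write-invalidate rule are choices made for this example, not the coherence protocol used by Kerrighed, which the slides do not describe.

```python
from typing import Dict, Set


class Container:
    """Toy cluster-wide page store with write-invalidate coherence (an assumption)."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}       # latest content of each page
        self._copies: Dict[int, Set[str]] = {}  # nodes holding a valid copy

    def write(self, node: str, page_no: int, data: bytes) -> None:
        # The writer becomes the only holder; all other copies are invalidated.
        self._data[page_no] = data
        self._copies[page_no] = {node}

    def read(self, node: str, page_no: int) -> bytes:
        # The reader receives a replica, so its later reads can be local.
        self._copies[page_no].add(node)
        return self._data[page_no]

    def holders(self, page_no: int) -> Set[str]:
        return set(self._copies[page_no])


c = Container()
c.write("node1", 0, b"v1")
c.read("node2", 0)
shared = c.holders(0)        # both nodes hold a copy after the read
c.write("node2", 0, b"v2")
exclusive = c.holders(0)     # the write leaves node2 as the only holder
latest = c.read("node1", 0)  # node1 sees the new content
```

Because a page has no fixed home node in this model, the same object could back a shared memory segment or a cached file block, which is the sense in which the slide calls the container a unique concept.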
13
Integration of Containers in a Standard OS
[Figure: two nodes, each showing a host operating system with a memory manager, a file system, a VM manager and a disk manager, plus memory and a disk; linkers attach these components to a single container spanning both nodes.]
14
Conclusion & Perspectives
A SSI OS for clusters is still missing in 2003
Kerrighed represents a promising approach
• A first prototype based on Linux is available
Current work directions
High availability and checkpointing
OpenMP on Kerrighed
Experimentation with industrial applications
• EDF, DGA
Grid-aware OS for a federation of clusters
15
http://www.kerrighed.org
Kerrighed has been filed as a community trademark.