Post on 02-Jan-2016

XTREEMOS APPLICATION EXECUTION MANAGEMENT: A SCALABLE APPROACH

Ramon Nou, Jacobo Giralt, Julita Corbalan, Enric Tejedor, J. Oriol Fitó, Josep M. Perez, Toni Cortes

Barcelona Supercomputing Center (BSC – CNS)

XtreemOS is funded by the European Commission through the Information Society Technologies (IST) programme under contract IST-FP6-033576.

Outline
• XtreemOS Overview
• Application Execution Manager
• Job Execution Flow
• Monitoring
• Performance and scalability
  • Job Execution
  • Job Status
• Future

XtreemOS overview
• What is it?
  • A Linux-based operating system that supports Virtual Organizations (VOs) for the Grid.
• Several layers

XtreemOS overview
• Some key features:
  • Makes the Grid easy to use (like a Linux system).
  • Highly scalable.
  • Fault tolerant.
  • Able to run interactive jobs.
  • Extensible.
• Three node types (each can be replicated):
  • Core
  • Resource
  • Client

Application Execution Manager
• Job management, monitoring and resource management.
• Access point to submit and control jobs.
• Distributed and asynchronous.
• Extensible.
• Linux concepts in the Grid world:
  • Process-thread paradigm.
  • Signals.

Application Execution Manager
• Several distributed services:
  • Job Manager.
  • Execution Manager.
  • Reservation Manager.
  • …
• Semantics:
  • JobUnit: the set of processes of a Job running on one resource.
  • Job: a set of JobUnits, identified by a JobID. [Process-Thread]
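The Job/JobUnit semantics above can be pictured with a small data model. This is only an illustrative sketch: the class names mirror the slide's terms, but the fields (`resource`, `pids`, `units`) and methods are assumptions, not the actual XtreemOS data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobUnit:
    """Processes of one job that run on a single resource (hypothetical model)."""
    resource: str
    pids: List[int] = field(default_factory=list)


@dataclass
class Job:
    """A set of JobUnits identified by a JobID, much like a process owning threads."""
    job_id: str
    units: Dict[str, JobUnit] = field(default_factory=dict)

    def add_process(self, resource: str, pid: int) -> None:
        # One JobUnit per resource; create it on first use.
        unit = self.units.setdefault(resource, JobUnit(resource))
        unit.pids.append(pid)

    def process_count(self) -> int:
        # Total number of processes across all JobUnits of this job.
        return sum(len(unit.pids) for unit in self.units.values())


job = Job("job-1")
job.add_process("node-a", 101)
job.add_process("node-a", 102)
job.add_process("node-b", 201)
```

Here the job spans two resources, so it holds two JobUnits while still being addressed through a single JobID.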

Job Execution Flow

[Sequence diagram. Participants: User, XOSD JobMng, XOSD ExecMng, JobDirectory, RSS, Any XOSD, Kernel. Steps in order: JID = createJob(JSDL); JID returned; runJob(JID); getResources(JSDL); schedules and executes process; job finished (all processes finished).]
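The flow of createJob, runJob and getResources can be sketched as a toy, in-process model. Everything here is an assumption made for illustration: the class names, the dictionary-based job description standing in for JSDL, and the `process_finished` callback are hypothetical and do not reproduce the real XtreemOS interfaces.

```python
import itertools
from typing import Dict, List


class ResourceSelection:
    """Hypothetical stand-in for the resource selection service (RSS)."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = nodes

    def get_resources(self, jsdl: dict) -> List[str]:
        # Return as many nodes as the job description asks for.
        return self.nodes[: jsdl.get("processes", 1)]


class JobManager:
    """Hypothetical job manager: creates jobs, runs them, tracks completion."""

    def __init__(self, rss: ResourceSelection) -> None:
        self.rss = rss
        self.jobs: Dict[str, dict] = {}
        self._ids = itertools.count(1)

    def create_job(self, jsdl: dict) -> str:
        jid = f"jid-{next(self._ids)}"
        self.jobs[jid] = {"jsdl": jsdl, "state": "created", "pending": set()}
        return jid

    def run_job(self, jid: str) -> List[str]:
        job = self.jobs[jid]
        placement = self.rss.get_resources(job["jsdl"])
        job["pending"] = set(placement)  # nodes with a process still running
        job["state"] = "running"
        return placement

    def process_finished(self, jid: str, node: str) -> None:
        job = self.jobs[jid]
        job["pending"].discard(node)
        if not job["pending"]:
            # The job is finished only when all of its processes are finished.
            job["state"] = "finished"
```

A caller would first obtain a JID from `create_job`, pass it to `run_job`, and see the state change to `finished` only after every placed process has reported completion.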

Monitoring
• System metrics.
• User-defined metrics.
• Different levels of information.
• Buffering.
• Each service maintains its own monitoring information (SCOPE).
  • ExecMng has information about processes.
  • JobMng has information about jobs.
  • ResMng has information about resources.
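One way to read "buffering" together with per-service scopes is that each service collects samples locally and publishes them in batches. The sketch below assumes exactly that; the `ScopedMonitor` class, its `buffer_size` parameter and the flush policy are hypothetical and are not taken from the XtreemOS implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ScopedMonitor:
    """Hypothetical per-service monitor that buffers samples before publishing."""

    def __init__(self, scope: str, buffer_size: int = 3) -> None:
        self.scope = scope              # e.g. "ExecMng", "JobMng", "ResMng"
        self.buffer_size = buffer_size
        self._buffer: List[Tuple[str, float]] = []
        self.published: Dict[str, List[float]] = defaultdict(list)

    def record(self, metric: str, value: float) -> None:
        # Keep the sample locally; publish only when the buffer is full.
        self._buffer.append((metric, value))
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        # Move all buffered samples to the published view in one batch.
        for metric, value in self._buffer:
            self.published[metric].append(value)
        self._buffer.clear()
```

With `buffer_size=2`, the first sample stays in the buffer and becomes visible only when the second one arrives, which trades freshness for fewer updates.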

Performance & scalability
• Key points:
  • Collaboration with the Linux kernel.
  • No central storage (DHTs).
  • Can be replicated.
  • Does not search for the best global schedule, only for a good-enough local schedule.
• What is the performance without DHTs?
  • Typical VO: a small (100 nodes) local grid.
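The "no central storage" point can be illustrated with consistent hashing, a common building block of DHTs: every node can compute which node owns a given JobID without asking a central directory. This is a generic sketch of the technique, not the DHT used by XtreemOS; `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    # Map a string to a point on the ring.
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring mapping JobIDs to owner nodes."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring = sorted((_hash(node), node) for node in nodes)
        self._points = [point for point, _ in self._ring]

    def owner(self, job_id: str) -> str:
        # The owner is the first node clockwise from the key's position.
        index = bisect.bisect_right(self._points, _hash(job_id)) % len(self._ring)
        return self._ring[index][1]
```

A useful property of this scheme is that removing a node only moves the keys that node owned; keys owned by the remaining nodes stay where they are.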

Job Execution
• O(X^2):
  • Needs resource management for each submitted process.
• All processes belong to the same job (in other systems they would be independent jobs).

Job status
• Query information about all processes of a job with low overhead.
• Look up a job's finished status in 0.012 seconds (0.014 in GT5) without contacting the ExecMngs.

Future improvements
• Reduced internal communication times.
• Caching to reduce overhead.

• Some conclusions:
  • Kernel collaboration with «middleware» is important.
  • DHTs (not evaluated) are a good option to distribute data.
  • But still no high performance.
  • Including the concept 1 Job -> n Processes gives the user a lot of benefits.
  • Easy to understand, easy to manage.
