Some Design Notes (Iteration 2)

• Method 1: Extractor main program
  • Runs from an external VM
  • Listens for RabbitMQ messages
  • Starts a light database engine (SQLite); does this add overhead?
  • Creates an HPC job per file and sends an ACK
  • Stores the job ID and file ID in a database table
  • Asynchronously checks job status; once a job completes with exit status 0, sends the ACK for the corresponding file ID. Can this be done?
  • Another option is to acknowledge immediately and, if the job fails, resend the message. Can this be done?
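The job-tracking table and deferred ACK described above can be sketched as follows. The schema, status values, and function names are assumptions for illustration, not part of the notes:

```python
import sqlite3

def init_db(path=":memory:"):
    # Light SQLite-backed job table: one row per (HPC job, Clowder file).
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_id  TEXT PRIMARY KEY,   -- HPC job ID returned at submission
               file_id TEXT NOT NULL,      -- Clowder file ID from the message
               status  TEXT NOT NULL       -- QUEUED / RUNNING / DONE / FAILED
           )"""
    )
    return conn

def record_job(conn, job_id, file_id):
    # Called right after the HPC job is created for a file.
    conn.execute("INSERT INTO jobs VALUES (?, ?, 'QUEUED')", (job_id, file_id))
    conn.commit()

def files_to_ack(conn):
    # File IDs whose jobs completed with exit status 0; the asynchronous
    # status poller (not shown) would update `status` before this runs.
    return [row[0] for row in
            conn.execute("SELECT file_id FROM jobs WHERE status = 'DONE'")]
```

The deferred-ACK loop would periodically call `files_to_ack` and send one RabbitMQ ACK per returned file ID.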
• Extractor processing logic
  • Resides in the HPC
  • Is called by the extractor main program
  • Downloads the file using Clowder APIs
  • Processes the file and uploads previews and metadata back using the APIs
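A minimal sketch of the processing logic's download and upload calls, assuming the usual Clowder REST endpoints (`/api/files/{id}` for download, `/api/files/{id}/metadata` for metadata); verify the paths against your Clowder deployment:

```python
import json
import urllib.request

def file_url(base_url, file_id):
    # Endpoint path is an assumption based on the Clowder REST API.
    return f"{base_url}/api/files/{file_id}"

def download_file(base_url, key, file_id, dest):
    # Fetch the raw file bytes into a local path on the compute node.
    with urllib.request.urlopen(f"{file_url(base_url, file_id)}?key={key}") as r, \
         open(dest, "wb") as f:
        f.write(r.read())

def upload_metadata(base_url, key, file_id, metadata):
    # POST extracted metadata back to Clowder as JSON.
    req = urllib.request.Request(
        f"{file_url(base_url, file_id)}/metadata?key={key}",
        data=json.dumps(metadata).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return r.status
```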
• Method 1A
  • Includes all of the above steps
  • Adds staging in and staging out: SFTP the files from the main extractor to the HPC file system
  • Helps optimize HPC time usage
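Staging in could be as simple as shelling out to `scp` (or using an SFTP client library); the host, user, and paths below are placeholders:

```python
import subprocess

def stage_in_cmd(local_path, user, host, remote_dir):
    # Build the copy command used to stage a file into the HPC file system.
    # scp is used here to keep the sketch dependency-free; an SFTP library
    # such as paramiko would work equally well.
    return ["scp", local_path, f"{user}@{host}:{remote_dir}/"]

def stage_file(local_path, user, host, remote_dir):
    # Run the transfer; raises CalledProcessError if the copy fails,
    # so a failed stage-in never burns HPC queue time.
    subprocess.run(stage_in_cmd(local_path, user, host, remote_dir), check=True)
```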
• Method 2
  • An elasticity control script listens to RabbitMQ messages
  • Once the number of messages increases, it creates multiple instances of a special extractor
  • This is more of a manual approach
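The elasticity control script's scaling decision might look like the following; the thresholds are illustrative, not from the notes. With pika, the current backlog can be read without consuming via a passive `queue_declare`:

```python
import math

def instances_needed(message_count, per_instance=10, max_instances=5):
    # Simple scaling rule: one special-extractor instance per
    # `per_instance` backlogged messages, capped at `max_instances`.
    return min(max_instances, max(1, math.ceil(message_count / per_instance)))

def queue_depth(channel, queue):
    # pika: a passive declare returns the queue's current message count
    # without creating or modifying the queue.
    return channel.queue_declare(queue=queue, passive=True).method.message_count
```

The control loop would poll `queue_depth`, compare with the number of running extractor instances, and spawn or stop instances to match `instances_needed`.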
pyClowder HPC Flowchart (Iteration 2)

[Flowchart reconstructed from the slide labels]
Synchronous steps:
  • Start: read settings from the config file and connect to RabbitMQ.
  • Wait for a RabbitMQ message; loop until a message is received.
  • Use HPC? If no, process the file and upload the preview/metadata directly.
  • If yes: SSH into the login node, create a PBS script from the HPC XML file and the config file, and run the PBS script to submit the job to the HPC queue.
  • Store the HPC job ID, file ID, and status in the database; exit from the login node.
Asynchronous steps (inside the HPC environment):
  • The job waits in the HPC queue until picked up by the HPC.
  • Once running, the job processes the file and uploads the preview/metadata.
  • The main extractor gets the record from the database and checks the job status: Queued / Running, Failed, or Completed with exit status 0.
  • On completion with exit status 0, send the ACK to RabbitMQ, update the records, and end.
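The "run PBS script to submit job" step typically shells out to `qsub` and parses the job ID it prints; a sketch, assuming standard PBS/Torque output of the form `<number>.<server>`:

```python
import re
import subprocess

def parse_job_id(qsub_output):
    # qsub prints the job ID on stdout, either bare ("12345") or in the
    # "<number>.<server>" form ("12345.hpc-head").
    m = re.match(r"\s*(\d+)(?:\.\S+)?", qsub_output)
    if not m:
        raise ValueError(f"unexpected qsub output: {qsub_output!r}")
    return m.group(1)

def submit_pbs(script_path):
    # Submit the generated PBS script and return the numeric job ID,
    # which the main extractor stores alongside the file ID.
    out = subprocess.run(["qsub", script_path], check=True,
                         capture_output=True, text=True).stdout
    return parse_job_id(out)
```

The stored job ID is what the asynchronous poller later feeds to `qstat` to decide whether the job is queued, running, failed, or completed with exit status 0.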
pyClowder HPC Architecture Diagram (Iteration 2)

[Architecture diagram reconstructed from the slide labels]
  • Client: a Web Browser talking to the Clowder Web Application.
  • Clowder VM: hosts the Clowder Web Application and its Data/Metadata store (MongoDB).
  • Extraction Bus: RabbitMQ, connecting Clowder to the extractors.
  • Extractor VM: runs the Main Extractor, built on pyClowder and configured through an HPC XML file.
  • HPC Compute Nodes (GCN-51, GCN-65, GCN-40): each runs a Job, a Python script (`#!/usr/bin/env python`; imports pika, sys, logging, json, traceback; entry point main()).
  • Jobs upload results back through the Clowder APIs.
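The job boxes in the diagram show a truncated extractor skeleton (`#!/usr/bin/env python`, importing pika, sys, logging, json, traceback, with a `main()`). A runnable sketch of that skeleton follows; the queue name and message fields are assumptions:

```python
#!/usr/bin/env python
import json
import logging
import sys
import traceback

def parse_message(body):
    # Pull the Clowder file ID out of the message payload; the field
    # name "id" is an assumption about the message format.
    return json.loads(body)["id"]

def on_message(channel, method, properties, body):
    try:
        file_id = parse_message(body)
        logging.info("received file %s", file_id)
        # ... download, process, and upload would happen here ...
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        traceback.print_exc()
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

def main():
    import pika  # third-party; imported here so the module loads without it
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.queue_declare(queue="clowder.extract", durable=True)
    ch.basic_consume(queue="clowder.extract", on_message_callback=on_message)
    try:
        ch.start_consuming()
    except KeyboardInterrupt:
        ch.stop_consuming()
        conn.close()

if __name__ == "__main__":
    sys.exit(main())
```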
Some Design / Implementation Questions (from JIRA), Iteration 1

• Do the extractor program files need to be copied to the login node by the program? If code compilation is needed, this might create additional overhead.
  – Assume that the program is present in the HPC environment in compiled form.
• Another option is to assume that the extractor will run from within the HPC environment, i.e. the code is already present in the HPC. Is this a safe assumption to make?
  – Yes.
• What are the exceptions that need to be handled?
  – Exception in the main extractor
  – Exception in the extraction job in the HPC
  – Job aborts due to causes on the HPC side (requested wall time or memory exceeded)
  – The VM from which the main extractor runs crashes
• What is expected of the user who sets up the HPC extractor? In other words, what shall be provided by pyClowder and what shall be done by the one who writes the extractor?
  – Try to make this as generic as possible.
• Can the extractor logic be put in a separate file? Otherwise, how will the HPC machine pick up the job file?
  – Need to find a workaround; keep the extractor structure unchanged.
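One possible workaround for the last question: load the processing function from a separate file at runtime, so the extractor structure itself stays unchanged while the HPC job file carries only the logic. The function name `process_file` is an assumption:

```python
import importlib.util

def load_extractor_logic(path, func_name="process_file"):
    # Load the per-extractor processing function from a standalone
    # Python file, keeping the main extractor structure untouched.
    spec = importlib.util.spec_from_file_location("extractor_logic", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return getattr(mod, func_name)
```

The main extractor would call `load_extractor_logic("/path/to/logic.py")` once at startup and invoke the returned function per file.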
Flowchart (Iteration 1)

[Flowchart reconstructed from the slide labels]
  • Start: get a RabbitMQ message.
  • Use HPC? If no, process the file locally; if extraction is successful, send the ACK to RabbitMQ.
  • If yes: SSH into the login node, transfer the extractor program to the login node via SFTP, and submit the job to the HPC queue.
  • Check the job status: Queued / Running, Failed, or Completed with exit status 0.
  • On completion with exit status 0, send the ACK to RabbitMQ; End.
Architecture Diagram (Iteration 1)

[Architecture diagram; the components are the same as in the Iteration 2 diagram: a Web Browser client, the Clowder VM with the Clowder Web Application and its MongoDB data/metadata store, the RabbitMQ Extraction Bus, the Extractor VM running the Main Extractor (pyClowder, HPC XML file), and HPC Compute Nodes GCN-51, GCN-65, and GCN-40 running Python extractor jobs that call back to the Clowder APIs.]