Some Design Notes (Iteration 2)

• Method 1: Extractor main program
  • Runs from an external VM
  • Listens for RabbitMQ messages
  • Starts a light database engine (SQLite); does this add overhead?
  • Creates an HPC job per file and sends an ACK
  • Stores the job ID and file ID in a database table
  • Asynchronously checks job status; once a job completes with exit status 0, sends the ACK for the corresponding file ID. Can this be done?
  • Another option is to acknowledge immediately and, if the job fails, resend the message. Can this be done?
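The job-tracking table and deferred ACK described above can be sketched as follows. The schema, status values, and function names are assumptions for illustration, not part of the notes:

```python
import sqlite3

def init_db(path=":memory:"):
    # Light SQLite-backed job table: one row per (HPC job, Clowder file).
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_id  TEXT PRIMARY KEY,   -- HPC job ID returned at submission
               file_id TEXT NOT NULL,      -- Clowder file ID from the message
               status  TEXT NOT NULL       -- QUEUED / RUNNING / DONE / FAILED
           )"""
    )
    return conn

def record_job(conn, job_id, file_id):
    # Called right after the HPC job is created for a file.
    conn.execute("INSERT INTO jobs VALUES (?, ?, 'QUEUED')", (job_id, file_id))
    conn.commit()

def files_to_ack(conn):
    # File IDs whose jobs completed with exit status 0; the asynchronous
    # status poller (not shown) would update `status` before this runs.
    return [row[0] for row in
            conn.execute("SELECT file_id FROM jobs WHERE status = 'DONE'")]
```

The deferred-ACK loop would periodically call `files_to_ack` and send one RabbitMQ ACK per returned file ID.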
• Extractor processing logic
  • Resides in the HPC
  • Is called by the extractor main program
  • Downloads the file using Clowder APIs
  • Processes the file and uploads previews and metadata back using the APIs
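A minimal sketch of the processing logic's download and upload calls, assuming the usual Clowder REST endpoints (`/api/files/{id}` for download, `/api/files/{id}/metadata` for metadata); verify the paths against your Clowder deployment:

```python
import json
import urllib.request

def file_url(base_url, file_id):
    # Endpoint path is an assumption based on the Clowder REST API.
    return f"{base_url}/api/files/{file_id}"

def download_file(base_url, key, file_id, dest):
    # Fetch the raw file bytes into a local path on the compute node.
    with urllib.request.urlopen(f"{file_url(base_url, file_id)}?key={key}") as r, \
         open(dest, "wb") as f:
        f.write(r.read())

def upload_metadata(base_url, key, file_id, metadata):
    # POST extracted metadata back to Clowder as JSON.
    req = urllib.request.Request(
        f"{file_url(base_url, file_id)}/metadata?key={key}",
        data=json.dumps(metadata).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return r.status
```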
• Method 1A
  • Includes all of the above steps
  • Adds staging in and staging out: SFTP the files from the main extractor to the HPC file system
  • Helps optimize HPC time usage
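Staging in could be as simple as shelling out to `scp` (or using an SFTP client library); the host, user, and paths below are placeholders:

```python
import subprocess

def stage_in_cmd(local_path, user, host, remote_dir):
    # Build the copy command used to stage a file into the HPC file system.
    # scp is used here to keep the sketch dependency-free; an SFTP library
    # such as paramiko would work equally well.
    return ["scp", local_path, f"{user}@{host}:{remote_dir}/"]

def stage_file(local_path, user, host, remote_dir):
    # Run the transfer; raises CalledProcessError if the copy fails,
    # so a failed stage-in never burns HPC queue time.
    subprocess.run(stage_in_cmd(local_path, user, host, remote_dir), check=True)
```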
• Method 2
  • An elasticity control script listens to RabbitMQ messages
  • Once the number of messages increases, it creates multiple instances of a special extractor
  • This is more of a manual approach
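The elasticity control script's scaling decision might look like the following; the thresholds are illustrative, not from the notes. With pika, the current backlog can be read without consuming via a passive `queue_declare`:

```python
import math

def instances_needed(message_count, per_instance=10, max_instances=5):
    # Simple scaling rule: one special-extractor instance per
    # `per_instance` backlogged messages, capped at `max_instances`.
    return min(max_instances, max(1, math.ceil(message_count / per_instance)))

def queue_depth(channel, queue):
    # pika: a passive declare returns the queue's current message count
    # without creating or modifying the queue.
    return channel.queue_declare(queue=queue, passive=True).method.message_count
```

The control loop would poll `queue_depth`, compare with the number of running extractor instances, and spawn or stop instances to match `instances_needed`.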
pyClowder HPC Flowchart (Iteration 2)

[Flowchart reconstructed from the slide labels]
Synchronous steps:
  • Start: read settings from the config file and connect to RabbitMQ.
  • Wait for a RabbitMQ message; loop until a message is received.
  • Use HPC? If no, process the file and upload the preview/metadata directly.
  • If yes: SSH into the login node, create a PBS script from the HPC XML file and the config file, and run the PBS script to submit the job to the HPC queue.
  • Store the HPC job ID, file ID, and status in the database; exit from the login node.
Asynchronous steps (inside the HPC environment):
  • The job waits in the HPC queue until picked up by the HPC.
  • Once running, the job processes the file and uploads the preview/metadata.
  • The main extractor gets the record from the database and checks the job status: Queued / Running, Failed, or Completed with exit status 0.
  • On completion with exit status 0, send the ACK to RabbitMQ, update the records, and end.
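The "run PBS script to submit job" step typically shells out to `qsub` and parses the job ID it prints; a sketch, assuming standard PBS/Torque output of the form `<number>.<server>`:

```python
import re
import subprocess

def parse_job_id(qsub_output):
    # qsub prints the job ID on stdout, either bare ("12345") or in the
    # "<number>.<server>" form ("12345.hpc-head").
    m = re.match(r"\s*(\d+)(?:\.\S+)?", qsub_output)
    if not m:
        raise ValueError(f"unexpected qsub output: {qsub_output!r}")
    return m.group(1)

def submit_pbs(script_path):
    # Submit the generated PBS script and return the numeric job ID,
    # which the main extractor stores alongside the file ID.
    out = subprocess.run(["qsub", script_path], check=True,
                         capture_output=True, text=True).stdout
    return parse_job_id(out)
```

The stored job ID is what the asynchronous poller later feeds to `qstat` to decide whether the job is queued, running, failed, or completed with exit status 0.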
pyClowder HPC Architecture Diagram (Iteration 2)

[Architecture diagram reconstructed from the slide labels]
  • Client: a Web Browser talking to the Clowder Web Application.
  • Clowder VM: hosts the Clowder Web Application and its Data/Metadata store (MongoDB).
  • Extraction Bus: RabbitMQ, connecting Clowder to the extractors.
  • Extractor VM: runs the Main Extractor, built on pyClowder and configured through an HPC XML file.
  • HPC Compute Nodes (GCN-51, GCN-65, GCN-40): each runs a Job, a Python script (`#!/usr/bin/env python`; imports pika, sys, logging, json, traceback; entry point main()).
  • Jobs upload results back through the Clowder APIs.
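The job boxes in the diagram show a truncated extractor skeleton (`#!/usr/bin/env python`, importing pika, sys, logging, json, traceback, with a `main()`). A runnable sketch of that skeleton follows; the queue name and message fields are assumptions:

```python
#!/usr/bin/env python
import json
import logging
import sys
import traceback

def parse_message(body):
    # Pull the Clowder file ID out of the message payload; the field
    # name "id" is an assumption about the message format.
    return json.loads(body)["id"]

def on_message(channel, method, properties, body):
    try:
        file_id = parse_message(body)
        logging.info("received file %s", file_id)
        # ... download, process, and upload would happen here ...
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        traceback.print_exc()
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

def main():
    import pika  # third-party; imported here so the module loads without it
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.queue_declare(queue="clowder.extract", durable=True)
    ch.basic_consume(queue="clowder.extract", on_message_callback=on_message)
    try:
        ch.start_consuming()
    except KeyboardInterrupt:
        ch.stop_consuming()
        conn.close()

if __name__ == "__main__":
    sys.exit(main())
```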
Some Design / Implementation Questions (from JIRA), Iteration 1

• Do the extractor program files need to be copied to the login node by the program? If code compilation is needed, this might create additional overhead.
  – Assume that the program is present in the HPC environment in compiled form.
• Another option is to assume that the extractor will run from within the HPC environment, i.e. the code is already present in the HPC. Is this a safe assumption to make?
  – Yes.
• What are the exceptions that need to be handled?
  – Exception in the main extractor
  – Exception in the extraction job in the HPC
  – Job aborts due to causes on the HPC side (requested wall time or memory exceeded)
  – The VM from which the main extractor runs crashes
• What is expected of the user who sets up the HPC extractor? In other words, what shall be provided by pyClowder and what shall be done by the one who writes the extractor?
  – Try to make this as generic as possible.
• Can the extractor logic be put in a separate file? Otherwise, how will the HPC machine pick up the job file?
  – Need to find a workaround; keep the extractor structure unchanged.
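One possible workaround for the last question: load the processing function from a separate file at runtime, so the extractor structure itself stays unchanged while the HPC job file carries only the logic. The function name `process_file` is an assumption:

```python
import importlib.util

def load_extractor_logic(path, func_name="process_file"):
    # Load the per-extractor processing function from a standalone
    # Python file, keeping the main extractor structure untouched.
    spec = importlib.util.spec_from_file_location("extractor_logic", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return getattr(mod, func_name)
```

The main extractor would call `load_extractor_logic("/path/to/logic.py")` once at startup and invoke the returned function per file.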
Flowchart (Iteration 1)

[Flowchart reconstructed from the slide labels]
  • Start: get a RabbitMQ message.
  • Use HPC? If no, process the file locally; if extraction is successful, send the ACK to RabbitMQ.
  • If yes: SSH into the login node, transfer the extractor program to the login node via SFTP, and submit the job to the HPC queue.
  • Check the job status: Queued / Running, Failed, or Completed with exit status 0.
  • On completion with exit status 0, send the ACK to RabbitMQ; End.
Architecture Diagram (Iteration 1)

[Architecture diagram; the components are the same as in the Iteration 2 diagram: a Web Browser client, the Clowder VM with the Clowder Web Application and its MongoDB data/metadata store, the RabbitMQ Extraction Bus, the Extractor VM running the Main Extractor (pyClowder, HPC XML file), and HPC Compute Nodes GCN-51, GCN-65, and GCN-40 running Python extractor jobs that call back to the Clowder APIs.]