deck sig monitor condor - university of wisconsin–madison · a marriage of open source and custom...
TRANSCRIPT
Monitoring HTCondor
A Marriage of Open Source and Custom
William Deck
Goals
• Help those looking to augment/roll out their own cluster monitoring
• Possibly a different perspective on monitoring
• Get some of this work out in the open
Who am I?
• Originally a mainframe developer
• Moved to SIG January of 2015
• First task: Monitor the “cluster”
Our Cluster
• Mixed Cluster (Several)
• ~1700 cores
– 1100 Windows
– 600 Linux (SLES11/SLES12)
• Storage
– Lustre: 2.8PB
– GPFS: 8.4PB
Our Workload
• ~100K jobs/week
• Mixed OS Workloads (Majority Windows)
• Daily Workflows (1–10K)
• One-Off Workflows (+100K)
What is the Problem?
• Many moving parts
• Multiple groups using condor differently
– Single submit files
– DAGs
– Someone found out about condor_run
• Monitor 1–100k jobs
• Hard for users to self-support
• Correlate job failures to infrastructure problems
Monitor Evolution
• HTCondor’s Job Monitor
• Cycle Computing Solution
• Custom scripts parsing condor_* commands
• Python bindings with Elastic, Grafana, and Conmon
First Solution – Tales of Condor
• Modelled after Cycle Computing monitoring
• Parsing condor_status, condor_history output
• Put information into csv files
• Simple web page displayed
– Total jobs vs running jobs
– Total jobs/User vs running jobs/User
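The parse-to-CSV step above can be sketched roughly as follows. This is only an illustration, not the original script: it assumes the captured text has two whitespace-separated columns, Owner and JobStatus (the kind of output the `-af Owner JobStatus` autoformat option produces), and the function names and CSV column names are hypothetical.

```python
import csv
import io
from collections import Counter

RUNNING = 2  # HTCondor JobStatus value for a running job


def count_jobs(autoformat_text):
    """Return (total, running) Counters keyed by job owner."""
    total, running = Counter(), Counter()
    for line in autoformat_text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        owner, status = fields[0], int(fields[1])
        total[owner] += 1
        if status == RUNNING:
            running[owner] += 1
    return total, running


def write_csv(total, running, out):
    """Write one row per user; the column names are made up for this sketch."""
    writer = csv.writer(out)
    writer.writerow(["user", "total_jobs", "running_jobs"])
    for owner in sorted(total):
        writer.writerow([owner, total[owner], running[owner]])


# Hypothetical captured output: "<Owner> <JobStatus>" per job.
sample = """alice 2
alice 1
bob 2
bob 2
bob 5
"""

total, running = count_jobs(sample)
buf = io.StringIO()
write_csv(total, running, buf)
print(buf.getvalue())
```

A simple web page can then read the CSV and plot total versus running jobs, overall or per user.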
First Solution: Revolutionary command grep
• For everything else we used grep
• Painful and tedious
• Good news!
– Really good at grep
Enter: Monmon
• Custom web page designed around our workflow creation scripts
• Parses node status/HTCondor log files
• Specific for one group’s workflows
• Useful for the user to drill down and share with others
Enter Elastic, Grafana, and Conmon
• Elastic (ELK) (https://www.elastic.co/)
– Elasticsearch: Search/analytics engine
– Logstash: Ingest of data
– Kibana: Front end
• Grafana (https://grafana.com/)
– Extension of Kibana (3.0)
– Multiple backends (graphite, ES, influxdb...)
Enter Elastic, Grafana, and Conmon
• Poll periodically using python bindings
• Insert into ES
– Augmented Job/Host ClassAd
– Custom subset of ClassAd
• Use Grafana front end
– Tales of Condor 2.0
– More Complex Dashboards
• Extended Monmon -> Conmon
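One way to picture the poll-and-insert loop above is the sketch below. It is an assumption-laden illustration rather than the actual collector: `fetch_ads` stands in for whatever call to the HTCondor Python bindings returns job ads (represented here as plain dicts), `send` stands in for the HTTP call to the Elasticsearch bulk endpoint, and the index name, the `polled_at` field, and the attribute subset in `KEEP` are hypothetical.

```python
import json
import time

# Hypothetical "custom subset" of job ClassAd attributes to keep.
KEEP = ("ClusterId", "ProcId", "Owner", "JobStatus", "RemoteHost", "ExitCode")


def to_document(ad, polled_at):
    """Reduce a job ad (dict-like) to the chosen subset and add a poll timestamp."""
    doc = {key: ad[key] for key in KEEP if key in ad}
    doc["polled_at"] = polled_at  # augmentation: when this snapshot was taken
    return doc


def bulk_body(ads, index, polled_at):
    """Build a newline-delimited body in the Elasticsearch bulk format."""
    lines = []
    for ad in ads:
        job_id = "{}.{}".format(ad["ClusterId"], ad["ProcId"])
        lines.append(json.dumps({"index": {"_index": index, "_id": job_id}}))
        lines.append(json.dumps(to_document(ad, polled_at)))
    return "\n".join(lines) + "\n"


def poll_once(fetch_ads, send, index="htcondor-jobs"):
    """Fetch the current ads and hand one bulk request body to `send`."""
    polled_at = int(time.time())
    ads = list(fetch_ads())
    if ads:
        send(bulk_body(ads, index, polled_at))
    return len(ads)


# Stand-ins so the sketch runs without a pool or an ES cluster.
def fake_fetch():
    return [
        {"ClusterId": 12, "ProcId": 0, "Owner": "alice", "JobStatus": 2,
         "Cmd": "/bin/sleep"},
        {"ClusterId": 12, "ProcId": 1, "Owner": "alice", "JobStatus": 1},
    ]


sent = []
count = poll_once(fake_fetch, sent.append)
print(sent[0])
```

Using the job ID as the document `_id` means each poll overwrites the previous snapshot of the same job instead of accumulating duplicates; a periodic scheduler (cron or a sleep loop) would call `poll_once` at the chosen interval.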
ES Job Ad Example
• Searchable Job ClassAd
• Common Uses
– condor_history
– Used Resources
Grafana: One Stop Shop
Grafana Dashboards
• Easily create dashboards from ES and performance metrics
• Single Pane of Glass for Condor and Infrastructure
Tales of Condor 2.0
Overall Cluster Health
Grafana: Error Rates
• Find black hole exec
• Post script vs. Exit Code
• Errors by:
– User
– DAG
• Is it User or infrastructure?
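As a rough illustration of the "black hole" check above, the sketch below flags execute hosts on which nearly every job fails. It assumes job records are dicts carrying the `LastRemoteHost` and `ExitCode` job attributes; the thresholds and function names are hypothetical, and a real dashboard would more likely compute this with an aggregation in Elasticsearch.

```python
from collections import defaultdict


def failure_rates(jobs):
    """Return {host: (failure_rate, total_jobs)} from finished job records."""
    counts = defaultdict(lambda: [0, 0])  # host -> [failed, total]
    for job in jobs:
        host = job.get("LastRemoteHost")
        if host is None:
            continue  # job never ran anywhere
        counts[host][1] += 1
        if job.get("ExitCode", 0) != 0:
            counts[host][0] += 1
    return {host: (failed / total, total)
            for host, (failed, total) in counts.items()}


def black_holes(jobs, min_jobs=5, min_rate=0.9):
    """Hosts with enough jobs to judge and a failure rate at or above min_rate."""
    rates = failure_rates(jobs)
    return sorted(host for host, (rate, total) in rates.items()
                  if total >= min_jobs and rate >= min_rate)


# Hypothetical records: node01 fails everything, node02 is mostly healthy,
# node03 has too few jobs to judge.
jobs = (
    [{"LastRemoteHost": "slot1@node01", "ExitCode": 1, "Owner": "alice"}] * 6
    + [{"LastRemoteHost": "slot1@node02", "ExitCode": 0, "Owner": "bob"}] * 5
    + [{"LastRemoteHost": "slot1@node02", "ExitCode": 2, "Owner": "bob"}]
    + [{"LastRemoteHost": "slot1@node03", "ExitCode": 1, "Owner": "bob"}] * 2
)

print(black_holes(jobs))
```

Grouping the same counts by `Owner` or by DAG instead of by host helps separate user errors from infrastructure problems: failures concentrated on one host point to the machine, while failures spread across hosts but tied to one user point to the job.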
Conmon
• Uses Flask
• ES Queries
• Display job classad
• Workflow overview
• Grid View
• DAG/Job Logs
• Workflow Analysis
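The kind of ES query a workflow page might issue can be sketched as below. This is not Conmon's actual code: it assumes each indexed job document keeps the `DAGManJobId`, `Owner`, and `ClusterId` job attributes and that they are mapped so that exact-match `term` filters work. The function name is hypothetical, and the Flask route that would serve the results is left out.

```python
import json


def dag_jobs_query(dag_cluster_id, owner=None, size=500):
    """Build an Elasticsearch query body for all jobs belonging to one DAG."""
    filters = [{"term": {"DAGManJobId": dag_cluster_id}}]
    if owner is not None:
        filters.append({"term": {"Owner": owner}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        "sort": [{"ClusterId": {"order": "asc"}}],
    }


query = dag_jobs_query(4711, owner="alice")
print(json.dumps(query, indent=2))
```

Because the query is just a dict, the same builder can back the workflow overview, the grid view, and the search/filter box by adding further `term` filters.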
Conmon: Home (DAGs)
Conmon: Workflow (DAGs)
Conmon: Grid (DAGs)
Workflow Analysis
• Information from ES
• Easy to see long legs
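A minimal sketch of spotting "long legs", read here as DAG nodes that run far longer than their siblings. It assumes each job record carries the `DAGNodeName`, `JobStartDate`, and `CompletionDate` job attributes (epoch seconds); the median-based cutoff and the `factor` parameter are illustrative choices, not the method used in the talk.

```python
import statistics


def node_durations(jobs):
    """Map DAG node name to wall-clock seconds for jobs that have finished."""
    durations = {}
    for job in jobs:
        start = job.get("JobStartDate")
        end = job.get("CompletionDate")
        if not start or not end or end < start:
            continue  # not started, still running, or inconsistent record
        durations[job["DAGNodeName"]] = end - start
    return durations


def long_legs(jobs, factor=3.0):
    """Nodes whose duration exceeds `factor` times the median, longest first."""
    durations = node_durations(jobs)
    if not durations:
        return []
    cutoff = factor * statistics.median(durations.values())
    slow = [(name, secs) for name, secs in durations.items() if secs > cutoff]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Hypothetical workflow: node D takes far longer than the others,
# node E is still running (CompletionDate of 0).
jobs = [
    {"DAGNodeName": "A", "JobStartDate": 1000, "CompletionDate": 1100},
    {"DAGNodeName": "B", "JobStartDate": 1000, "CompletionDate": 1120},
    {"DAGNodeName": "C", "JobStartDate": 1100, "CompletionDate": 1210},
    {"DAGNodeName": "D", "JobStartDate": 1100, "CompletionDate": 2000},
    {"DAGNodeName": "E", "JobStartDate": 1200, "CompletionDate": 0},
]

print(long_legs(jobs))
```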
Individual Jobs
Conmon: Benefits
• Loading Condor information from ES
• Can handle multiple submission/workflow types (DAGs, submit)
• User can click through jobs
• Search/Filter
• Shareable
• Consistent views
Questions?