A Job Completion Plugin for ElasticSearch (HPCKP)
1. Introduction
2. ElasticSearch
3. MareNostrumIII solution
4. Plugin goals
5. Plugin development
6. Production Integration
7. Future work
8. References and conclusions
Index
• BSC-CNS (Barcelona Supercomputing Center)
• Officially constituted in April 2005
• Variety of clusters:
  • MareNostrumIII
    • 48,896 cores (3,056 compute nodes), 103.5 TB of main memory
    • IBM Platform LSF
  • MinoTauro (GPU based), CNAG (genomics), BSCCV (life), ...
    • Different sizes/configurations
    • SLURM
• Research, develop and manage IT in order to ease scientific progress
• Special dedication in some areas:
  • Computer Sciences, Life Sciences, Earth Sciences and Computational Applications
Introduction and motivation
• Who makes use of the clusters?
• How do we divide CPU hours among projects?
Sharing depends on the cluster.
Introduction and motivation
cluster: MN3
projects: PRACE 70% (Partnership for Advanced Computing in Europe), RES 24% (Red Española de Supercomputación), BSC 6%
queues: prace, ...; class_a, class_b, class_c; bsc_cs, bsc_ls, bsc_es, ...
• We need to ensure that the CPU usage distribution among projects meets the agreement
• Analyzing data about finished jobs gives us very valuable information:
  o Correlations between users' time_limit and elapsed time
  o Statistical information about projects, groups, users and how their executions finish
• Use the results for:
  o Making corrections to the scheduling configuration
  o Training users on how to properly submit jobs
  o Accounting purposes
There is a NEED to store historical data about finished jobs.
Introduction and motivation
“Elasticsearch is a flexible and powerful open source, distributed,
real-time search and analytics engine.”
Features:
• Real-time data
• Distributed
• High-availability
• Document oriented (JSON)
• RESTful API
• Schema free
• Based on Apache Lucene
www.elasticsearch.org
ElasticSearch basics
Structure:
• Cluster: “collection of one or more nodes (servers) that
together holds your entire data”
• Node: “single server that is part of your cluster”
• Index: “collection of documents that have somewhat similar
characteristics”
• Type: “within an index, you can define one or more types (logical category/partition)”
• Document: “basic unit of information that can be indexed,
expressed in JSON format”
• Shard: “subdivision of an index”
ElasticSearch basics
MareNostrumIII solution
Scheduling server with LSF:
• mbatchd appends each job event to lsb.acct (events_log)
• inotify detects the new events, which are piped through netcat over TCP
Monitoring server:
• logstash receives the events over TCP and indexes them into ElasticSearch
• Kibana uses the ElasticSearch index
• httpd gives web browsers access: requests present the job historical data
• The rest of the BSC clusters use SLURM
• Make it generic, following the SLURM guidelines
• The existing jobcomp plugins (mysql, filetxt, script) didn't satisfy our needs, so a new one was written: elasticsearch
Plugin goals
slurmctld indexes job data into the ElasticSearch index.
Finished job data, 37 fields:
account, alloc_node, cluster, cpu_hours, cpus_per_task, derived_exitcode, elapsed, eligible_time, end_time, excluded_nodes, exitcode, gres_alloc, gres_req, group_id, groupname, jobid, nodes, ntasks, ntasks_per_node, orig_dependency, parent_accounts, partition, qos, reservation_name, script, start_time, state, std_err, std_in, std_out, submit_time, time_limit, total_cpus, total_nodes, user_id, username, work_dir
• Operations against the ElasticSearch server are executed through HTTP requests/responses
• Request pattern:
  $ curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>?<QUERY_STRING>' -d '<BODY>'
• Request to index a document, example:
  $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
      "user" : "kimchy",
      "post_date" : "2009-11-15T14:12:12",
      "message" : "trying out Elasticsearch"
  }'
• The plugin uses the libcurl library (libcurl-devel) to handle requests/responses
  o autoconf files have been added
  o the plugin is not installed unless the library is detected and usable
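The autoconf detection can be sketched roughly as follows. This assumes the LIBCURL_CHECK_CONFIG macro shipped in libcurl's bundled libcurl.m4; SLURM's actual check may be organized differently:

```m4
# configure.ac fragment (sketch): only build the plugin when a
# usable libcurl of at least the given version is found.
LIBCURL_CHECK_CONFIG([yes], [7.19.0],
                     [ac_have_libcurl=yes],
                     [ac_have_libcurl=no])
AM_CONDITIONAL([WITH_LIBCURL], [test "x$ac_have_libcurl" = xyes])
```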
Plugin development
• The plugin can be enabled and configured in slurm.conf:
  JobCompType=jobcomp/elasticsearch
  JobCompLoc=http://YOURELASTICSERVER:9200
• So the plugin has to check that the server referenced by the configured URL is reachable and accessible
• How? By capturing and parsing the HTTP response headers received from the server side
Plugin development
• Example of the response for a properly indexed document:
  HTTP/1.1 201 Created
  Content-Type: application/json; charset=UTF-8
  Content-Length: 92
  {"_index":"someindex","_type":"sometype","_id":"fsAx6qXcQGCSrY1DWvQACw","_version":1,"created":true}
• Only the header is needed (not the body), so the libcurl parameters are configured to capture just the headers
• Specifically, the plugin checks whether the status code is 200 (OK) or 201 (Created)
Plugin development
• Different sources of failure: server unavailable, index read-only, etc.
• Example of a document not indexed:
  HTTP/1.1 403 Forbidden
  Content-Type: application/json; charset=UTF-8
  Content-Length: 96
  {"error":"ClusterBlockException[blocked by: [FORBIDDEN/5/index read-only (api)];]","status":403}
• Does it mean that every status code different from 200 or 201 indicates a failure? ... NO (corner case found while testing):
  HTTP/1.1 100 Continue
  HTTP/1.1 200 OK
  Date: Fri, 31 Dec 1999 23:59:59 GMT
  Content-Type: application/json
Plugin development
“100 Continue” is used to determine whether the origin server is willing to accept the request (based on the headers) before the client sends the body.
• What happens with job data that can't be indexed?
  The plugin manages an in-memory structure to keep track of the data of pending jobs:
  typedef struct {
      uint32_t nelems;
      char **jobs;
  } pending_jobs_t;
• The structure is persisted to StateSaveLocation/elasticsearch_state
• Data is saved in network byte order, using the SLURM functions pack_str_array() and safe_unpackstr_array()
Plugin development
[ job0 data | job1 data | job2 data | job3 data | job4 data | ... | jobN-1 data ]
Data coherence is kept between the memory structure and the state file.
• When does the plugin try to reindex the pending jobs?
1. When the plugin is loaded
2. Just after a successfully indexed job
Plugin development
• _load_pending_jobs() fills pending_jobs_t (job0 data, job1 data, ..., jobN-1 data) from the elasticsearch_state file
• _index_retry() tries to index each pending job into ElasticSearch
• A web layer has been added (Kibana)
Production integration
• Powerful search syntax and easy setup
• Configurable dashboards, time-based comparisons
• Make sense of your data: create bar, line and scatter plots
• Flexible interface, easy to share
• Plugin already running in MinoTauro cluster
• 126 compute nodes, GPU based
• 2 login nodes
• Planned integration in the CNAG cluster in the coming months
• Genomics analysis and research
• 100 compute nodes, 20 HiMem nodes
• 2 login nodes
• Same with the rest of BSC SLURM clusters
• BSCCV, Altix2 UV100, etc.
Production integration
• Kibana global view
Production integration
Production integration
• Zoom in/out time range
• Expand job data details
• Search, filter, pagination…
Future work (basic statistics)
• Elapsed time vs project/qos
• Mins, maxs, means, standard deviations, ...
• Simple prediction methods (linear regression)
  o time_limit prediction based on submit parameters

𝑌𝑡 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + … + 𝛽𝑝𝑋𝑝 + 𝜀

𝑌𝑡: measured or dependent variable
𝑋𝑖: input or independent variables
𝛽𝑖: regression coefficients
𝜀: error term

• Helps improve backfill scheduling (more efficient usage of cluster resources)
• A submit plugin could be developed applying the prediction formula
• There are more complex models, using decision trees or combining different models into one
Future work (Machine Learning)
Future work (Machine Learning)
• SLURM reference to the plugin:
  http://slurm.schedmd.com/download.html
• GitHub repository:
  https://github.com/asanchez1987/jobcomp-elasticsearch
• Possible merge in future stable releases
• Final Master's Thesis, in a university-company context:
  o Barcelona School of Informatics, www.fib.upc.edu/en
  o Barcelona Supercomputing Center, www.bsc.es
References and conclusions