![Page 1: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/1.jpg)
Memory restriction, limits and heterogeneous grids.
A case study.
Txema Heredia
Or an example of how to adapt your policies to your needs
![Page 2: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/2.jpg)
DISCLAIMER
What I am going to present is neither a panacea nor something that has to fit your cluster or immediately solve its issues. This is just a brief description of the problems we faced and how we used different SGE options to handle them.
Also, no animal was harmed in the making of this PowerPoint.
![Page 3: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/3.jpg)
Our story
![Page 4: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/4.jpg)
“hey, let’s buy a cluster”
- my boss
![Page 5: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/5.jpg)
What did we need?
![Page 6: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/6.jpg)
What did we need?
•Users:
•biologists, not programmers
•Processes:
•user-made scripts
•single core biological software
![Page 7: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/7.jpg)
What did we NOT need?
•Nopes:
•threads / parallel programming (mostly)
•GPUs
•Ayes:
•thousands of single-core jobs
![Page 8: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/8.jpg)
And thus, our baby was born
![Page 9: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/9.jpg)
![Page 10: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/10.jpg)
Our cluster
•8 computing nodes
•8 cores
•8 Gb RAM
•1 front-end
![Page 11: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/11.jpg)
Our cluster
•NFS
•Rocks cluster (CentOS)
•SGE
![Page 12: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/12.jpg)
First steps with SGE
![Page 13: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/13.jpg)
First steps with SGE
•1st try:
•One queue to rule them all
![Page 14: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/14.jpg)
First steps with SGE
•1st try:
•all.q queue
•free for all
![Page 15: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/15.jpg)
First steps with SGE
•1st try - conclusions:
•chaos reigned
•constant conflicts between users (especially time-related)
•FIFO queuing
•swapping
![Page 16: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/16.jpg)
2nd try
•2nd try
•round-robin-like scheduling
•share tree/functional tickets
•split cluster by time usage:
•3 queues: fast / medium / slow
![Page 17: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/17.jpg)
2nd try
•fast:
•2 hours / 2 nodes
•medium:
•48 hours / 3 nodes
•slow:
•∞ hours / 3 nodes
![Page 18: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/18.jpg)
2nd try
•Conclusions:
•↓ chaos
•↓ user conflicts
•Still swapping
•High undersubscription of the cluster
![Page 19: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/19.jpg)
2nd try
•3 types of jobs
•Don’t need to coexist at the same time
•1 user → 1 type of job
•User knowledge
•Saturation of the unlimited queue
![Page 20: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/20.jpg)
2nd try
•Queue tinkering:
•wallclock time
•number of hosts
•Better results, but not good enough:
•Waiting jobs & idle nodes
![Page 21: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/21.jpg)
2nd try
•There are 2 wars here:
•memory / swap
•splitting leads to undersubscription
![Page 22: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/22.jpg)
The memory war
![Page 23: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/23.jpg)
Memory
•Buy more memory
•from 8 x 8Gb
•to 4 x 32Gb, 3 x 16Gb, 1 x 8Gb
•This reduces our problem, but doesn’t fix it
![Page 24: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/24.jpg)
Swap
•Swapping in a cluster is the root of all evil
![Page 25: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/25.jpg)
Swap
•Complex attribute “h_vmem”
![Page 26: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/26.jpg)
![Page 27: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/27.jpg)
![Page 28: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/28.jpg)
h_core
h_rt ≠ h_cpu
h_fsize
h_rss
h_stack
h_data = h_vmem
![Page 29: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/29.jpg)
h_vmem
•h_vmem
•SIGKILL
•s_vmem
•SIGXCPU
•You can combine both
![Page 30: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/30.jpg)
h_vmem
•Requestable by default
•We want them to be consumable
•qmon / qconf -mc
![Page 31: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/31.jpg)
h_vmem
![Page 32: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/32.jpg)
h_vmem
•requestable = YES
•consumable = YES / JOB
•default = whatever you want
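As a rough sketch of that change, the helper below rewrites the h_vmem line of the complex table. It assumes the usual column order printed by `qconf -sc` (name, shortcut, type, relop, requestable, consumable, default, urgency). The function name and the 6G value in the usage note are illustrations only, not part of the original setup.

```shell
# Hypothetical helper: reads a complex table on stdin (as printed by `qconf -sc`)
# and prints it back with h_vmem set to requestable, consumable and a new default.
# $1 = new default value, for example 6G
make_h_vmem_consumable() {
    awk -v def="$1" '
        $1 == "h_vmem" { $5 = "YES"; $6 = "YES"; $7 = def }
        { print }
    '
}
```

A possible use would be `qconf -sc | make_h_vmem_consumable 6G > complex.txt`, followed by `qconf -Mc complex.txt` after reviewing the file by hand.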
![Page 33: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/33.jpg)
h_vmem
•Only for parallel environment jobs:
•consumable = YES
•sge_shepherd memory = h_vmem*slots
•consumable = JOB
•sge_shepherd memory = h_vmem
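The difference between the two modes is plain arithmetic. The function below is a minimal sketch of it (the function name and the use of megabytes are assumptions made for the example):

```shell
# Memory charged to a parallel job, in MB.
# $1 = consumable mode (YES or JOB), $2 = requested h_vmem in MB, $3 = slots
charged_memory_mb() {
    case "$1" in
        YES) echo $(( $2 * $3 )) ;;   # per-slot request, multiplied by the slots
        JOB) echo "$2" ;;             # per-job request, counted once
        *)   echo "unknown consumable mode: $1" >&2; return 1 ;;
    esac
}
```

For a 4-slot job requesting 2048 MB, `charged_memory_mb YES 2048 4` gives 8192 and `charged_memory_mb JOB 2048 4` gives 2048.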
![Page 34: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/34.jpg)
h_vmem
•default = 100M
•“everything” dies
•default = 6G
•“everything” works
![Page 35: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/35.jpg)
h_vmem
•Now we can limit the memory
•But we can still have swapping
![Page 36: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/36.jpg)
h_vmem
•Define h_vmem in each host
•qmon / qconf -me hostname
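For more than a couple of hosts, the same setting can probably be scripted instead of edited interactively. The sketch below only prints the commands, so they can be reviewed first. It assumes `qconf -mattr exechost complex_values ...` as a non-interactive equivalent of `qconf -me`; the host names in the usage note are made up.

```shell
# Hypothetical helper: reads "hostname memory" pairs on stdin and prints one
# qconf command per host that sets the host-level h_vmem capacity.
host_vmem_cmds() {
    while read -r host mem; do
        if [ -n "$host" ] && [ -n "$mem" ]; then
            printf 'qconf -mattr exechost complex_values h_vmem=%s %s\n' "$mem" "$host"
        fi
    done
}
```

For example, `printf 'compute-0-0 32G\ncompute-0-1 8G\n' | host_vmem_cmds` prints two commands that can then be piped to `sh`.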
![Page 37: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/37.jpg)
![Page 38: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/38.jpg)
h_vmem
•Exact memory:
•more secure
•Bigger memory:
•more margin
![Page 39: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/39.jpg)
Memory
•From now on, any job submission must contain a memory request:
•qsub ... -l h_vmem=3G ...
![Page 40: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/40.jpg)
No more swapping!!
![Page 41: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/41.jpg)
Undersubscription
![Page 42: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/42.jpg)
Undersubscription
•Dual restriction:
•8 jobs/slots per node
•32 / 16 / 8 GB mem per node
•The minimum of both will apply
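A quick way to see the dual restriction at work is to compute how many identical jobs fit on a node. This is only an illustrative sketch (function name and megabyte units are assumptions):

```shell
# Number of identical jobs that fit on one node: the minimum of the slot
# limit and the number of jobs that fit in memory.
# $1 = slots on the node, $2 = node memory in MB, $3 = memory per job in MB
max_jobs_per_node() {
    by_memory=$(( $2 / $3 ))
    if [ "$by_memory" -lt "$1" ]; then
        echo "$by_memory"
    else
        echo "$1"
    fi
}
```

With 8 slots, a 32 Gb node runs 8 jobs of 1 Gb (slot-bound), while an 8 Gb node runs only 2 jobs of 3 Gb (memory-bound).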
![Page 43: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/43.jpg)
32 Gb node
8 Gb node
![Page 44: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/44.jpg)
Stupid scheduling
•8 Gb node: one 8 Gb job → 7 slots free, 0 Gb free
•32 Gb node: eight 1 Gb jobs → 0 slots free, 24 Gb free
![Page 45: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/45.jpg)
Smart scheduling
•8 Gb node: eight 1 Gb jobs → 0 slots free, 0 Gb free
•32 Gb node: one 8 Gb job → 7 slots free, 24 Gb free
![Page 46: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/46.jpg)
Smart scheduling
•We want each job to go to the node where it fits best.
![Page 47: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/47.jpg)
(another) DISCLAIMER
This is strictly for our case and needs. It may appeal to you, or some ideas may inspire you, but it is not intended to be a step-by-step solution for everyone.
It is just an example of “things that can be done”.
![Page 48: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/48.jpg)
Smart scheduling
•Create 3 hostgroups:
•@32G, @16G and @8G
•Group nodes by memory
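A hostgroup can also be loaded from a file rather than typed into an editor. The sketch below prints a definition in the two-field layout (`group_name`, `hostlist`) that hostgroup files use. The function name and the host names in the usage note are assumptions.

```shell
# Hypothetical helper: prints a hostgroup definition.
# $1 = hostgroup name (for example @32G), remaining arguments = member hosts
hostgroup_def() {
    name=$1
    shift
    printf 'group_name %s\nhostlist %s\n' "$name" "$*"
}
```

Something like `hostgroup_def @32G compute-0-0 compute-0-1 > hg32.txt` followed by `qconf -Ahgrp hg32.txt` would then create the group.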
![Page 49: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/49.jpg)
Smart scheduling
•Maximize the ratio memory/core:
•job <1Gb → 8Gb nodes
•1Gb < job < 2Gb → 16Gb nodes
•2Gb < job → 32Gb nodes
![Page 50: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/50.jpg)
Smart scheduling
•3 different queues:
•all-32
•all-16
•all-8
•assign the corresponding hostgroup
![Page 51: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/51.jpg)
Smart scheduling
•Same problem as before:
•Oversubscription of one queue
•Undersubscription of other queues
![Page 52: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/52.jpg)
Sequence Numbers
![Page 53: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/53.jpg)
Smart scheduling
•Preference for a given hostgroup
![Page 54: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/54.jpg)
Smart scheduling
•all-32:
•@32G > @16G > @8G
•all-16:
•@16G > @32G > @8G
•all-8:
•@8G > @16G > @32G
![Page 55: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/55.jpg)
Smart scheduling
•qmon → queue configuration → general configuration → Sequence Nr
•qconf -mq queuename
![Page 56: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/56.jpg)
Smart scheduling
all-32 queue:
•@32G: Seq Nr = 0
•@16G: Seq Nr = 1
•@8G: Seq Nr = 2
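One way to express such a preference order in a single queue is a per-hostgroup override of `seq_no`, using the bracket syntax that queue attributes accept. The helper below builds that value from hostgroups listed in order of preference. This is a sketch under assumptions: the function name is invented, each queue's hostlist is assumed to contain all three hostgroups, and the scheduler is assumed to sort queues by sequence number (`queue_sort_method seqno` in `qconf -msconf`).

```shell
# Hypothetical helper: builds a seq_no value with one override per hostgroup.
# Arguments: hostgroups, most preferred first.
seq_no_value() {
    value=0      # cluster-wide default for hosts without an override
    n=0
    for hostgroup in "$@"; do
        value="$value,[$hostgroup=$n]"
        n=$(( n + 1 ))
    done
    echo "$value"
}
```

For the all-16 queue, `seq_no_value @16G @32G @8G` prints `0,[@16G=0],[@32G=1],[@8G=2]`, which would go on the `seq_no` line shown by `qconf -mq all-16`.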
![Page 57: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/57.jpg)
[Diagram: the 32 Gb queue, the 16 Gb queue, the 8 Gb queue and the waiting queue, with ? / ✗ / ✓ marks for each attempted job placement.]
![Page 58: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/58.jpg)
Are we done?
![Page 59: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/59.jpg)
Qsub wrapper
•Users already choose the memory
•Why ask for a queue?
•We can let the system do it
![Page 60: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/60.jpg)
Qsub wrapper
•Wrapper script around qsub
•Parse the parameters, searching for queue or memory requests:
if ( no memory ) { memory = default }
if ( no queues ) {
    if (memory < 1Gb) { queue = all-8 }
    else if (memory < 2Gb) { queue = all-16 }
    else { queue = all-32 }
}
qsub -q $queue parameters
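The pseudocode above could be fleshed out roughly as follows. This is a sketch, not the wrapper actually deployed: the function names, the 1G default and the decision to print the final command instead of running it are all assumptions. It only understands integer values with an uppercase M or G suffix.

```shell
DEFAULT_MEM=1G   # hypothetical default, pick whatever suits your site

# Convert a memory string such as 512M or 3G to whole megabytes.
to_mb() {
    number=${1%[MG]}
    case "$number" in
        ""|*[!0-9]*) return 1 ;;          # empty or not a plain integer
    esac
    case "$1" in
        *G) echo $(( number * 1024 )) ;;
        *M) echo "$number" ;;
        *)  return 1 ;;                   # no recognised suffix
    esac
}

# Map a memory request to the queue whose nodes suit it best.
pick_queue() {
    mb=$(to_mb "$1") || return 1
    if [ "$mb" -lt 1024 ]; then echo all-8
    elif [ "$mb" -lt 2048 ]; then echo all-16
    else echo all-32
    fi
}

# Print the qsub command line the wrapper would run for the given arguments.
build_qsub_cmd() {
    mem=""
    have_queue=no
    prev=""
    for arg in "$@"; do
        case "$prev" in
            -q) have_queue=yes ;;
            -l) case "$arg" in
                    *h_vmem=*) mem=${arg##*h_vmem=}; mem=${mem%%,*} ;;
                esac ;;
        esac
        prev=$arg
    done
    if [ -z "$mem" ]; then                # no memory request: add the default
        mem=$DEFAULT_MEM
        set -- -l "h_vmem=$mem" "$@"
    fi
    if [ "$have_queue" = no ]; then       # no queue request: choose one
        queue=$(pick_queue "$mem") || return 1
        set -- -q "$queue" "$@"
    fi
    echo qsub "$@"
}
```

For instance, `build_qsub_cmd -l h_vmem=3G job.sh` prints `qsub -q all-32 -l h_vmem=3G job.sh`. A real wrapper would execute the command instead of echoing it.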
![Page 61: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/61.jpg)
Qsub wrapper
•You can add whatever you need
![Page 62: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/62.jpg)
Qsub wrapper
•“home-made” parameters
•--slow / --fast
•allow access to 2 kinds of special nodes
•instead of
•-q all-16@compute-1-*
![Page 63: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/63.jpg)
Qsub wrapper
•One queue to rule them all
•but...
•No swap!!!
•No undersubscription!!!
![Page 64: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/64.jpg)
Now the icing
![Page 65: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/65.jpg)
![Page 66: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/66.jpg)
Punishment
•System relies on “good behaviour”
•Teach users how to use it
•Prevent & “punish” bad usage
![Page 67: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/67.jpg)
Punishment
•epilog script
•runs when the job finishes
•global: qconf -mconf
•or by queue
•/opt/gridengine/default/common/sge_epilog.sh
![Page 68: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/68.jpg)
Punishment
•Check memory
•requested
•maxvmem
•log
•send an email
![Page 69: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/69.jpg)
Punishment
•no memory
•teaches how to request it properly
•too much memory
•tells and advises.
•reasonable memory
•no email
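The three cases can be sketched as a small classifier that an epilog script might call. The names and the "more than twice the peak usage" threshold are assumptions made for this example; the slides do not say what counts as too much memory.

```shell
# Hypothetical classifier for a finished job.
# $1 = memory requested by the user in MB (0 if nothing was requested)
# $2 = peak memory actually used (maxvmem) in MB
classify_job() {
    if [ "$1" -eq 0 ]; then
        echo no_request        # email explaining how to request memory
    elif [ "$1" -gt $(( $2 * 2 )) ]; then
        echo too_much          # email with advice on a smaller request
    else
        echo ok                # reasonable request, no email
    fi
}
```

For example, `classify_job 6144 900` prints `too_much`, while `classify_job 1024 900` prints `ok`.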
![Page 70: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/70.jpg)
Punishment
•epilog writes a logfile
•cron process “punishes” or “rewards” users according to the previous day's memory usage
![Page 71: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/71.jpg)
Punishment
•Modify user’s shared ticket policy
•For each “bad” job:
•-10 tickets
•For each “good” job:
•+5 tickets
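The daily adjustment is simple arithmetic. A minimal sketch follows, assuming that the ticket count is never allowed to drop below zero (that floor and the function name are not from the slides):

```shell
# New ticket count for a user after one day.
# $1 = current tickets, $2 = number of "bad" jobs, $3 = number of "good" jobs
new_tickets() {
    tickets=$(( $1 - 10 * $2 + 5 * $3 ))
    if [ "$tickets" -lt 0 ]; then
        tickets=0              # assumed floor, so a user is never locked out
    fi
    echo "$tickets"
}
```

A user with 100 tickets, 2 bad jobs and 3 good jobs ends the day with `new_tickets 100 2 3`, that is 95.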
![Page 72: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/72.jpg)
Punishment
•“bad users”
•delayed scheduling
•“good users”
•more priority
![Page 73: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/73.jpg)
Control other resources
![Page 74: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/74.jpg)
Shared disk
•NFS shared disk
•avoid filling it
•suspend all jobs before it's too late
![Page 75: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/75.jpg)
Shared disk
•New complex attribute: scratch_pct
•type = INT
•operation >=
•requestable = NO
•consumable = NO
•default = 0
![Page 76: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/76.jpg)
Shared disk
•Load Report
• /opt/gridengine/default/common/sge_load_report.sh
![Page 77: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/77.jpg)
Shared disk
infinite loop {
    scratch=`df| grep scratch| awk '{print $4}' | grep % | sed 's/%//g'`
    echo begin
    echo "$myhost:scratch_pct:$scratch"
    echo end
}
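A fuller version of such a load sensor might look like the sketch below. It follows the usual load sensor convention: wait for a line on standard input, stop on `quit`, otherwise answer with a begin / value / end block. The `/scratch` mount point, the function names and the use of `df -P` (one line per filesystem) are assumptions, not the script used on the cluster.

```shell
# Extract the use percentage of one mount point from `df -P` output on stdin.
# $1 = mount point
scratch_pct() {
    awk -v mp="$1" '$6 == mp { sub(/%/, "", $5); print $5; exit }'
}

# Print one load report block. $1 = host name, $2 = percentage
load_report() {
    printf 'begin\n%s:scratch_pct:%s\nend\n' "$1" "$2"
}

# Main loop: one report per request line, until "quit" arrives.
sensor_loop() {
    host=$(hostname)
    while read -r line; do
        if [ "$line" = "quit" ]; then
            break
        fi
        load_report "$host" "$(df -P | scratch_pct /scratch)"
    done
}
```

A real sensor script would end with a call to `sensor_loop` and would be registered through the `load_sensor` parameter of the host or global configuration.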
![Page 78: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/78.jpg)
Shared disk
![Page 79: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/79.jpg)
Shared disk
•whenever the disk gets to 97%
•all jobs freeze
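The slide with the actual setting did not survive, so the following is a guess at how the freeze could be wired up: a `suspend_thresholds` entry on each queue that refers to the scratch_pct load value. The helper only prints the commands; the function name is invented and the queue names are taken from the earlier slides.

```shell
# Hypothetical helper: prints one qconf command per queue that sets a
# suspend threshold on scratch_pct.
# $1 = percentage limit, remaining arguments = queue names
suspend_threshold_cmds() {
    limit=$1
    shift
    for queue in "$@"; do
        echo "qconf -mattr queue suspend_thresholds scratch_pct=$limit $queue"
    done
}
```

Running `suspend_threshold_cmds 97 all-32 all-16 all-8` prints three commands to review before executing them.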
![Page 80: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/80.jpg)
Conclusions
•Combining SGE options gives access to much more powerful configurations
![Page 81: Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs](https://reader030.vdocument.in/reader030/viewer/2022032705/56649dd15503460f94ac6fb3/html5/thumbnails/81.jpg)
Questions?
Special thanks:
•Angel Carreño
•Carles Perarnau
•Marc Esteve
•Jordi Rambla
•Arcadi Navarro