SLURM: Best Practice
Bash Header
We recommend using #!/bin/bash -e instead of plain #!/bin/bash, so that the failure of any command within the script will cause your job to stop immediately rather than attempting to continue with an unexpected environment or erroneous intermediate data. It also ensures that your failed jobs show a status of FAILED in sacct output.
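As a minimal sketch, a job script using this header might look like the following (the job name, account code, module name, and script name are placeholders to adapt to your own project):

#!/bin/bash -e
#SBATCH --job-name=my_analysis     # placeholder job name
#SBATCH --account=nesi12345        # hypothetical project code
#SBATCH --time=01:00:00
#SBATCH --mem=2G

# Because of -e, if either command below fails the job stops immediately
# and sacct reports it as FAILED.
module load Python                 # module name is an assumption; check "module avail"
python analysis.py                 # placeholder processing step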
Resources
Don't request more resources (CPUs, memory, GPUs) than you will need. In addition to using your core hours faster, resource-intensive jobs will take longer to queue. Use the information provided at the completion of your job (e.g. via the sacct command) to better define resource requirements.
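For example, after a job finishes you can compare what it actually used against what you requested (the job ID below is a placeholder):

sacct -j 1234567 --format=JobID,JobName,Elapsed,TotalCPU,ReqMem,MaxRSS,State

If MaxRSS is well below ReqMem, or TotalCPU is well below the elapsed time multiplied by the number of CPUs, the next submission can safely request less.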
Wall-time
Long jobs will spend more time in the queue, as there are more opportunities for the scheduler to find a time slot to run shorter jobs. So consider using job check-pointing or, where possible, more parallelism, to get job duration down to a few hours, or at worst, days.
Leave some headroom for safety and run-to-run variability on the system but try to be as accurate as possible.
If you have many jobs of less than 5 minutes each, they should probably be combined into larger jobs using a simple loop in the batch script, so as to amortise the overheads of each job (starting, accounting, etc.).
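As an illustrative sketch, rather than submitting hundreds of five-minute jobs, a single batch script can loop over the inputs (the directory layout and the analyse command are hypothetical):

#!/bin/bash -e
#SBATCH --time=04:00:00    # head-room for the whole loop, not just one file
#SBATCH --mem=2G

# Process every input file within one job, amortising the per-job overheads.
for infile in inputs/*.dat; do
    ./analyse "$infile" > "results/$(basename "$infile" .dat).out"
done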
Memory (RAM)
If you request more memory (RAM) than you need for your job, it will wait longer in the queue and will be more expensive when it runs. On the other hand, if you don't request enough memory, the job may be killed for attempting to exceed its allocated memory limits.
We recommend that you request a little more RAM, but not much more, than your program will need at peak memory usage.
We also recommend using --mem instead of --mem-per-cpu in most cases. There are a few kinds of jobs for which --mem-per-cpu is more suitable. See our article on how to request memory for more information.
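For example, if benchmarking shows a peak memory footprint of around 3 GB, a request along the following lines leaves a small safety margin (the exact figure is illustrative):

#SBATCH --mem=3500MB    # total memory for the job (per node), a little above the observed peak

By contrast, --mem-per-cpu sets a limit per allocated CPU, so the total grows with the number of CPUs requested.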
Parallelism
In general only MPI jobs should set --ntasks greater than 1 or use srun. If you don't know whether your program supports MPI, it probably doesn't.
Only multithreaded jobs should set --cpus-per-task. If you don't know whether your program supports multithreading, try benchmarking with 2 CPUs and with 4 CPUs and see if there is a 2-fold difference in elapsed job time.
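As a rough sketch, the relevant lines from two alternative job scripts might look like this (the program names are placeholders):

# MPI job: many tasks, one CPU each, launched with srun
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
srun ./my_mpi_program

# Multithreaded (e.g. OpenMP) job: one task, several CPUs, no srun needed
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program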
Job arrays are an efficient mechanism for managing a collection of batch jobs with identical resource requirements. Most Slurm commands can manage job arrays either as individual elements (tasks) or as a single entity (e.g. deleting an entire job array in a single command).
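A minimal job array sketch might look like this (the array range and input naming scheme are assumptions):

#!/bin/bash -e
#SBATCH --array=1-100      # 100 tasks with identical resource requirements
#SBATCH --time=00:30:00
#SBATCH --mem=1G

# Each task selects its own input using the array index.
./analyse "inputs/sample_${SLURM_ARRAY_TASK_ID}.dat"

A single sbatch call submits the whole collection, and commands such as scancel can then target either the entire array or individual tasks (e.g. scancel 1234567 or scancel 1234567_42).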
Fairshare
A low fairshare score will reduce your jobs' priority in the queue; learn more about how to use your allocation effectively here.
Cross machine submission
Jobs can be submitted from one machine to another by using the --cluster option, for example submitting a job from Māui_Ancil to Māui. By default the environment (modules and variables) will be inherited from the submitting shell into the job environment. But the environments vary between our different machines, including module names, the location of Slurm tools, etc., which can cause issues when the environment is inherited. We therefore suggest setting the environment variable SBATCH_EXPORT=NONE (do NOT use the --export=none option) in the submitting shell, and submitting a job to Māui, for example, with:
SBATCH_EXPORT=NONE sbatch --cluster=maui job.sl
Please note: the above only covers the transition from your submitting environment to the job environment, i.e. the environment your job script runs in. Another environment is created for your parallel application when it is launched with srun; that one should inherit from the job environment so that PATH and other settings are available. Therefore, avoid setting SBATCH_EXPORT=NONE in your job script, or in .bashrc or .profile where it would apply in all cases. The Slurm --export=none option would prevent inheriting environments in both transitions. Alternatively, you can set SLURM_EXPORT_ENV=ALL in your job script to enable environment forwarding to the srun environment.
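For example, a job script submitted with the command above could set that variable before its srun call (the application name is a placeholder):

#!/bin/bash -e
#SBATCH --ntasks=16

# Forward the job environment (PATH, loaded modules, etc.) to the
# application launched by srun.
export SLURM_EXPORT_ENV=ALL

srun ./my_parallel_program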