Hyperthreading
As CPU technology advanced, engineers realised that adapting CPU architecture to include logical processors within each physical core (conventionally, a CPU) allows some computation to occur simultaneously. The general name for this technology is simultaneous multithreading (SMT), and Intel's implementation of it is called Hyperthreading.
CPUs capable of Hyperthreading consist of two logical processors per physical core. The logical processors can operate on data/instruction threads simultaneously, meaning the physical core can make progress on two operations concurrently. In other words, the difference between logical and physical cores is that logical cores are not full stand-alone CPUs: each physical core is made up of two logical cores, which share some of the core's hardware.
Hyperthreading is enabled by default on NeSI machines, meaning, by default, Slurm will allocate two threads to each physical core.
Hyperthreading with Slurm
When you request a CPU from Slurm, you are requesting logical cores, of which, as mentioned above, there are two per physical core. If you use `--ntasks=n` to request CPUs, Slurm will start `n` MPI tasks, each assigned to one physical core. Since Slurm "sees" logical cores, once your job starts you will have twice as many CPUs as `ntasks`.

If you set `--cpus-per-task=n`, Slurm will request `n` logical CPUs per task, i.e. will set `n` threads for the job. Your code must be capable of running Hyperthreaded (for example using OpenMP) if `--cpus-per-task` is greater than 1.
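For example, a job script for a multithreaded (OpenMP) program might look like the following sketch; the job name, resource figures, and program name are illustrative only:

```shell
#!/bin/bash -e
#SBATCH --job-name=omp_job    # illustrative name
#SBATCH --ntasks=1            # a single task...
#SBATCH --cpus-per-task=8     # ...with 8 logical CPUs (4 physical cores)

# Match the OpenMP thread count to the Slurm allocation.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./my_openmp_program      # placeholder executable
```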
Setting `--hint=nomultithread` with `srun` or `sbatch` causes Slurm to allocate only one thread from each physical core to the job. This will allocate CPUs according to the following table:
Node name: wbn009

| Physical Core id | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|---|
| Logical CPU ids | 0, 1 | 2, 3 | 4, 5 | 6, 7 | 8, 9 | 10, 11 | 12, 13 | 14, 15 |
| Allocated CPU ids | 0 | 2 | 4 | 6 | 8 | 10 | 12 | 14 |

Number of allocated CPUs: 4 from each group of four physical cores, i.e. one logical CPU per physical core.
Adapted from Slurm's documentation page.
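As a concrete sketch, an allocation like the one shown above could come from a request such as this (the task count and script name are illustrative):

```shell
# Ask Slurm for one thread per physical core: the 8 tasks land on
# 8 physical cores, each using one logical CPU.
sbatch --ntasks=8 --hint=nomultithread job.sl   # job.sl is a placeholder script
```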
When to use Hyperthreading
Hyperthreading increases the efficiency of some jobs, but the fact that Slurm counts logical CPUs makes aspects of running non-Hyperthreaded jobs confusing, even when Hyperthreading is turned off in the job with `--hint=nomultithread`. To determine whether the code you are running is capable of running Hyperthreaded, consult the software's manual pages.
Alternatively, it is possible to perform an ad-hoc test to determine whether your code can make use of Hyperthreading. First, run a job that requests 2 threads per physical core as described above. Then use the `nn_seff` command to check the job's CPU efficiency. If CPU efficiency is greater than 100%, your code is making use of Hyperthreading and gaining performance from it. If your job gives an error or stays at 100% efficiency, it is likely you cannot run your code Hyperthreaded. 200% CPU efficiency would be the maximally efficient job; however, this is rarely observed, and anything over 100% should be considered a bonus.
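To make the efficiency figure concrete, the arithmetic behind this test can be sketched as follows, assuming CPU efficiency is reported as total CPU time relative to elapsed time per physical core (a simplified model of the `nn_seff` output, not its actual implementation):

```shell
# Hypothetical job: 1 physical core, one hour of wall-clock time,
# 90 minutes of total CPU time accumulated across both logical CPUs.
cpu_seconds=5400
elapsed_seconds=3600
physical_cores=1

# Efficiency as a whole-number percentage.
efficiency=$(( 100 * cpu_seconds / (elapsed_seconds * physical_cores) ))
echo "${efficiency}%"   # prints "150%": above 100%, so Hyperthreading is helping
```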
How to use Hyperthreading
- Non-Hyperthreaded jobs which use `--mem-per-cpu` requests should halve their memory requests, as those are based on memory per logical CPU, not per thread or task. For non-MPI jobs, or for MPI jobs that request the same number of tasks on every node, we recommend specifying `--mem` (i.e. memory per node) instead. See How to request memory (RAM) for more information.
- Non-MPI jobs which specify `--cpus-per-task` and use `srun` should also set `--ntasks=1`, otherwise the program will be run twice in parallel, halving the efficiency of the job.
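A job script following both recommendations above might look like this sketch (the resource figures and program name are illustrative):

```shell
#!/bin/bash -e
#SBATCH --ntasks=1            # needed with srun + --cpus-per-task for non-MPI jobs
#SBATCH --cpus-per-task=4     # 4 logical CPUs = 2 physical cores
#SBATCH --mem=2G              # memory per node, rather than --mem-per-cpu
srun ./my_threaded_program    # placeholder executable
```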
The precise rules about when Hyperthreading applies are as follows:
| | Mahuika | Māui |
|---|---|---|
| Jobs | Never share physical cores | Never share physical cores |
| MPI tasks within the same job | Never share physical cores | Share physical cores by default. You can override this behaviour by using `--hint=nomultithread` in your job submission script. |
| Threads within the same task | Share physical cores by default. You can override this behaviour by using `--hint=nomultithread` in your job submission script. | Share physical cores by default. You can override this behaviour by using `--hint=nomultithread` in your job submission script. |
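For instance, to stop MPI tasks sharing physical cores on Māui, the override could be applied like this (the task count and program name are illustrative):

```shell
#!/bin/bash -e
#SBATCH --ntasks=40           # illustrative MPI task count
#SBATCH --hint=nomultithread  # give each task its own physical core
srun ./my_mpi_program         # placeholder executable
```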
How many logical CPUs will my job use or be charged for?
The possible job configurations and their results are shown in the following table. We have also included some recommendations to help you make the best choices, depending on the needs of your workflow.
| Job configuration | Mahuika | Māui |
|---|---|---|
| `--cpus-per-task=1` (the default) | The job gets, and is charged for, two logical CPUs. `--hint=nomultithread` is irrelevant. | The job gets one logical CPU, but is charged for 80. `--hint=nomultithread` is irrelevant. This configuration is extremely uneconomical on Māui. Consider using Mahuika or the Māui ancillary nodes instead. |
| `--cpus-per-task=N` | The job gets, and is charged for, N logical CPUs, rounded up to the nearest even number. Set N to an even number if possible. | The job gets N logical CPUs, but is charged for 80. Set N to 80 if possible. |
| `--cpus-per-task=N` with `--hint=nomultithread` | The job gets, and is charged for, 2N logical CPUs. | The job gets 2N logical CPUs, but is charged for 80. Set N to 40 if possible. |
| `--ntasks=n` | Each task gets two logical CPUs. The job is charged for two logical CPUs per task. `--hint=nomultithread` is irrelevant. | Each task gets one logical CPU. The job is charged for 80 logical CPUs per allocated node. If possible, set the number of tasks per node to 80. |
| `--ntasks=n` with `--hint=nomultithread` | Each task gets two logical CPUs. The job is charged for two logical CPUs per task. | Each task gets two logical CPUs. The job is charged for 80 logical CPUs per allocated node. If possible, set the number of tasks per node to 40. |
| `--ntasks=n --cpus-per-task=N` | Each task gets N logical CPUs, rounded up to the nearest even number. The job is charged for that number of logical CPUs per task. Set N to an even number if possible. | Each task gets N logical CPUs. The job is charged for 80 logical CPUs per allocated node. If possible, set N and the number of tasks per node such that N × (tasks per node) = 80. |
| `--ntasks=n --cpus-per-task=N` with `--hint=nomultithread` | Each task gets 2N logical CPUs. The job is charged for 2N logical CPUs per task. | Each task gets 2N logical CPUs. The job is charged for 80 logical CPUs per allocated node. If possible, set N and the number of tasks per node such that N × (tasks per node) = 40. |
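The "rounded up to the nearest even number" rule in the Mahuika rows is plain integer arithmetic, since allocation happens in whole physical cores of two logical CPUs each; a quick illustration:

```shell
# Round a logical-CPU request up to a whole number of physical cores.
for n in 1 2 5 8; do
  charged=$(( (n + 1) / 2 * 2 ))
  echo "request ${n} -> charged ${charged}"
done
# request 1 -> charged 2
# request 2 -> charged 2
# request 5 -> charged 6
# request 8 -> charged 8
```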