Cluster Architecture
This page describes the cluster setup and the resources available. General information about the cluster, and the differences between Legacy, eRI, and NeSI, can be found at Differences.
On the cluster, the sinfo command shows the current state of each node.
S:C:T means sockets, cores per socket, and threads per core. CPUS = S*C*T gives the number of possible tasks for that node.
The interactive and vgpu nodes are in the Open OnDemand/Jupyter space and will eventually not be visible as a resource on the cluster, but we will use them for the demo.
login-0 ~ $ sinfo
PARTITION AVAIL JOB_SIZE TIMELIMIT  CPUS S:C:T  NODES STATE     NODELIST
compute*  up    1-infini 14-00:00:0 256  2:64:2 2     mixed     compute-[0,5]
compute*  up    1-infini 14-00:00:0 256  2:64:2 1     allocated compute-2
compute*  up    1-infini 14-00:00:0 256  2:64:2 3     idle      compute-[1,3-4]
gpu       up    1-infini 14-00:00:0 96   2:24:2 1     idle      gpu-0
hugemem   up    1-infini 14-00:00:0 256  2:64:2 1     mixed     hugemem-0
hugemem   up    1-infini 14-00:00:0 256  2:64:2 1     idle      hugemem-1
interacti up    1-infini 60-00:00:0 8    8:1:1  1     mixed     interactive-0
interacti up    1-infini 60-00:00:0 8    8:1:1  2     idle      interactive-[1-2]
vgpu      up    1-infini 60-00:00:0 32   32:1:1 4     idle      vgpu-[0-3]
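As a quick check of the CPUS = S*C*T relation, the sketch below multiplies out an S:C:T value in shell. The function name sct_to_cpus is hypothetical (it is not a Slurm command), and the sample values are copied from the sinfo listing above.

```shell
# Hypothetical helper: turn an "S:C:T" string into a CPU count.
sct_to_cpus() {
    # Split the argument on ":" and multiply sockets * cores * threads.
    printf '%s\n' "$1" | awk -F: '{ print $1 * $2 * $3 }'
}

sct_to_cpus 2:64:2    # compute and hugemem nodes
sct_to_cpus 2:24:2    # gpu node
sct_to_cpus 32:1:1    # vgpu nodes
```

The results should match the CPUS column for the corresponding partitions.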
More information about resources and the state of each node can be gained from the command:
sinfo -N --Format=nodelist,cpusload,cpusState,FreeMem,AllocMem,Memory
To see what is running, use squeue; for more detail, you can use something like:
squeue --format="%10i %20j %20a %15u %4T %.10M %.10L %R %.C %.m %w"
login-0 ~ $ sinfo -N --Format=nodelist,cpusload,cpusState,FreeMem,AllocMem,Memory
NODELIST      CPU_LOAD CPUS(A/I/O/T) FREE_MEM ALLOCMEM MEMORY
compute-0     0.29     18/238/0/256  978099   146072   980163
compute-1     0.02     0/256/0/256   1016984  0        980163
compute-2     57.25    256/0/0/256   869759   262144   980163
compute-3     0.00     0/256/0/256   1016450  0        980163
compute-4     0.00     0/256/0/256   1015908  0        980163
compute-5     58.02    96/160/0/256  947861   98304    980163
gpu-0         0.08     0/96/0/96     505645   0        489916
hugemem-0     40.21    40/216/0/256  4022021  307200   3910416
hugemem-1     0.02     0/256/0/256   4100992  0        3910416
interactive-0 0.01     2/6/0/8       10838    8620     15217
interactive-1 0.00     0/8/0/8       12178    0        15217
interactive-2 0.04     0/8/0/8       12116    0        15217
vgpu-0        0.00     0/32/0/32     446262   0        428894
vgpu-1        0.00     0/32/0/32     446914   0        428894
vgpu-2        0.00     0/32/0/32     446902   0        428894
vgpu-3        0.03     0/32/0/32     446940   0        428894
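The CPUS(A/I/O/T) column packs four counts into one field: allocated, idle, other, and total. A minimal sketch of pulling out the idle count per node follows. The function name idle_cpus is hypothetical, and the sketch assumes the columns are separated by whitespace and appear in the order shown in the listing above.

```shell
# Hypothetical helper: read sinfo -N style output on stdin and
# print each node name followed by its idle CPU count.
idle_cpus() {
    awk 'NR > 1 {              # skip the header row
        split($3, c, "/")      # c[1]=allocated c[2]=idle c[3]=other c[4]=total
        print $1, c[2]
    }'
}

# Demonstration on two rows copied from the listing above.
idle_cpus <<'EOF'
NODELIST CPU_LOAD CPUS(A/I/O/T) FREE_MEM ALLOCMEM MEMORY
compute-0 0.29 18/238/0/256 978099 146072 980163
compute-2 57.25 256/0/0/256 869759 262144 980163
EOF
```

On the cluster itself, the same function could be fed directly from the sinfo command shown above by piping its output into idle_cpus.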
Limits
Job length - 2 weeks
Memory - 2.5 GB per CPU
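To make the memory figure concrete, the sketch below works out the memory that goes with a given CPU count. It reads the limit as 2.5 GB for each CPU a job requests and assumes 1 GB = 1024 MB; both readings are assumptions, and the function name mem_limit_mb is hypothetical.

```shell
# Hypothetical helper: memory in MB for a given number of CPUs,
# assuming 2.5 GB per CPU and 1 GB = 1024 MB (2.5 * 1024 = 2560).
mem_limit_mb() {
    echo $(( $1 * 2560 ))
}

mem_limit_mb 4      # a 4-CPU job
mem_limit_mb 256    # every CPU on a compute node
```

Under these assumptions a 4-CPU job corresponds to 10240 MB.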