Cluster Architecture

This page describes the cluster setup and the resources available. General information about the cluster, and the differences between Legacy, eRI and NeSI, can be found at Differences.

On the cluster, the sinfo command shows the current state of each node.

S:C:T stands for sockets, cores and threads. CPUS = S*C*T, which gives the number of possible tasks for that node.
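As a quick check of that arithmetic, the small bash function below splits an S:C:T string and multiplies the three fields. The function name cpus_from_sct is just for illustration. For the compute nodes in the sinfo output, 2:64:2 gives 2*64*2 = 256, which matches the CPUS column.

```shell
#!/usr/bin/env bash
# cpus_from_sct (illustrative name): multiply the sockets, cores and threads
# fields of an S:C:T string, as printed by sinfo, to get the node's CPU count.
cpus_from_sct() {
  local sockets cores threads
  IFS=: read -r sockets cores threads <<< "$1"
  echo $(( sockets * cores * threads ))
}

cpus_from_sct 2:64:2   # compute and hugemem nodes -> 256
cpus_from_sct 2:24:2   # gpu-0 -> 96
```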

The interactive and vgpu nodes belong to the Open on Demand/Jupyter space and will eventually no longer be visible as a resource on the cluster, but we will use them for the demo.

login-0 ~ $ sinfo
PARTITION AVAIL JOB_SIZE  TIMELIMIT   CPUS  S:C:T   NODES STATE      NODELIST
compute*  up    1-infini 14-00:00:0    256 2:64:2       2 mixed      compute-[0,5]
compute*  up    1-infini 14-00:00:0    256 2:64:2       1 allocated  compute-2
compute*  up    1-infini 14-00:00:0    256 2:64:2       3 idle       compute-[1,3-4]
gpu       up    1-infini 14-00:00:0     96 2:24:2       1 idle       gpu-0
hugemem   up    1-infini 14-00:00:0    256 2:64:2       1 mixed      hugemem-0
hugemem   up    1-infini 14-00:00:0    256 2:64:2       1 idle       hugemem-1
interacti up    1-infini 60-00:00:0      8  8:1:1       1 mixed      interactive-0
interacti up    1-infini 60-00:00:0      8  8:1:1       2 idle       interactive-[1-2]
vgpu      up    1-infini 60-00:00:0     32 32:1:1       4 idle       vgpu-[0-3]

More information about the resources and state of each node can be obtained with the command:
sinfo -N --Format=nodelist,cpusload,cpusState,FreeMem,AllocMem,Memory

To see what’s running, use squeue; for more detail you can use something like:
squeue --format="%10i %20j %20a %15u %4T %.10M %.10L %R %.C %.m %w"
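If you mostly care about your own jobs, squeue's -u option restricts the listing to a single user. A minimal sketch, assuming bash and a hypothetical wrapper name myjobs, that reuses the format string above and passes any extra arguments straight through to squeue:

```shell
#!/usr/bin/env bash
# myjobs (hypothetical name): list only the current user's jobs.
# Format fields: %i job ID, %j job name, %a account, %u user, %T state,
# %M time used, %L time left, %R reason (pending) or node list (running),
# %C CPUs, %m memory requested (MB), %w wckey.
myjobs() {
  squeue -u "${USER:-$(id -un)}" \
    --format="%10i %20j %20a %15u %4T %.10M %.10L %R %.C %.m %w" \
    "$@"
}
```

For example, myjobs -t RUNNING shows only your running jobs, since -t (--states) is forwarded to squeue unchanged.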

login-0 ~ $ sinfo -N --Format=nodelist,cpusload,cpusState,FreeMem,AllocMem,Memory
NODELIST            CPU_LOAD            CPUS(A/I/O/T)       FREE_MEM            ALLOCMEM            MEMORY              
compute-0           0.29                18/238/0/256        978099              146072              980163              
compute-1           0.02                0/256/0/256         1016984             0                   980163              
compute-2           57.25               256/0/0/256         869759              262144              980163              
compute-3           0.00                0/256/0/256         1016450             0                   980163              
compute-4           0.00                0/256/0/256         1015908             0                   980163              
compute-5           58.02               96/160/0/256        947861              98304               980163              
gpu-0               0.08                0/96/0/96           505645              0                   489916              
hugemem-0           40.21               40/216/0/256        4022021             307200              3910416             
hugemem-1           0.02                0/256/0/256         4100992             0                   3910416             
interactive-0       0.01                2/6/0/8             10838               8620                15217               
interactive-1       0.00                0/8/0/8             12178               0                   15217               
interactive-2       0.04                0/8/0/8             12116               0                   15217               
vgpu-0              0.00                0/32/0/32           446262              0                   428894              
vgpu-1              0.00                0/32/0/32           446914              0                   428894              
vgpu-2              0.00                0/32/0/32           446902              0                   428894              
vgpu-3              0.03                0/32/0/32           446940              0                   428894              
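The CPUS(A/I/O/T) column is allocated/idle/other/total. To see at a glance which nodes have the most free CPUs, one option is to pipe that output through awk. This sketch assumes whitespace-separated columns as in the sample above, and the function name idle_cpus is hypothetical:

```shell
#!/usr/bin/env bash
# idle_cpus (hypothetical name): read the output of
#   sinfo -N --Format=nodelist,cpusload,cpusState,FreeMem,AllocMem,Memory
# on stdin, skip the header line, and print each node with its idle CPU
# count (the "I" field of A/I/O/T), sorted with the most idle CPUs first.
idle_cpus() {
  awk 'NR > 1 { split($3, c, "/"); print $1, c[2] }' | sort -k2,2nr
}
```

Usage: sinfo -N --Format=nodelist,cpusload,cpusState,FreeMem,AllocMem,Memory | idle_cpus. Note that sinfo -N prints a node once per partition it belongs to, so a node in several partitions would appear more than once.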

Limits

Job length - 2 weeks (14-00:00:00, the TIMELIMIT of the compute, gpu and hugemem partitions)
Memory - 2.5GB per CPU
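In a job script these limits end up as #SBATCH directives. The helper below is only a sketch: the function name sbatch_header is made up, and it assumes that 2.5GB/cpu means 2560 MB per CPU (Slurm's size suffixes are 1024-based; use 2500M if the site means decimal GB) and that the two-week limit is the 14-00:00:00 TIMELIMIT shown by sinfo.

```shell
#!/usr/bin/env bash
# sbatch_header (hypothetical name): print the start of a job script for a
# single-task job using N CPUs, staying inside the limits listed above.
# Assumes 2.5GB/cpu = 2560 MB per CPU and a 14-day maximum run time.
sbatch_header() {
  local ncpus=${1:-1}
  echo "#!/bin/bash"
  echo "#SBATCH --ntasks=1"
  echo "#SBATCH --cpus-per-task=${ncpus}"
  echo "#SBATCH --mem-per-cpu=2560M"
  echo "#SBATCH --time=14-00:00:00"
  echo "# total memory requested: $(( ncpus * 2560 )) MB"
}

sbatch_header 4
```

For example, sbatch_header 4 > job.sh, then append your commands to job.sh and submit it with sbatch job.sh. The #SBATCH lines must stay above the first command in the script, because sbatch stops reading directives at the first non-comment line.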