NVIDIA GPU Containers
NVIDIA provides access to GPU-accelerated software through their NGC container registry.
NGC offers a comprehensive catalog of GPU-accelerated software for deep learning, machine learning, and HPC. NGC containers package tested, optimised, easy-to-deploy software, so users can focus on their models and analyses rather than on installing and configuring the underlying software stack.
Many of these containers can run under Apptainer, which is supported on the NeSI platform. NVIDIA also specifies the GPU requirements for each container, for example whether it will run on our Pascal (sm60) GPUs.
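If you want to confirm which GPU model is visible to a job before choosing a container, a quick way is to run nvidia-smi in a short GPU job. The resource values below are illustrative only, and you may also need to pass your project account with --account:

srun --gpus-per-node=P100:1 --time=00:05:00 --mem=512M nvidia-smi --query-gpu=name,driver_version --format=csv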
There are instructions on the NVIDIA site for converting their Docker images to Apptainer images, but some small changes are required to run them on NeSI. As an example, here we show the steps required for running the NAMD image on NeSI, based on NVIDIA's instructions.
- Download the APOA1 benchmark data:
wget -O - https://gitlab.com/NVHPC/ngc-examples/raw/master/namd/3.0/get_apoa1.sh | bash
cd apoa1
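If you prefer not to pipe the script straight into bash, an equivalent alternative is to download it, inspect it, and then run it:

wget https://gitlab.com/NVHPC/ngc-examples/raw/master/namd/3.0/get_apoa1.sh
less get_apoa1.sh
bash get_apoa1.sh
cd apoa1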
- Load the Apptainer module:
module load Apptainer
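A quick check that the module has loaded correctly is to print the Apptainer version:

apptainer --version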
- Build the Apptainer image. This step differs from the NVIDIA instructions because instead of using "build" we "pull" the image directly, which does not require root access:
Note: please refer to Build Environment Variables before running the following pull command.
apptainer pull namd-3.0-alpha9-singlenode.sif docker://nvcr.io/hpc/namd:3.0-alpha9-singlenode
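By default Apptainer caches downloaded image layers under your home directory, which has a relatively small quota. A common pattern, sketched below with a placeholder project path, is to point the cache and temporary directories at a nobackup location before pulling; see Build Environment Variables for the NeSI-specific details:

export APPTAINER_CACHEDIR=/nesi/nobackup/<project_code>/apptainer_cache  # placeholder path
export APPTAINER_TMPDIR=/nesi/nobackup/<project_code>/apptainer_tmp      # placeholder path
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"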
- Copy the following into a Slurm script named run.sl:
#!/bin/bash -e
#SBATCH --job-name=namdgpu
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gpus-per-node P100:1
#SBATCH --mem=1G

module purge
module load Apptainer

# name of the NAMD input file, tag, etc
NAMD_INPUT="apoa1_nve_cuda.namd"
NAMD_SIF="namd-3.0-alpha9-singlenode.sif"
NAMD_EXE=namd3

# apptainer command with required arguments
APPTAINER="apptainer exec --nv -B $(pwd):/host_pwd --pwd /host_pwd ${NAMD_SIF}"

# run NAMD
${APPTAINER} ${NAMD_EXE} +ppn ${SLURM_CPUS_PER_TASK} +idlepoll ${NAMD_INPUT}
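Before submitting, you can optionally check that the GPU is visible from inside the container by running nvidia-smi through the image in a short interactive job (resource values are illustrative):

srun --gpus-per-node=P100:1 --time=00:05:00 --mem=1G apptainer exec --nv namd-3.0-alpha9-singlenode.sif nvidia-smi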
- Submit the job:
sbatch run.sl
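You can monitor the job with the usual Slurm commands, replacing <jobid> with the ID printed by sbatch:

squeue -u $USER
sacct -j <jobid> --format=JobID,JobName,State,Elapsed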
- View the standard output from the simulation in the Slurm .out file.
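By default Slurm writes this output to slurm-<jobid>.out in the directory you submitted from. NAMD's timing summary typically appears on lines containing "Benchmark time" (treat this string as an assumption and adjust if your version's output differs), so a quick way to inspect the results is:

tail slurm-<jobid>.out
grep "Benchmark time" slurm-<jobid>.out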
We expect similar steps to work for other NGC containers.
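The general pattern is the same for other containers: pull the image with apptainer pull, then run the tool through apptainer exec --nv inside a Slurm GPU job. The image path and tag below are placeholders only; check the NGC catalog entry for the actual image name, tag, and GPU requirements:

apptainer pull <tool>.sif docker://nvcr.io/hpc/<tool>:<tag>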