GROMACS
GROMACS (the GROningen MAchine for Chemical Simulations) is a versatile package to perform molecular dynamics, i.e. to simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
This is a GPU-enabled build, containing both MPI and thread-MPI binaries.
Available Modules¶
module load GROMACS/2020.5-intel-2020a-cuda-11.0.2-hybrid
GROMACS is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since it is extremely fast at calculating the nonbonded interactions (which usually dominate simulations), many groups also use it for research on non-biological systems, e.g. polymers.
GROMACS is available to anyone at no cost under the terms of the GNU Lesser General Public Licence. GROMACS is a joint effort, with contributions from developers around the world; users agree to acknowledge use of GROMACS in any reports or publications of results obtained with the software.
Job submission¶
GROMACS performance depends on several factors, such as usage (or lack thereof) of GPUs, the number of MPI tasks and OpenMP threads, the load balancing algorithm, the ratio between the number of Particle-Particle (PP) ranks and Particle-Mesh-Ewald (PME) ranks, the type of simulation being performed, the force field used and, of course, the simulated system. For a complete set of options, please refer to the GROMACS documentation.
The following job script is just an example; it asks for five MPI tasks, each of which consists of three OpenMP threads, for a total of 15 threads. Please try other `mdrun` flags, such as `-npme`, `-dlb` and `-ntomp`, to see whether they make your simulation run faster. If you use more MPI tasks per node you will have less memory per MPI task. If you use multiple MPI tasks per node, you need to set `CRAY_CUDA_MPS=1` so that the tasks can access the GPU device on each node at the same time.
#!/bin/bash -e
#SBATCH --job-name GROMACS_test # Name to appear in squeue
#SBATCH --time 00:10:00 # Max walltime
#SBATCH --mem-per-cpu 512MB # Max memory per logical core
#SBATCH --ntasks 5 # 5 MPI tasks
#SBATCH --cpus-per-task 3 # 3 OpenMP threads per task
module load GROMACS/2020.5-intel-2020a-cuda-11.0.2-hybrid
# Prepare the binary input from precursor files
srun -n 1 gmx grompp -v -f minim.mdp -c protein.gro -p protein.top -o protein-EM-vacuum.tpr
# Run the simulation
# Note that the -deffnm option is an alternative to specifying several input files individually
# Note also that the -ntomp option should be used when using hybrid parallelisation
srun gmx_mpi mdrun -ntomp ${SLURM_CPUS_PER_TASK} -v -deffnm protein-EM-vacuum -c input/protein.gro -cpt 30
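As a rough illustration of the tuning flags mentioned above, the `mdrun` line could be varied as in the sketch below. The values shown are placeholders to benchmark against, not recommendations; the best settings depend on your system and the resources you request.
# Example only: dedicate one rank to PME and force dynamic load balancing on
srun gmx_mpi mdrun -ntomp ${SLURM_CPUS_PER_TASK} -npme 1 -dlb yes -v -deffnm protein-EM-vacuum -cpt 30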
Note: To prevent performance issues, we moved the serial "gmx" binary to "gmx_serial". The present "gmx" prints a note and calls "gmx_mpi mdrun" (if called as "gmx mdrun") or "gmx_serial" in all other cases.
Note: The hybrid version with CUDA can also run on pure CPU architectures. Thus you can use gmx_mpi from the GROMACS/???-cuda-???-hybrid module on Mahuika compute nodes as well as Mahuika GPU nodes.
Checkpointing and restarting¶
In the examples given above, the `-cpt 30` option instructs GROMACS to write a full checkpoint file every 30 minutes. You can restart from a checkpoint file using the `-cpi` flag, for example `-cpi state.cpt`.
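For example, if the checkpoint file from the run above is named protein-EM-vacuum.cpt (the usual default when running with -deffnm protein-EM-vacuum), a restart command might look like this sketch:
# Continue the previous run from its checkpoint file (filename assumed from the example above)
srun gmx_mpi mdrun -ntomp ${SLURM_CPUS_PER_TASK} -v -deffnm protein-EM-vacuum -cpi protein-EM-vacuum.cpt -cpt 30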
Warnings regarding CPU affinity¶
If you run GROMACS on a node that is simultaneously running other jobs (even other GROMACS jobs), you may see warnings like this in your output:
WARNING: In MPI process #0: Affinity setting failed. This can cause performance degradation! If you think your setting are correct, contact the GROMACS developers.
One way to prevent these warnings, which is also useful for reducing the risk of inefficient CPU usage, is to request entire nodes. On the Mahuika cluster, this can be done using the following lines in your input, altered as appropriate:
Using MPI parallelisation and hyperthreading, but no OpenMP parallelisation.
#SBATCH --nodes 4 # May vary
#SBATCH --ntasks-per-node 72 # Must be 72
# (the number of logical cores per node)
#SBATCH --cpus-per-task 1 # Must be 1
Using MPI parallelisation with neither hyperthreading nor OpenMP parallelisation.
#SBATCH --nodes 4 # May vary
#SBATCH --ntasks-per-node 36 # Must be 36
# (the number of physical cores per node)
#SBATCH --cpus-per-task 1 # Must be 1
#SBATCH --hint=nomultithread # Don't use hyperthreading
Using hybrid parallelisation and hyperthreading:
#SBATCH --nodes 4 # May vary
#SBATCH --ntasks-per-node 1 # Must be 1
#SBATCH --cpus-per-task 72 # Must be 72
# (the number of logical cores per node)
Using hybrid parallelisation but not hyperthreading:
#SBATCH --nodes 4 # May vary
#SBATCH --ntasks-per-node 1 # Must be 1
#SBATCH --cpus-per-task 36 # Must be 36
# (the number of physical cores per node)
#SBATCH --hint=nomultithread # Don't use hyperthreading
If you opt to use hybrid parallelisation, it is also important to run `gmx_mpi mdrun` with the `-ntomp <number>` option, where `<number>` should be the number of CPUs per task. You can make sure the value is correct by using `-ntomp ${SLURM_CPUS_PER_TASK}`. Hybrid parallelisation can be more efficient than MPI-only parallelisation, as within the same node there is no need for inter-task communication.
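Putting these pieces together, a minimal hybrid job script (one MPI task per node, no hyperthreading) might look like the sketch below; the node count, walltime and input names are illustrative and should be adapted to your own simulation.
#!/bin/bash -e
#SBATCH --job-name GROMACS_hybrid     # Illustrative values throughout
#SBATCH --time 01:00:00               # Max walltime
#SBATCH --nodes 2                     # May vary
#SBATCH --ntasks-per-node 1           # Must be 1
#SBATCH --cpus-per-task 36            # Physical cores per node
#SBATCH --hint=nomultithread          # Don't use hyperthreading
module load GROMACS/2020.5-intel-2020a-cuda-11.0.2-hybrid
srun gmx_mpi mdrun -ntomp ${SLURM_CPUS_PER_TASK} -v -deffnm protein-EM-vacuum -cpt 30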
NOTE on using GROMACS on Māui:
On the Māui cluster there is normally no reason to specifically request a whole node, as all jobs are scheduled to run on one or more entire nodes. However, we have seen issues with slow performance and recommend using the `--exclusive` flag when running GROMACS. It may also be advisable to request tasks or CPUs in multiples of 80, since that is the number of vCPUs per node.
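As a sketch only (the split between tasks and threads is a placeholder; keep the product at 80 vCPUs per node, and adjust for hybrid runs as described above):
#SBATCH --nodes 1                     # May vary
#SBATCH --exclusive                   # Avoid sharing the node, as recommended above
#SBATCH --ntasks 20                   # Placeholder: 20 tasks x 4 CPUs per task = 80 vCPUs
#SBATCH --cpus-per-task 4             # Placeholder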
NVIDIA GPU Container¶
NVIDIA has a GPU accelerated version of GROMACS in its NGC container registry (more details about NGC here). We have pulled a version of their container and stored it at this location (you can also pull your own version if you wish): /opt/nesi/containers/nvidia/gromacs-2020_2.sif. We have also provided an example submission script that calls the Singularity image here: /opt/nesi/containers/nvidia/gromacs-example.sl.
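If you would rather write your own submission script than copy the provided example, a minimal sketch might look like the following. The Singularity module name, GPU request syntax and mdrun arguments here are assumptions to adapt for your job; the example script above reflects the settings we actually recommend.
#!/bin/bash -e
#SBATCH --job-name GROMACS_NGC        # Illustrative values throughout
#SBATCH --time 00:30:00               # Max walltime
#SBATCH --gpus-per-node 1             # Assumption: one GPU is sufficient
#SBATCH --cpus-per-task 4             # Assumption
#SBATCH --mem 4GB                     # Assumption
module load Singularity               # Assumption: the module name may differ
# --nv exposes the host GPU driver inside the container
srun singularity exec --nv /opt/nesi/containers/nvidia/gromacs-2020_2.sif \
    gmx mdrun -ntomp ${SLURM_CPUS_PER_TASK} -v -deffnm protein-EM-vacuum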