R

Description

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment, itself developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and so forth) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

Licence

R is made available at no cost under the terms of version 2 of the GNU General Public Licence.

NeSI Customisations

  • We patch the snow package so that there is no need to use RMPISNOW when using it over MPI.
  • Our most recent R environment modules set R_LIBS_USER to a path which includes the compiler toolchain, so for example ~/R/gimkl-2022a/4.2 rather than the usual default of ~/R/x86_64-pc-linux-gnu-library/4.2.
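
To check which personal library path is in effect after loading one of these modules, something like the following can be run in an R session. This is a minimal sketch using only base R functions; the variable names are illustrative.

```r
# Show the personal library path set through R_LIBS_USER (it may contain "~").
user_lib <- Sys.getenv("R_LIBS_USER")
print(user_lib)

# Expand "~" and report whether the directory has been created yet.
user_lib_full <- path.expand(user_lib)
print(dir.exists(user_lib_full))
```

The directory usually does not exist until the first package has been installed into it.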

Available Modules

R Base

module load R/4.2.1-gimkl-2022a

We also have some environment modules which extend the base R ones with extra packages:

R-Geo

Includes rgeos, rgdal and other geometric and geospatial packages based on the libraries GEOS, GDAL, PROJ and UDUNITS.

module load R-Geo/4.2.1-gimkl-2022a

R-bundle-Bioconductor

Includes many of the packages in the Bioconductor suite.

module load R-bundle-Bioconductor/3.15-gimkl-2022a-R-4.2.1

Example R scripts

png(filename="plot.png")  # This line redirects plots from screen to plot.png file.

# Define the cars vector with 5 values
cars <- c(1, 3, 6, 4, 9)

# Graph the cars vector with all defaults
plot(cars)

The following example reads the array task ID, for use in a Slurm array job:

jobid <- as.numeric(Sys.getenv("SLURM_ARRAY_TASK_ID"))
jobid
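
The task ID is typically used to pick a different input or parameter for each array task. The following is a minimal sketch of that pattern, assuming an array of 10 tasks; the parameter grid sample_sizes and the output file name are hypothetical, and tempdir() is used only to keep the sketch self-contained.

```r
# Hypothetical parameter grid: one entry per array task (--array 1-10 assumed).
sample_sizes <- seq(100, 1000, by = 100)

# Read the task index; default to "1" so the script also runs outside Slurm.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
stopifnot(!is.na(task_id), task_id >= 1, task_id <= length(sample_sizes))

# Seed from the task index so each task draws a different, reproducible sample.
set.seed(task_id)
n <- sample_sizes[task_id]
result <- mean(rnorm(n))

# Write one output file per task so that tasks do not overwrite each other.
out_file <- file.path(tempdir(), sprintf("result_%02d.csv", task_id))
write.csv(data.frame(task = task_id, n = n, mean = result),
          out_file, row.names = FALSE)
```

In a real job the output would normally go to the working directory or a project directory rather than tempdir().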

The following example sums 51 vectors of normally distributed random values, with sizes from 1000000 to 1000050. Set the number of workers in your submission script with --cpus-per-task=... Note that all workers run on the same node, so the number of workers is limited to the number of cores on that node (physical cores if using --hint=nomultithread, or logical cores if using --hint=multithread).

library(doParallel)
registerDoParallel(strtoi(Sys.getenv("SLURM_CPUS_PER_TASK")))

# 51 calculations, store the result in 'x'
x <- foreach(z = 1000000:1000050, .combine = 'c') %dopar% {
  sum(rnorm(z))
}

print(x)
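
For comparison, the same kind of single-node calculation can be sketched with the parallel package that ships with R. This is a different technique from the doParallel/foreach approach above, not a replacement for it. The vector sizes are reduced here so that the sketch runs quickly, and the fallback to one worker when SLURM_CPUS_PER_TASK is unset is an assumption made for illustration.

```r
library(parallel)

# Number of workers: the Slurm allocation if available, otherwise 1.
n_workers <- max(
  suppressWarnings(as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))),
  1L,
  na.rm = TRUE
)

# Smaller vector sizes than in the example above, to keep the run short.
sizes <- 1000:1010

# mclapply forks the R process; all workers run on the same node.
x <- unlist(mclapply(sizes, function(z) sum(rnorm(z)), mc.cores = n_workers))

print(length(x))
```

mclapply relies on forking, so it is suitable on Linux clusters but cannot use more than one core on Windows.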

This example is similar to the one above, except that workers can run across multiple nodes. Note that we don't need to specify the number of workers when starting the cluster: it will be derived by the mpiexec command, which Slurm will invoke. You will need to load the gimkl module to expose the MPI library.

library(doMPI, quietly = TRUE)
cl <- startMPIcluster()
registerDoMPI(cl)

# 51 calculations, store the result in 'x'
x <- foreach(z = 1000000:1000050, .combine = 'c') %dopar% {
  sum(rnorm(z))
}

closeCluster(cl)
print(x)
mpi.quit()

The following example uses the snow package, with either MPI-based or socket-based workers:

library(snow)
# If there are multiple tasks only one reaches here, others become workers.

# Select MPI-based or socket-based parallelism depending on ntasks
if(strtoi(Sys.getenv("SLURM_NTASKS")) > 1) {
    cl <- makeMPIcluster()
} else {
    cl <- makeSOCKcluster(max(strtoi(Sys.getenv('SLURM_CPUS_PER_TASK')), 1))
}

# 51 calculations to be done:
x <- clusterApply(cl, 1000000:1000050, function(z) sum(rnorm(z)))

stopCluster(cl)

Example Slurm Scripts

#!/bin/bash -e

#SBATCH --job-name    MySerialRJob
#SBATCH --time        01:00:00
#SBATCH --mem         512MB
#SBATCH --output      MySerialRJob.%j.out # Include the job ID in the names of
#SBATCH --error       MySerialRJob.%j.err # the output and error files

module load R/4.2.1-gimkl-2022a

# Help R to flush errors and show overall job progress by printing
# "executing" and "finished" statements.
echo "Executing R ..."
srun Rscript MySerialRJob.R
echo "R finished."

The following script runs a Slurm array job of 10 tasks:

#!/bin/bash -e

#SBATCH --job-name    MyArrayRJob
#SBATCH --time        01:00:00
#SBATCH --array       1-10
#SBATCH --mem         512MB
#SBATCH --output      MyArrayRJob.%j.out # Include the job ID in the names of
#SBATCH --error       MyArrayRJob.%j.err # the output and error files

module load R/4.2.1-gimkl-2022a

# Help R to flush errors and show overall job progress by printing
# "executing" and "finished" statements.
echo "Executing R ..."
srun Rscript MyArrayRJob.R
echo "R finished."

The following script runs an MPI job with 12 tasks:

#!/bin/bash -e

#SBATCH --job-name      MyMPIRJob
#SBATCH --time          01:00:00
#SBATCH --ntasks        12
#SBATCH --cpus-per-task 1
#SBATCH --mem-per-cpu   512MB
#SBATCH --output        MyMPIRJob.%j.out # Include the job ID in the names of
#SBATCH --error         MyMPIRJob.%j.err # the output and error files

module load R/4.2.1-gimkl-2022a
# need MPI
module load gimkl/2022a

# Help R to flush errors and show overall job progress by printing
# "executing" and "finished" statements.
echo "Executing R ..."
# Our R has a patched copy of the snow library so that there is no need to use
# RMPISNOW.
srun Rscript MyMPIRJob.R
echo "R finished."

Generating images and plots

Normally when plotting or generating other sorts of images, R expects a graphical user interface to be available so it can render and display the image on the fly. However, it is possible to instruct R to export the image directly to a file instead of displaying it on the screen, using code like the following:

png(filename="plot.png")

This statement instructs R to export all future graphical output to a PNG file named plot.png, until a different device driver is selected.
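
The file is only guaranteed to be complete once the device has been closed. The following minimal sketch opens a PNG device, draws a plot and closes the device with dev.off(). The output path and the image dimensions are illustrative, and the sketch assumes that the R build has PNG support.

```r
out_file <- file.path(tempdir(), "plot.png")  # illustrative output path

# Open the PNG device; subsequent plots are written to out_file.
png(filename = out_file, width = 800, height = 600)
plot(c(1, 3, 6, 4, 9), type = "b", main = "Example plot")

# Close the device so that the file is completely written.
invisible(dev.off())

print(file.exists(out_file))
```

In a script run with Rscript, open devices are also closed when R exits, but calling dev.off() explicitly makes the file available to later steps of the same script.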

For more information about graphical device drivers, please see the R documentation.

Dealing with packages

Much R functionality is not supplied with the base installation, but is instead added by means of packages written by the R developers or by third parties.  We include a large number of such R packages in our R environment modules.

Getting a list of installed packages

It is best to view the list of available R packages interactively. To do so, call up the package library:

module load R/4.2.1-gimkl-2022a
R
library()

or just use the module command:

module show R/4.2.1-gimkl-2022a

Please note that different installations of R, even on the same NeSI cluster, may contain different collections of packages. Furthermore, if you have your own packages in a directory that R can automatically detect, these will also be shown in a separate section.

Getting a list of available libraries

You can print a list of the library directories in which R will look for packages by running the following command in an R session:

.libPaths()

For R/4.2.1 the command .libPaths() will return the following:

.libPaths()
[1] "/home/YOUR_USER_NAME/R/gimkl-2022a/4.2"                            
[2] "/opt/nesi/CS400_centos7_bdw/R/4.2.1-gimkl-2022a/lib64/R/library"

When using the library() function, R will first look for the package in your personal library in your home directory and then in the system library provided by NeSI. The output of .libPaths() can be used in conjunction with installed.packages() to see what is available in a specific library, e.g.:

installed.packages("/home/YOUR_USER_NAME/R/gimkl-2022a/4.2")
...
ggplot2 NA NA NA "no" "4.2.1"
ggrepel NA NA NA "yes" "4.2.1"
etc...
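
The same functions can be combined in a script, for instance to count the packages in each library or to test whether a particular package is present in one of them. This is a minimal sketch using base R only; the check for the stats package is just an illustration, chosen because stats ships with every R installation.

```r
# For each library directory, count the installed packages.
for (lib in .libPaths()) {
  pkgs <- rownames(installed.packages(lib.loc = lib))
  cat(lib, ":", length(pkgs), "packages\n")
}

# Check whether a given package is present in the system library.
system_pkgs <- rownames(installed.packages(lib.loc = .Library))
has_stats <- "stats" %in% system_pkgs
print(has_stats)
```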

Specifying custom library directories

You can add your own custom library directories by setting R_LIBS in the .Renviron file in your home directory to a colon-separated list of extra directories. Lines in .Renviron take the form NAME=value, without the shell's export keyword. For example:

R_LIBS=/home/jblo123/R/foo:/home/jblo123/R/bar

Note that, of the contents of the R_LIBS variable, only those directories that actually exist will show up in the output of .libPaths().

Alternatively, you can specify in your R script:

dir.create("/nesi/project/<projectID>/Rpackages", showWarnings = FALSE, recursive = TRUE)
.libPaths(new="/nesi/project/<projectID>/Rpackages")
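
Note that calling .libPaths() with a new value sets the search path to the given directories plus the site and system libraries, so a personal library that was previously on the path may no longer be searched. If you want to keep every path already in use, one option is to prepend the new directory to the current list. The following is a minimal sketch, in which a directory under tempdir() stands in for a project directory:

```r
# Stand-in for a project directory such as the one shown above.
extra_lib <- file.path(tempdir(), "Rpackages")
dir.create(extra_lib, showWarnings = FALSE, recursive = TRUE)

# Prepend the new directory while keeping every path already in use.
.libPaths(c(extra_lib, .libPaths()))

print(.libPaths())
```

Packages installed afterwards with install.packages() go into the first directory of .libPaths() by default.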

Downloading and installing a new package

To install a package into R, use the install.packages command.

For example, to install the sampling package:

module load R/4.2.1-gimkl-2022a
R
install.packages("sampling")

You will most likely be asked if you want to use a personal library and, if you have not previously done so, whether you wish to create a new personal library. Answer "y" to both questions.

Enter the number for one of the Australian sites from the list of download mirrors that will appear, as the lone New Zealand mirror site is more often out of date.

R will then download, compile and install the new package for you.

You can confirm that a package has been installed by loading it with the library() command, shown here for a package named foo:

library("foo")

If the package has been correctly installed, you will get no response. On the other hand, if the package is missing or was not installed correctly, an error message will typically be returned:

library("foo")
Error in library("foo") : there is no package called ‘foo’
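
In a batch script it can be more convenient to test for a package without raising an error. The following minimal sketch uses requireNamespace(), which returns TRUE or FALSE; the package name sampling is taken from the installation example above.

```r
pkg <- "sampling"  # package name to check

if (requireNamespace(pkg, quietly = TRUE)) {
  message("Package '", pkg, "' is available.")
} else {
  message("Package '", pkg, "' is not installed.")
}

# A package that ships with R is always found.
stats_ok <- requireNamespace("stats", quietly = TRUE)
print(stats_ok)
```

Unlike library(), requireNamespace() loads the package without attaching it to the search path, so its functions are then called as pkg::fun().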

Compiling a C library for use with R

You can compile custom C libraries for use with R using the R shared library compiler:

module load R/4.2.1-gimkl-2022a
R CMD SHLIB mylib.c

This will create the shared object mylib.so in the current directory. You can then load the library in your R script, giving the path to wherever you have placed mylib.so:

R
dyn.load("~/R/lib64/mylib.so")

Quitting an interactive R session

At the R command prompt, when you want to quit R, type the following:

quit()

You will be asked "Save workspace image? [y/n/c]". Type n.

Troubleshooting

Missing devtools

Package installation will occasionally fail due to missing system libraries or tools (e.g. HarfBuzz, FriBidi or devtools). This is resolved by loading the devtools module before the version of R you require.

module load devtools
module load R/4.2.1-gimkl-2022a

Can't install sf, rgdal etc

Use the R-Geo module:

module load R-Geo/4.2.1-gimkl-2022a

Cluster/Parallel environment variable not accessed

Depending on the working environment, reading the SLURM_CPUS_PER_TASK environment variable when registering a cluster may not return an integer. In particular, the function strtoi (string to integer) may not work correctly. If this happens, use as.numeric instead.

Options:

  • strtoi(Sys.getenv("SLURM_CPUS_PER_TASK"))
  • as.numeric(Sys.getenv("SLURM_CPUS_PER_TASK"))
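
Either option can be wrapped in a small helper that falls back to a default when the variable is missing or unusable. The following is a minimal sketch; the function name slurm_cpus and the default of one worker are assumptions made for illustration, not part of any package.

```r
# Return the number of CPUs Slurm allocated per task, or `default` when the
# variable is unset, empty, or does not parse to a positive integer.
slurm_cpus <- function(default = 1L) {
  raw <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = "")
  n <- suppressWarnings(as.integer(raw))
  if (is.na(n) || n < 1L) default else n
}

n_workers <- slurm_cpus()
print(n_workers)
```

The result can then be passed to the function that registers the cluster, for example registerDoParallel(n_workers).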

INLA Package

Running functions from the INLA package may result in errors about GLib versions not being found. Installing a specific version of the package and its binary, as below, has worked.

module load R-bundle-Bioconductor/3.17-gimkl-2022a-R-4.3.1 
R 

#https://inla.r-inla-download.org/R/testing/bin/windows/contrib/4.3/ 
remotes::install_version("INLA", version="23.06.29",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/testing"), dep=TRUE) 

INLA::inla.binary.install() #5 options, choose centos