Nektar++ on ARCHER2
The ARCHER2 national supercomputer is a world-class advanced computing resource and the successor to ARCHER. This guide provides basic instructions for compiling the Nektar++ stable release or master branch on the ARCHER2 system.
To log into ARCHER2 you should use the address:
ssh [userID]@login.archer2.ac.uk
Compilation Instructions
ARCHER2 uses a module-based system to load various system modules. For compiling Nektar++ on ARCHER2 we need to choose the GNU compiler suite and load the required modules. Note that git is automatically available on the system.
A brief summary of module commands on ARCHER2 (from the ARCHER2 documentation):
- module list [name] – List modules currently loaded in your environment, optionally filtered by [name]
- module avail [name] – List modules available, optionally filtered by [name]
- module spider [name][/version] – Search available modules (including hidden modules) and provide information on modules
- module load name – Load the module called name into your environment
- module remove name – Remove the module called name from your environment
- module swap old new – Swap module new for module old in your environment
- module help name – Show help information on the module name
- module show name – List what module name actually does to your environment
Basic module commands are briefly explained above. To set up the environment for compiling Nektar++, run the following commands:
export CRAY_ADD_RPATH=yes
module swap PrgEnv-cray PrgEnv-gnu
module load cray-fftw
module load cmake
These commands can be put in a file to avoid typing them in each session. One approach is to put these lines in the .profile file in the home directory; if it is not present, you can create it (note the dot in front of .profile). A better way is to create a bash script, as shown below; let's name it loadMyModules. To create it, type touch loadMyModules in the terminal and press enter. This creates an empty file called loadMyModules. Open the file with your preferred text editor, put the following lines (as well as any other modules you need to load) in it, and save it. Note that the shebang #! must be on the first line.
#!/bin/bash
export CRAY_ADD_RPATH=yes
module swap PrgEnv-cray PrgEnv-gnu
module load cray-fftw
module load cmake
After saving the file, make it executable by running chmod +x loadMyModules. You can then load your modules by sourcing the script: source ./loadMyModules (sourcing it, rather than executing it as ./loadMyModules, ensures the module changes persist in your current shell).
Note that after running it, the system may print several warnings and informational messages about environment variables being unloaded and newly loaded. You can simply ignore these messages; just type q to skip to the end of them.
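Optionally, if you want the modules loaded automatically at every login, one sketch (assuming loadMyModules lives in your home directory) is to source the script from your .profile:
# Append a line to ~/.profile so the modules are set up on each login (sketch)
echo 'source ~/loadMyModules' >> ~/.profile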
To clone the repository, first create a public/private SSH key pair and add it to the Nektar++ GitLab. Instructions on creating an SSH key can be found in the GitLab documentation under Generating a new SSH key pair. If the SSH keys have already been set up, this step can be skipped.
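For reference, a minimal sketch of generating a key pair on ARCHER2 is shown below; the email address is a placeholder, and the public key printed by the second command is what you paste into your GitLab account's SSH key settings.
# Generate an ed25519 key pair (accept the default location, optionally set a passphrase)
ssh-keygen -t ed25519 -C "your.email@example.com"
# Print the public key so it can be copied into GitLab
cat ~/.ssh/id_ed25519.pub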
The code must be compiled and run from the work directory, which is at /work/project_code/project_code/user_name. For example, for the project code e01 and username mlahooti, the work directory is /work/e01/e01/mlahooti. You can also run echo $HOME, which in this example prints /home/e01/e01/mlahooti, and change the /home/ part to /work/ to obtain your work directory.
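As a convenience, this substitution can be done directly in the shell; a minimal sketch, assuming the standard ARCHER2 layout where the work directory mirrors the home directory path:
# Derive the work directory from $HOME by swapping the /home prefix for /work
cd "/work${HOME#/home}"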
Enter the work directory and clone the Nektar++ code into a folder, e.g. nektarpp
cd /work/e01/e01/mlahooti
git clone https://gitlab.nektar.info/nektar/nektar.git nektarpp
After the code is cloned, enter the nektarpp folder, make a build directory and enter it
cd nektarpp
mkdir build
cd build
The above three steps can also be done with a single command: cd nektarpp && mkdir build && cd build
From within the build directory, run the configure command. Note the use of CC and CXX to select the ARCHER2-specific compiler wrappers.
CC=cc CXX=CC cmake -DNEKTAR_USE_MPI=ON -DNEKTAR_USE_HDF5=ON -DNEKTAR_USE_FFTW=ON -DTHIRDPARTY_BUILD_BOOST=ON ..
- cc and CC are the Cray wrapper compilers for C and C++, and are determined by the PrgEnv module.
- HDF5 is a better output option on ARCHER2, since the file-count limit of the disk quota is easily exceeded otherwise. With the command above, the code will build the third-party HDF5 library shipped with the code. If you wish to use the HDF5 module available on ARCHER2 instead, load it with module load cray-hdf5-parallel.
- It is possible to use AVX2 instructions on ARCHER2, which might give a better speed-up for the compressible flow solver. To activate this, include -DCMAKE_CXX_FLAGS="-mavx2 -mfma" in the command line above; a configure command combining both options is sketched below.
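For instance, a configure command combining the system HDF5 module and the AVX2 flags might look like the following. This is only a sketch: whether CMake picks up the module's HDF5 or still builds the third-party copy is worth confirming in ccmake afterwards.
# Optional: use the system parallel HDF5 module and enable AVX2/FMA (sketch)
module load cray-hdf5-parallel
CC=cc CXX=CC cmake -DNEKTAR_USE_MPI=ON -DNEKTAR_USE_HDF5=ON -DNEKTAR_USE_FFTW=ON \
    -DTHIRDPARTY_BUILD_BOOST=ON -DCMAKE_CXX_FLAGS="-mavx2 -mfma" ..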
At this point you can run ccmake .. to, for example, disable unnecessary solvers. Now run make as usual to compile and install the code
make -j 4 install
NOTE: Do not try to run regression tests – the binaries at this point are cross-compiled for the compute nodes and should not execute on the login nodes.
Building with a newer compiler than the default
Using the above instructions, Nektar++ is built with the default gcc/11.2.0. It is possible to build Nektar++ with a newer or older version of GCC. To find the available versions, run module -r spider '.*gcc.*'; for a specific version, for example gcc/12.2.0, run module spider gcc/12.2.0. Follow the specific instructions printed on the terminal to load the version you need. Generally, you need to do the following:
module swap PrgEnv-cray PrgEnv-gnu
module load <any other required modules here>
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
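As a hypothetical example for gcc/12.2.0, the sequence might look like the sketch below; the exact module commands may differ, so follow what module spider prints on your system.
# Sketch for switching to gcc/12.2.0 (assumes the version is available)
module swap PrgEnv-cray PrgEnv-gnu
module swap gcc gcc/12.2.0   # or 'module load gcc/12.2.0', as advised by module spider
module load cray-fftw cmake
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH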
Do not forget to add the GCC library path to LD_LIBRARY_PATH in the job script. Generally, you can find information about your compiler by running the following command
module show gcc
or, for a specific version, module show gcc/11.2.0. This will print several lines to the screen, including the following information, shown here for the default gcc (gcc/11.2.0):
whatis("Defines the system paths and environment variables needed for the GNU Compiling Environment.")
prepend_path("MODULEPATH","/opt/cray/pe/lmod/modulefiles/compiler/gnu/8.0")
prepend_path("MODULEPATH","/opt/cray/pe/lmod/modulefiles/mix_compilers")
prepend_path("PATH","/opt/cray/pe/gcc/11.2.0/bin")
prepend_path("MANPATH","/opt/cray/pe/gcc/11.2.0/snos/share/man")
prepend_path("INFOPATH","/opt/cray/pe/gcc/11.2.0/snos/share/info")
prepend_path("LD_LIBRARY_PATH","/opt/cray/pe/gcc/11.2.0/snos/lib64")
setenv("GCC_PATH","/opt/cray/pe/gcc/11.2.0")
setenv("GCC_PREFIX","/opt/cray/pe/gcc/11.2.0")
setenv("GCC_VERSION","11.2.0")
setenv("GNU_VERSION","11.2.0")
setenv("CRAY_LMOD_COMPILER","gnu/8.0")
prepend_path("MODULEPATH","/opt/cray/pe/lmod/modulefiles/comnet/gnu/8.0/ofi/1.0")
prepend_path("LMOD_CUSTOM_PATHS","COMPILER/work/y07/shared/archer2-lmod/utils/compiler/gnu/8.0")
prepend_path("MODULEPATH","/work/y07/shared/archer2-lmod/utils/compiler/gnu/8.0")
From the above output, the LD_LIBRARY_PATH entry gives the information we need to add our version of GCC to the library path; hence, we need to add the following line to our job script:
LD_LIBRARY_PATH=/opt/cray/pe/gcc/11.2.0/snos/lib64:$LD_LIBRARY_PATH
Running jobs on ARCHER2
ARCHER2 uses Slurm for job submission, which is different from the PBS system used on the Imperial College CX1 and CX2 clusters. Nektar++ must be built in the work directory, and jobs must also be submitted from the work directory.
ARCHER2 supports several Quality of Service (QoS) levels, which determine the type of job that can be run: standard, short, long, highmem, taskfarm, largescale, lowpriority and serial. Except for the highmem and serial QoSs, these all run on the standard partition; the lowpriority QoS can run on both the highmem and standard partitions, depending on the job. A detailed description of these QoSs, including the maximum number of jobs allowed in the queue, the maximum number of nodes and the wall time for each, can be found in the ARCHER2 documentation on running jobs. A summary of the most common QoSs is provided below:
- Standard: the standard QoS allows a maximum of 1024 nodes, where each node supports 128 tasks (processes). The maximum wall time for this category is 24 hours. 64 jobs of this type can be queued and 16 are allowed to run simultaneously. This is the most commonly used QoS.
- Short: the short QoS allows a maximum of 32 nodes with a maximum wall time of 20 minutes. 16 jobs can be queued and a maximum of 4 can run. Jobs with the short QoS can only be submitted Monday to Friday.
- Long: the long QoS allows a maximum of 64 nodes with a maximum wall time of 48 hours and a minimum wall time greater than 24 hours. 16 jobs of this type can be queued and 16 can run simultaneously.
A Slurm job script must contain the number of nodes, number of tasks per node, number of CPUs per task, wall time, budget ID, partition, quality of service (QoS), number of OpenMP threads, job environment and execution command. It can also optionally have a user-supplied job name for easier identification of the job.
The job script can be produced using the bolt module as follows; note that [arguments...] should be replaced with the program executable and its arguments. For more help you can run bolt -h in the terminal.
module load bolt
bolt -n [parallel tasks] -N [parallel tasks per node] -d [number of threads per task] -t [wallclock time (h:m:s)] -o [script name] -j [job name] -A [project code] [arguments...]
As an example, suppose Nektar++ is installed in /work/e01/e01/mlahooti/nektarpp and the simulation is a 3D homogeneous 1D (2.5D) simulation with HomModesZ=8. We want to run the simulation on 256 processors, i.e. 2 nodes with 128 processes each, for 14 hours and 20 minutes with HDF5 output format. We also want to assign a name to the job, e.g. firstTest, and suppose that we are charging the budget of project project_id.
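For reference, a bolt invocation matching this example might look like the sketch below; bolt writes the generated script to myjob.slurm, which you should still review and edit before submitting.
# Sketch: generate a job script for the example above with bolt
module load bolt
bolt -n 256 -N 128 -d 1 -t 14:20:0 -o myjob.slurm -j firstTest -A project_id \
    /work/e01/e01/mlahooti/nektarpp/build/dist/bin/IncNavierStokesSolver naca0012.xml session.xml --npz 4 -i Hdf5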
Here is an example of a Slurm script for a standard job
#!/bin/bash
# Slurm job options (job-name, compute nodes, job time)
#SBATCH --job-name=firstTest
#SBATCH --time=14:20:0
#SBATCH --nodes=2
#SBATCH --tasks-per-node=128
#SBATCH --cpus-per-task=1
# Replace [budget code] below with your budget code (e.g. t01)
#SBATCH --account=project_id
#SBATCH --partition=standard
#SBATCH --qos=standard
#SBATCH --distribution=block:block
#SBATCH --hint=nomultithread
# Setup the job environment (this module needs to be loaded before any other modules)
module load epcc-job-env
# Set the number of threads to 1
# This prevents any threaded system libraries from automatically
# using threading.
export OMP_NUM_THREADS=1
export NEK_DIR=/work/e01/e01/mlahooti/nektarpp/build
export NEK_BUILD=$NEK_DIR/dist/bin
export LD_LIBRARY_PATH=/opt/cray/pe/gcc/11.2.0/snos/lib64:$NEK_DIR/ThirdParty/dist/lib:$NEK_DIR/dist/lib64:$LD_LIBRARY_PATH
# Launch the parallel job
srun $NEK_BUILD/IncNavierStokesSolver naca0012.xml session.xml --npz 4 -i Hdf5 &> runlog
Further, for convenience, the script contains two export commands which define the NEK_DIR and NEK_BUILD environment variables: the former is the path to the Nektar++ build directory and the latter to the solver executable location. Additionally, the third export adds the library locations to the library search path, where each path is separated from the others by a colon (:).
To submit the job, assuming the above script is saved in a file named myjob.slurm, run the following command: sbatch myjob.slurm
The job status can be monitored using squeue -u $USER. Running this command prints the following information on the screen, where ST is the status of the job; here PD means the job is waiting for resource allocation. Other common statuses are R, F, CG, CD and CA, which mean running, failed, in the process of completing, completed and cancelled, respectively.
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
121062 standard myJob-1 mlahooti PD 0:00 4 (Priority)
121064 standard myJob-2 mlahooti PD 0:00 4 (Priority)
A job can be cancelled using the scancel job-ID command, where job-ID is the ID of the job; for example, the job ID for the first job above is 121062.
Further, detailed information about a particular job, including an estimate of its start time, can be obtained via scontrol show job -dd job-ID
NOTE: It is highly recommended to check that the job script is error-free before submitting it to the system. The checkScript command checks the integrity of the job script, shows any errors and estimates the budget the job will consume. Run the following command in the directory from which you want to submit the job:
checkScript myjob.slurm