Working at IACS
Ookami
Ookami's nodes each have 48 compute cores grouped into 4 pools of 12 (each pool also carries a 13th core reserved for the OS). An ideal configuration is therefore 4 MPI tasks per node, each running 12 OpenMP threads.
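As a hedged sketch of that layout (the executable name here is just a placeholder), a single node would be driven as:
# 4 MPI ranks per node, each with 12 OpenMP threads (one rank per 12-core pool)
export OMP_NUM_THREADS=12
mpiexec -n 4 ./my_app.ex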
Log in to login.ookami.stonybrook.edu
AMReX setup
We need to tell AMReX about the machine. Put the following Make.local file in amrex/Tools/GNUmake:
https://raw.githubusercontent.com/AMReX-Astro/workflow/main/job_scripts/iacs/Make.local
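For reference, a Make.local of this kind usually just appends architecture-specific compiler flags; the snippet below is only an illustrative sketch (the -mcpu=a64fx flags are assumptions), and the file at the URL above is the one to actually use:
# illustrative sketch only -- use the Make.local linked above
CXXFLAGS += -mcpu=a64fx
CFLAGS   += -mcpu=a64fx
FFLAGS   += -mcpu=a64fx
F90FLAGS += -mcpu=a64fx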
Cray compilers
You can only access the Cray environment on a compute node:
srun -p short -N 1 -n 48 --pty bash
Note
The interactive slurm job times out after 1 hour. You can run with no time limit on the fj-debug1 and fj-debug2 nodes (you can ssh to them).
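For example, from the login node:
ssh fj-debug1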
There are 2 sets of Cray compilers, cce and cce-sve. The former are the newer LLVM-based compilers, but the Fortran compiler does not seem to support the ARM architecture. The latter are the older compilers. Even though both have version numbers of the form 10.0.X, they have different options.
(see https://www.stonybrook.edu/commcms/ookami/faq/getting-started-guide.php)
Set up the environment:
module load CPE
#module load cray-mvapich2_nogpu/2.3.4
This should load the older cce-sve compilers (10.0.1).
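To double-check which compiler set ended up in the environment (a hedged suggestion: the classic compilers report their version with -V, while the LLVM-based ones use --version):
module list
ftn -V    # classic cce-sve compilers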
The latest AMReX has an if test in the cray.mak
file that recognizes
the older Cray compiler on this ARM architecture and switches to using
the old set of compiler flags, so it should work.
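A simplified, hypothetical sketch of that kind of test (not the actual cray.mak logic) looks like:
# if the compiler driver does not identify as clang, assume the classic (cce-sve) compilers
ifeq ($(findstring clang,$(shell CC --version 2> /dev/null)),clang)
  CXXFLAGS += -std=c++14    # newer LLVM-based CCE flags
else
  CXXFLAGS += -h std=c++14  # old-style classic CCE flags
endif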
You can then build via:
make COMP=cray -j 24 USE_MPI=FALSE
Note
Compiling takes a long time. At the moment, linking with MPI fails with a cannot find nopattern error (which is why the MPI module is commented out above).
GCC
GCC 10.2
This needs to be done on the compute nodes.
Load modules as:
module load slurm
module load /lustre/projects/global/software/a64fx/modulefiles/gcc/10.2.1-git
module load /lustre/projects/global/software/a64fx/modulefiles/mvapich2/2.3.4
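A quick sanity check (hedged) that the right toolchain is now on the PATH:
gcc --version    # should report 10.2.1
mpicc -show      # the mvapich2 wrapper should point at that gcc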
Build as:
make -j 24 USE_MPI=TRUE USE_OMP=TRUE
Note that this version of GCC knows about the A64FX chip, and that Make.local adds the architecture-specific compilation flags.
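As a hedged way to confirm that this GCC really targets the A64FX (the -mcpu=a64fx flag and the SVE macro check are assumptions, not necessarily what that Make.local uses):
# succeeds and prints the SVE feature macro if A64FX support is present
gcc -mcpu=a64fx -dM -E -x c /dev/null | grep __ARM_FEATURE_SVE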
To run on an interactive node with 1 MPI task and 12 OpenMP threads, do:
export MV2_ENABLE_AFFINITY=0
export OMP_NUM_THREADS=12
mpiexec -n 1 ./Castro3d.gnu.MPI.OMP.ex inputs.3d.sph amr.max_level=2 max_step=5
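For a non-interactive run using the full node (4 MPI x 12 OpenMP, as suggested at the top of this section), a batch script along these lines should work; the job name and time limit are placeholders:
#!/bin/bash
#SBATCH -J castro_a64fx    # placeholder job name
#SBATCH -p short
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -c 12
#SBATCH -t 01:00:00        # assumed time limit

export MV2_ENABLE_AFFINITY=0
export OMP_NUM_THREADS=12

mpiexec -n 4 ./Castro3d.gnu.MPI.OMP.ex inputs.3d.sph amr.max_level=2 max_step=5
Submit it with sbatch.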