.. highlight:: bash

***************
Working at IACS
***************

Ookami
======

Ookami appears to have 48 compute cores per node, grouped into 4 pools of 12
cores (each pool also has a 13th core reserved for OS tasks). An ideal
configuration would therefore run 4 MPI ranks, each with 12 threads.

Log in to ``login.ookami.stonybrook.edu``.

AMReX setup
-----------

We need to tell AMReX about the machine. Put the following ``Make.local``
file in ``amrex/Tools/GNUmake``:

https://raw.githubusercontent.com/AMReX-Astro/workflow/main/job_scripts/iacs/Make.local

Cray compilers
--------------

You can only access the Cray environment on a compute node:

::

   srun -p short -N 1 -n 48 --pty bash

.. note::

   The interactive slurm job times out after 1 hour. You can run without a
   time limit on the ``fj-debug1`` and ``fj-debug2`` nodes (you can ssh to
   them).

There are 2 sets of Cray compilers, ``cce`` and ``cce-sve``. The former are
the newer LLVM-based compilers, but the Fortran compiler does not seem to
support the ARM architecture. The latter are the older compilers. Even
though both have version numbers of the form ``10.0.X``, they take different
options.

(see https://www.stonybrook.edu/commcms/ookami/faq/getting-started-guide.php)

Set up the environment:

::

   module load CPE
   #module load cray-mvapich2_nogpu/2.3.4

This should load the older ``cce-sve`` compilers (``10.0.1``).

The latest AMReX has an ``if`` test in the ``cray.mak`` file that recognizes
the older Cray compiler on this ARM architecture and switches to the old set
of compiler flags, so it should work.

You can then build via:

::

   make COMP=cray -j 24 USE_MPI=FALSE

.. note::

   Compiling takes a long time. At the moment, linking with MPI fails with a
   ``cannot find nopattern`` error (which is why that module is commented
   out above).

GCC
---

GCC 10.2
^^^^^^^^

This needs to be done on the compute nodes.
Load modules as:

::

   module load slurm
   module load /lustre/projects/global/software/a64fx/modulefiles/gcc/10.2.1-git
   module load /lustre/projects/global/software/a64fx/modulefiles/mvapich2/2.3.4

Build as:

::

   make -j 24 USE_MPI=TRUE USE_OMP=TRUE

Note that this version of GCC knows about the A64FX chip, and that
``Make.local`` adds the architecture-specific compilation flags.

To run on an interactive node with 1 MPI rank and 12 OpenMP threads, do::

   export MV2_ENABLE_AFFINITY=0
   export OMP_NUM_THREADS=12
   mpiexec -n 1 ./Castro3d.gnu.MPI.OMP.ex inputs.3d.sph amr.max_level=2 max_step=5
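
The layout of 4 MPI ranks with 12 threads each, described at the top of this page as the ideal configuration, could be set up along the following lines. This is a minimal, untested sketch: it assumes the same executable and inputs file as the single-rank example, and the variable names (``cores_per_node``, ``cores_per_pool``, ``nranks``, ``launch_cmd``) are introduced here purely for illustration. It prints the launch command instead of running it, so the rank and thread counts can be checked first.

```shell
# Core layout as described on this page (assumed): 48 compute cores in pools of 12.
cores_per_node=48
cores_per_pool=12

# One MPI rank per pool, one OpenMP thread per core in that pool.
nranks=$(( cores_per_node / cores_per_pool ))

# Same environment settings as the single-rank example.
export MV2_ENABLE_AFFINITY=0
export OMP_NUM_THREADS=${cores_per_pool}

# Same executable and inputs as the single-rank example; only -n changes.
launch_cmd="mpiexec -n ${nranks} ./Castro3d.gnu.MPI.OMP.ex inputs.3d.sph amr.max_level=2 max_step=5"

# Print the command for inspection; to actually launch, run: ${launch_cmd}
echo "${launch_cmd}"
```

Whether extra pinning is needed to keep each rank's threads inside a single pool is not covered here and would need to be checked on the machine.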