.. highlight:: bash

Managing Jobs at NERSC
======================

Perlmutter
----------

GPU jobs
^^^^^^^^

Perlmutter has 1536 GPU nodes, each with 4 NVIDIA A100 GPUs, so it is
best to use 4 MPI tasks per node.

.. important::

   You need to load the same modules used to compile the executable in
   your submission script; otherwise, the job will fail at runtime
   because it can't find the CUDA libraries.

Below is an example that runs on 16 nodes with 4 GPUs per node.  It
also does the following:

* Includes logic for automatically restarting from the last checkpoint
  file (useful for job-chaining).  This is done via the
  ``find_chk_file`` function.

* Installs a signal handler to create a ``dump_and_stop`` file shortly
  before the queue window ends.  This ensures that we get a checkpoint
  at the very end of the queue window.

* Can post to Slack using the
  :download:`slack_job_start.py <../../job_scripts/perlmutter/slack_job_start.py>`
  script; this requires a webhook to be installed (in a file
  ``~/.slack.webhook``).

.. literalinclude:: ../../job_scripts/perlmutter/perlmutter.submit
   :language: sh

.. note::

   With large reaction networks, you may get GPU out-of-memory errors
   during the first burner call.  If this happens, you can add::

      amrex.the_arena_init_size=0

   after ``${restartString}`` in the ``srun`` call so AMReX doesn't
   reserve 3/4 of the GPU memory for the device arena.

.. note::

   If the job times out before writing a checkpoint (leaving a
   ``dump_and_stop`` file behind), you can give it more time between
   the warning signal and the end of the allocation by adjusting the
   ``#SBATCH --signal=B:URG@<n>`` line at the top of the script.

   Also, by default, AMReX will output a plotfile at the same time as a
   checkpoint file, so you'll get one from the ``dump_and_stop`` that
   may not fall at your regular ``amr.plot_per`` intervals.  To
   suppress this, set::

      amr.write_plotfile_with_checkpoint = 0

CPU jobs
^^^^^^^^

Below is an example that runs on CPU-only nodes.  Here
``ntasks-per-node`` is the number of MPI processes per node (used for
distributed parallelism) and ``cpus-per-task`` is the number of
hyperthreads used per task (used for shared-memory parallelism).  Since
a Perlmutter CPU node has 2 sockets * 64 cores/socket * 2 threads/core
= 256 threads, set ``cpus-per-task`` to ``256/(ntasks-per-node)``.
However, it is best to assign one OpenMP thread per physical core, so
set ``OMP_NUM_THREADS`` to ``cpus-per-task/2``.  See the more detailed
instructions within the script.

.. literalinclude:: ../../job_scripts/perlmutter-cpu/perlmutter_cpu.slurm
   :language: sh

Submitting and checking status
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

   Jobs should be run in your ``$SCRATCH`` directory and not your home
   directory.

By default, SLURM will change directory into the submission directory.

Jobs are submitted as:

.. prompt:: bash

   sbatch script.slurm

You can check the status of your jobs via:

.. prompt:: bash

   squeue --me

and an estimate of the start time can be found via:

.. prompt:: bash

   squeue --me --start

To cancel a job, use ``scancel``.

Filesystems
^^^^^^^^^^^

In addition to ``$SCRATCH``, there is a project-wide directory on the
common filesystem, CFS.  This allows us to share files with everyone in
the project.  For instance, for project ``m3018``, we would do:

.. prompt:: bash

   cd $CFS/m3018

There is a 20 TB quota here, which can be checked via:

.. prompt:: bash

   showquota m3018
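For example, to make a set of results available to everyone in the
project, you might copy them from ``$SCRATCH`` into the CFS space and
make sure they are group-readable.  This is just a sketch; the
``my_run`` and ``plt00100`` names are placeholders for your own run
directory and plotfile::

   # copy a plotfile from scratch into the shared project space
   mkdir -p $CFS/m3018/shared_plotfiles
   cp -r $SCRATCH/my_run/plt00100 $CFS/m3018/shared_plotfiles/

   # make sure the rest of the project group can read it
   chmod -R g+rX $CFS/m3018/shared_plotfiles/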
Chaining
^^^^^^^^

To chain jobs, such that one queues up after the previous job finishes,
use the `chainslurm.sh <https://github.com/AMReX-Astro/workflow/blob/main/job_scripts/slurm/chainslurm.sh>`_
script in that same directory:

.. prompt:: bash

   chainslurm.sh jobid number script

where ``jobid`` is the existing job you want to start your chain from,
``number`` is the number of new jobs to chain from this starting job,
and ``script`` is the job submission script to use (most likely the
same one you used originally).

.. note::

   The script can also create the initial job to start the chain.  If
   ``jobid`` is set to ``-1``, then the script will first submit a job
   with no dependencies and then chain the remaining ``number``-1 jobs,
   each depending on the previous one.

You can view a job's dependencies using:

.. prompt:: bash

   squeue -l -j job-id

where ``job-id`` is the number of the job.
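Under the hood, chaining relies on SLURM job dependencies.  If you want
to set up a short chain by hand, a minimal sketch using the same
``script.slurm`` from above looks like::

   # submit the first job and capture its job id
   jobid=$(sbatch --parsable script.slurm)

   # queue two more jobs, each waiting on the previous one
   for i in 1 2; do
       jobid=$(sbatch --parsable --dependency=afterany:${jobid} script.slurm)
   done

Here ``afterany`` means each job starts once the previous one ends,
regardless of its exit status, which is what you want when every job
simply restarts from the most recent checkpoint file.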