Archiving Data to HPSS ====================== The `NERSC HPSS Archive `_ is a large tape library that can store the simulations files for long periods of time. It is recommended to move your data to HPSS frequently, since the scratch filesystems fill up and NERSC will purge data periodically. The script ``nersc.xfer.slurm`` in ``job_scripts/perlmutter/``: :download:`nersc.xfer.slurm <../../job_scripts/perlmutter/nersc.xfer.slurm>` can be used to archive data to HPSS automatically. This is submitted to the xfer queue and runs the script ``process.xrb`` in ``job_scripts/hpss/``: :download:`process.xrb <../../job_scripts/hpss/process.xrb>` which continually looks for output and stores it to HPSS. By default, the destination directory on HPSS will be have the same name as the directory your plotfiles are located in. This can be changed by editing the``$HPSS_DIR`` variable at the top of ``process.xrb``. The following describes how to use the scripts: #. Copy the ``process.xrb`` script and the slurm script ``nersc.xfer.slurm`` into the directory with the plotfiles. #. Submit the archive job: .. prompt:: bash sbatch nersc.xfer.slurm The script ``process.xrb`` is called from the xfer job and will run in the background and continually wait until checkpoint or plotfiles are created. .. note:: ``process.xrb`` always leaves the most recent plotfile and checkpoint file alone, since data may still be written to it. The script will use ``htar`` to archive the plotfiles and checkpoints to HPSS. If the ``htar`` command was successful, then the plotfiles are copied into a ``plotfile/`` subdirectory. This is actually important, since you don’t want to try archiving the data a second time and overwriting the stored copy, especially if a purge took place. The same is done with checkpoint files. Some additional notes: * If the ``ftime`` executable is in your path (``ftime.cpp`` lives in ``amrex/Tools/Plotfile/``), then the script will create a file called ``ftime.out`` that lists the name of the plotfile and the corresponding simulation time. * Right when the job is run, the script will tar up all of the diagnostic files, ``ftime.out``, submission script, and inputs and archive them on HPSS. The ``.tar`` file is given a name that contains the date-string to allow multiple archives to co-exist. * When ``process.xrb`` is running, it creates a lockfile (called ``process.jobid``) that ensures that only one instance of the script is running at any one time. .. warning:: Sometimes if the job is not terminated normally, the ``process.jobid`` file will be left behind. Later jobs should be able to detect this and clean up the stale lockfile, but if this doesn't work, you can delete the file if you know the script is not running. Jobs in the xfer queue start up quickly. The best approach is to start one as you start your main job (or make it dependent on the main job). The sample ``process.xrb`` script will wait for output and then archive it as it is produced.