Slurm script on HPC

If you have access to an HPC (High Performance Computing) system (a.k.a. a cluster or supercomputer), you will most likely need to submit a job script through a scheduler such as Slurm.

PtyRAD can be easily launched on those machines using the previously mentioned CLI commands.

Reconstruction mode on 1 GPU

Example Slurm script:

#!/bin/bash
#SBATCH --job-name=ptyrad
#SBATCH --mail-user=cl2696@cornell.edu       # Where to send mail
#SBATCH --nodes=1                            # number of nodes requested
#SBATCH --ntasks=1                           # number of tasks to run in parallel
#SBATCH --cpus-per-task=4                    # number of CPUs per task, e.g. 4 for a 10 GB, 8 for a 20 GB, or 32 for an 80 GB A100 allocation
#SBATCH --gres=gpu:2g.20gb:1                 # request one GPU, e.g. gpu:a100:1 (full A100) or gpu:2g.20gb:1 (20 GB slice)
#SBATCH --time=168:00:00                      # Time limit hrs:min:sec
#SBATCH --output=log_job_%j_ptyrad_PSO.txt  # Standard output and error log

## Assuming you are under root `ptyrad/` working directory and calling `sbatch scripts/slurm_run_ptyrad.sub`
pwd; hostname; date

module load cuda/11.8

source activate ptyrad

## Set the params_path variable
## Make sure the path specified inside params.yml is reachable from your root (i.e., `ptyrad/`)
PARAMS_PATH="params/examples/PSO.yml"
echo "params_path = ${PARAMS_PATH}"

## The gpuid selects the device for PtyRAD; it can be 'acc', 'cpu', or an integer.
## The jobid is a unique identifier for hypertune mode with multiple GPU workers on different nodes.
## JOBID is an environment variable that is automatically set to 1-N by LoopSubmit.sh. If it is not set, it defaults to 0.

## Execute the ptyrad CLI command `ptyrad run`
ptyrad run "${PARAMS_PATH}" --gpuid 0 --jobid "${JOBID:-0}" 2>&1 # This runs via ptyrad CLI command on 1 GPU. 

date

(The reconstruction mode is configured solely in the params file by setting if_hypertune: false.)

💡 This is the same example as ptyrad/scripts/slurm_run_ptyrad.sub.
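
The `--jobid "${JOBID:-0}"` argument in the script above relies on standard shell parameter expansion: it expands to the value of `JOBID` when that variable is set and non-empty, and to `0` otherwise. The minimal sketch below illustrates only that fallback. The helper name `resolve_jobid` is hypothetical and is not part of PtyRAD or `LoopSubmit.sh`.

```shell
#!/bin/bash
# Hypothetical helper (not part of PtyRAD): prints the value that the
# Slurm script above would pass to `--jobid`.
resolve_jobid() {
    # ${JOBID:-0} expands to $JOBID if it is set and non-empty, else to 0.
    echo "${JOBID:-0}"
}

# Case 1: JOBID is not set, so the fallback value is used.
unset JOBID
echo "JOBID unset -> jobid $(resolve_jobid)"

# Case 2: JOBID is set (e.g. by a submission loop), so its value is used.
JOBID=3
echo "JOBID=3     -> jobid $(resolve_jobid)"
```

When submitting a single job by hand, one way to supply the variable is Slurm's `--export` option, for example `sbatch --export=ALL,JOBID=1 scripts/slurm_run_ptyrad.sub`. Whether `LoopSubmit.sh` uses this mechanism is an assumption here, so check that script before relying on it.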