This guide provides instructions on accessing GPU resources, submitting jobs via Slurm, and managing your workflows.
To use the GPU servers, you must have a CS account and approval for HPC use.
The GPU cluster can be accessed in two ways:
Use SSH to connect directly to the GPU login node:
ssh <username>@gpu.cs.nmt.edu
Both access methods mount the same home directory as the login server.
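If you connect often, an SSH client config entry saves typing. A minimal sketch (the `gpu` host alias is an arbitrary choice; `<username>` is your CS username):

```
# ~/.ssh/config
Host gpu
    HostName gpu.cs.nmt.edu
    User <username>
```

With this in place, `ssh gpu` is equivalent to the full command above.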
All GPU jobs should be submitted via Slurm. Below are examples and best practices.
Job Script Example:
#!/bin/bash
#SBATCH --job-name=my_gpu_job
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --partition=gpu
#SBATCH --output=output_%j.log
source ~/venv/bin/activate
python my_script.py
Submitting a Job:
sbatch my_job_script.sh
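On success, `sbatch` prints a single line of the form `Submitted batch job <id>`. In scripts it is handy to capture that ID for later monitoring or cancellation. A small sketch — the sample output line is emulated with `echo` here so it runs without a live cluster; on the cluster you would capture real `sbatch` output instead:

```shell
# Emulated sbatch output; on the cluster: out=$(sbatch my_job_script.sh)
out=$(echo "Submitted batch job 12345")
jobid=${out##* }          # keep the last whitespace-separated field (the job ID)
echo "$jobid"             # prints 12345
```

The captured `$jobid` can then be passed to `squeue -j` or `scancel`.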
Monitoring Jobs:
squeue -u <username>
scancel <job_id>
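`squeue` output is easy to script with its `-h` (suppress header) and `-o` (output format) flags. A sketch that picks out the IDs of pending jobs — the sample `squeue -u $USER -h -o "%i %t"` output is emulated with `printf` so the filtering is reproducible without a live cluster:

```shell
# Emulated `squeue -u $USER -h -o "%i %t"` output: job ID and state code
# (R = running, PD = pending).
pending=$(printf '101 R\n102 PD\n103 PD\n' | awk '$2 == "PD" {print $1}')
echo "$pending"
# On the cluster, the IDs could be fed to scancel:
#   echo "$pending" | xargs -r scancel
```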
Interactive GPU Sessions:
If you need an interactive session for debugging or development:
srun --partition=gpu --gres=gpu:1 --time=01:00:00 --pty zsh
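When a GPU is granted via `--gres`, Slurm exports the device indices in `CUDA_VISIBLE_DEVICES`, which is a quick way to confirm your allocation from inside the session. A sketch (the variable is given a fallback value here so the snippet runs outside a session; in a real session Slurm sets it for you):

```shell
# Inside a real srun session, Slurm sets CUDA_VISIBLE_DEVICES itself;
# the fallback of 0 is only so this snippet runs anywhere.
CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}
echo "Allocated GPU(s): $CUDA_VISIBLE_DEVICES"
```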
Available Nodes
| name | resources | memory | partition |
|---|---|---|---|
| GPU-A | 2x A100 | a number | partition-a |
| GPU-L | 4x L40S | a number | partition-l |
Further reading on Slurm commands: https://curc.readthedocs.io/en/latest/running-jobs/slurm-commands.html