Access is restricted to researchers with prior approval. When requesting access, please email cshelp@nmt.edu with:
You will need a department account to log in.
Your needs may be met by alternative services. Please explore these alternatives before submitting a request:
After obtaining approval, you may begin using the GPU server. User storage is handled by the same file server that is connected to the department login server and computer lab.
We are currently experiencing technical difficulties that have caused an outage of the web interface. You can still access resources via SSH.
An Open OnDemand instance is available at gpu.cs.nmt.edu. Log in using your department credentials.
Available services include:
You may also connect directly to the login node through an SSH session:
ssh <username>@gpu.cs.nmt.edu
Resource management is done through Slurm.
There are three core commands for requesting compute resources: sbatch, srun, and salloc.
sbatch submits a shell script for non-interactive, asynchronous execution.
Each sbatch script is made up of a preamble:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --nodes=1
#SBATCH --partition=gpu-a
#SBATCH --output=output.log
And the body:
# Activate the virtual environment (assumes a standard venv in .venv)
source .venv/bin/activate
python3 <whatever>
The preamble tells the scheduler how to queue the job and which resources to allocate.
The body contains the actions you want the scheduler to run.
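As a rough sketch of how the preamble and body fit together, the snippet below writes a complete job script and submits it. The file name `train_job.sh`, the program `train.py`, and the `.venv` location are placeholders chosen for illustration, not site conventions. The `--qos=train` line assumes the `train` QoS is the one you want for your job.

```shell
#!/bin/bash
# Write a complete job script (preamble + body) to train_job.sh.
# train_job.sh and train.py are hypothetical names used for illustration.
cat > train_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --nodes=1
#SBATCH --partition=gpu-a
#SBATCH --qos=train
#SBATCH --output=output.log

# Body: activate a virtual environment (assumed to live in .venv), then run.
source .venv/bin/activate
python3 train.py
EOF

# Check the script for syntax errors without executing it.
bash -n train_job.sh

# Submit only when Slurm is installed, i.e. on the GPU server itself.
if command -v sbatch >/dev/null 2>&1; then
    sbatch train_job.sh
else
    echo "sbatch not found; run this on gpu.cs.nmt.edu"
fi
```

Once submitted, the job runs without further input and its output goes to `output.log` in the directory you submitted from.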
Use sbatch for:
srun runs interactive programs as well as parallel tasks. srun is commonly used for
While the previous two commands run jobs, salloc reserves compute resources and drops you into an interactive shell session. When you run srun inside that session, the job uses the resources in the allocation. Submitting a job with sbatch creates its own resource request, even if you are inside a salloc session.
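The difference between srun on its own and srun inside a salloc allocation can be sketched as follows. The `run` helper is a hypothetical dry-run wrapper added for this example so the snippet also works on machines without Slurm. The partition and QoS values are taken from the tables on this page and may not suit your job.

```shell
#!/bin/bash
# Hypothetical dry-run wrapper: executes the command when it is installed,
# otherwise prints what would have been run.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# srun on its own: request resources, run one command, release them.
run srun --partition=gpu-a --qos=test --nodes=1 hostname

# salloc with a command: reserve resources, run the command inside the
# allocation, then release it. The inner srun reuses the allocation
# instead of making a new request.
run salloc --partition=gpu-a --qos=test --nodes=1 srun hostname
```

Running salloc without a trailing command gives you the interactive shell described above; type `exit` to release the allocation.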
Quality of Service (QoS) affects scheduling priority, preemption, and resource limits for jobs. Currently, you must supply a QoS explicitly with the --qos parameter.
| QoS | purpose |
|---|---|
| test | describe |
| train | describe |
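To see the priority and time limit attached to each QoS, you can query the Slurm accounting database. This is a minimal sketch. It assumes regular users are allowed to run `sacctmgr` on this cluster, which may not be the case.

```shell
#!/bin/bash
# Columns to print: QoS name, scheduling priority, maximum wall time per job.
fields="Name,Priority,MaxWall"

if command -v sacctmgr >/dev/null 2>&1; then
    sacctmgr show qos format="$fields"
else
    echo "sacctmgr not found; run this on gpu.cs.nmt.edu"
fi
```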
A partition is a logical grouping of compute nodes. It organizes physical resources with similar hardware into a distinct pool. Currently, two partitions are defined: gpu-a and gpu-l.
| partition | resources (GPUs) | resources (memory) | use |
|---|---|---|---|
| gpu-a | 2X A-100s | | why would you use this partition? |
| gpu-l | 4X L-40s | | why would you use this partition? |
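The current state of both partitions can be checked from the login node with `sinfo`. The format string below is only one possible selection of columns, and the generic resources column will only list GPUs if the cluster registers them that way.

```shell
#!/bin/bash
# Columns: partition, node count, generic resources (e.g. GPUs),
# memory per node in MB, maximum time limit.
fmt="%P %D %G %m %l"

if command -v sinfo >/dev/null 2>&1; then
    sinfo --partition=gpu-a,gpu-l --format="$fmt"
    # List your own pending and running jobs.
    squeue -u "$(id -un)"
else
    echo "sinfo not found; run this on gpu.cs.nmt.edu"
fi
```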