Using the Monsoon Cluster: Introduction
We use Slurm for resource management and job scheduling. Job priority is determined by several factors: fairshare (the dominant factor), along with job age, partition, and job size.
A Slurm cheat sheet is available if you have used Slurm before.
Viewing the Cluster Status
While logged in to one of Monsoon’s login nodes (wind or rain), you can inspect the state of the queues with the squeue command:
[abc123@wind ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12345678 core demo_job def456 PD 0:00 1 (Resources)
18765433 core demo_job ghi789 PD 0:00 1 (Priority)
12398234 core demo_job jkl987 R 12-20:14:42 1 cn4
By default, squeue lists both running and pending jobs. Jobs with R in the ST column are in the running state; jobs with PD in the ST column are in the pending state.
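To make the ST column concrete, here is a minimal sketch that tallies jobs per state from squeue-style output. The function name count_by_state is hypothetical, and the sketch assumes the default column layout shown above, where ST is the fifth whitespace-separated field (a job name containing spaces would break that assumption).

```shell
# Hypothetical helper: count jobs per state from squeue-style output on stdin.
# Assumes the default layout, where ST is the 5th whitespace-separated field.
count_by_state() {
    awk 'NR > 1 { count[$5]++ }                      # skip the header row
         END { for (s in count) print s, count[s] }' | sort
}

# Offline demonstration using the sample listing from above:
count_by_state <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12345678 core demo_job def456 PD 0:00 1 (Resources)
18765433 core demo_job ghi789 PD 0:00 1 (Priority)
12398234 core demo_job jkl987 R 12-20:14:42 1 cn4
EOF
# prints:
# PD 2
# R 1
```

On a login node, the same function can read live data with: squeue | count_by_state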
The TIME column lists how long each job has been running, in days-hours:minutes:seconds format. Here, one job has been running for almost 13 days (12 days and just over 20 hours).
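As a worked example of reading the TIME column, the sketch below converts an elapsed-time string into seconds. The function name slurm_time_to_seconds is hypothetical, and the sketch assumes the value takes one of the forms D-HH:MM:SS, HH:MM:SS, or MM:SS, as in the listing above.

```shell
# Hypothetical helper: convert a squeue TIME value to seconds.
# Assumes one of the forms D-HH:MM:SS, HH:MM:SS, or MM:SS.
slurm_time_to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $0
        if (index(t, "-") > 0) {            # optional leading "D-" day count
            split(t, d, "-"); days = d[1]; t = d[2]
        }
        n = split(t, p, ":")
        if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
        else if (n == 2) secs = p[1] * 60 + p[2]
        else             secs = p[1]
        print days * 86400 + secs
    }'
}

slurm_time_to_seconds 12-20:14:42   # prints 1109682
```

Dividing 1109682 seconds by 86400 seconds per day gives roughly 12.84 days, which matches the "almost 13 days" reading.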
It might appear that nearly all of the cluster’s resources are allocated, since there are jobs in the pending state, but this is not necessarily the case. The jobs in the PD state may be asking for more resources than are currently available on the cluster. To find out more about the cluster state, use the sinfo command:
[abc123@wind ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
core up 14-00:00:0 7 mix cn[4,7-8,11-14]
core up 14-00:00:0 7 alloc cn[1-3,5-6,9-10,15]
This shows the partitions defined in Slurm, of which there is only one: core. Free cores (CPUs) are available, because some nodes are in the mix state. Nodes in the mix state have only some of their cores allocated, whereas nodes with all cores allocated are in the alloc state.
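The check described above can be sketched as a small filter over sinfo-style output. The function name nodes_with_free_cores is hypothetical. The sketch assumes the default column layout (NODES in the fourth field, STATE in the fifth) and plain state names; it also counts the idle state, which Slurm uses for nodes with no cores allocated. The sample rows are modeled on the listing above.

```shell
# Hypothetical helper: count nodes that still have unallocated cores,
# reading sinfo-style output on stdin.
# Assumes NODES is the 4th field and STATE the 5th, with plain state names.
nodes_with_free_cores() {
    awk 'NR > 1 && ($5 == "mix" || $5 == "idle") { total += $4 }
         END { print total + 0 }'           # "+ 0" prints 0 when nothing matches
}

# Offline demonstration with rows modeled on the sample listing:
nodes_with_free_cores <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
core up 14-00:00:0 7 mix cn[4,7-8,11-14]
core up 14-00:00:0 7 alloc cn[1-3,5-6,9-10,15]
EOF
# prints 7
```

On a login node, the same function can read live data with: sinfo | nodes_with_free_cores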