Coordinating Accounts with Slurm
NAU’s Monsoon cluster hosts several research groups. To balance the demands of these groups, Slurm schedules jobs in a way that maximizes fairness. Slurm also provides a convenient layer for launching large compute jobs.
Managing Jobs
Listing Jobs
$ squeue -A professor # List by account
$ squeue -u abc123 # List by user
Cancelling Jobs
$ scancel 12345678 # Cancel by job ID
$ scancel -u abc123 # Cancel all of a user's jobs
$ scancel -u abc123 --state=running # Cancel all of a user's RUNNING jobs
$ scancel -u abc123 --state=pending # Cancel all of a user's PENDING jobs
$ scancel -A professor # Cancel an entire account's jobs
Holding and Releasing Jobs
$ scontrol hold 12345678 # Hold by job ID
$ scontrol release 12345678 # Release the hold
$ scontrol uhold 12345678 # Hold job 12345678 but allow the job's owner to
# release it
Limiting Users
Check the Current Limits
$ sacctmgr list assoc account=professor
$ sacctmgr list assoc user=abc123 format=account,user,grpcpurunmins
$ sacctmgr list assoc user=abc123
Limiting CPU Time
$ sacctmgr modify user abc123 set GrpCPURunMins=1440 # Limit the total CPU-minutes
# that a user's running jobs
# can hold at once to 1440
# (e.g., 24 hours on 1 core,
# 12 hours on 2 cores, etc.)
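The arithmetic behind the example above can be sketched with a small helper. The function name cpu_limit_hours is hypothetical, and the sketch assumes the user has no other running jobs consuming the same limit; it only divides the limit by the core count.

```shell
# Hypothetical helper: hours of wall time that fit under a GrpCPURunMins
# limit for a single job using the given number of cores.
# Assumes no other running jobs count against the same limit.
cpu_limit_hours() {
    limit_minutes=$1
    cores=$2
    awk -v m="$limit_minutes" -v c="$cores" 'BEGIN { printf "%.1f\n", m / c / 60 }'
}

cpu_limit_hours 1440 1    # 24.0
cpu_limit_hours 1440 2    # 12.0
cpu_limit_hours 1440 16   # 1.5
```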
Limiting Usable CPUs
$ sacctmgr modify user abc123 set GrpCPUs=2 # The user can only have 2 CPUs
# allocated at a time
Checking the Current Settings and Status
Adding a Student to a SLURM Account
$ sacctmgr add user name=abc123 account=professor # Add user to account
$ sacctmgr modify user where name=abc123 set defaultaccount=professor # Set user's default account
$ sacctmgr modify user where name=abc123 set defaultqos=professor # Set user's Quality of Service (QoS)
$ sacctmgr update user name=abc123 account=professor set fairshare=128 # Set user's fairshare value
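When several students need to be added, it may help to generate the four commands above from a template and review them before running anything. The following is a minimal dry-run sketch: student_setup_commands is a hypothetical helper that only prints the commands, and it assumes, as in the example above, that the default QoS has the same name as the account.

```shell
# Hypothetical dry-run helper: print (do not execute) the sacctmgr commands
# for adding a user to an account.
# Assumption: the QoS is named after the account, as in the example above.
student_setup_commands() {
    user=$1
    account=$2
    fairshare=$3
    printf 'sacctmgr add user name=%s account=%s\n' "$user" "$account"
    printf 'sacctmgr modify user where name=%s set defaultaccount=%s\n' "$user" "$account"
    printf 'sacctmgr modify user where name=%s set defaultqos=%s\n' "$user" "$account"
    printf 'sacctmgr update user name=%s account=%s set fairshare=%s\n' "$user" "$account" "$fairshare"
}

student_setup_commands abc123 professor 128
```

The printed lines can be inspected and then pasted into a shell by someone with the necessary Slurm administrative rights.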
Check Account Limits and Fairshare
$ sacctmgr list assoc account=professor
Show Historical Fairshare and Usage Information
$ sshare -a -l -A professor
Adjusting Priority
Slurm calculates a job's priority as the sum of several weighted factors. Each factor is a number in the range 0.0 to 1.0, multiplied by an integer weight. Available factors include:
- Job size
- Queue time
- Fairshare
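The weighted sum can be illustrated with a short sketch. The job_priority function is hypothetical, and the weights and factor values below are made up for illustration; they are not Monsoon's actual configuration, and the sketch ignores any other factors or rounding rules Slurm applies.

```shell
# Sketch: priority as a sum of weight * factor terms.
# Arguments are weight/factor pairs; all values used here are hypothetical.
job_priority() {
    awk 'BEGIN {
        total = 0
        # ARGV[1], ARGV[2] are the first weight/factor pair, and so on.
        for (i = 1; i + 1 < ARGC; i += 2) {
            total += ARGV[i] * ARGV[i + 1]
        }
        printf "%.0f\n", total
    }' "$@"
}

# job size: 1000 * 0.5, queue time: 1000 * 0.25, fairshare: 10000 * 0.75
job_priority 1000 0.5 1000 0.25 10000 0.75   # 8250
```

With these example weights, the fairshare term dominates, so changing a user's fairshare value would move the priority far more than the job's size or queue time.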
Calculating Fairshare
Fairshare is calculated with the following equations, taking values from the output of:
$ sshare -laA youraccount
From that data, perform the following calculations:
Norm Shares = Raw Shares / sum(self + siblings' Raw Shares)
Effectv Usage = Raw Usage / account's Raw Usage
FairShare = Norm Shares / Effectv Usage
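As a worked example, the three calculations can be scripted. This is a sketch only: the fairshare function name is hypothetical, and the numbers are invented rather than taken from real sshare output. Its arguments are the user's Raw Shares, the sum of Raw Shares for the user and all siblings, the user's Raw Usage, and the account's Raw Usage.

```shell
# Sketch of the three calculations above, using invented numbers.
# Arguments: raw_shares  total_sibling_shares  raw_usage  account_raw_usage
fairshare() {
    awk -v rs="$1" -v st="$2" -v ru="$3" -v au="$4" 'BEGIN {
        if (st == 0 || au == 0 || ru == 0) {
            print "cannot divide by zero" > "/dev/stderr"
            exit 1
        }
        norm = rs / st    # Norm Shares
        eff  = ru / au    # Effectv Usage
        printf "norm_shares=%.4f effectv_usage=%.4f fairshare=%.4f\n", norm, eff, norm / eff
    }'
}

fairshare 128 512 2000 8000   # norm_shares=0.2500 effectv_usage=0.2500 fairshare=1.0000
fairshare 64 512 2000 8000    # norm_shares=0.1250 effectv_usage=0.2500 fairshare=0.5000
```

In this example, halving the user's Raw Shares from 128 to 64 while usage stays the same halves the resulting FairShare value.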
Modifying User Fairshare
$ sacctmgr modify user abc123 set fairshare=64