Coordinating Accounts with Slurm
NAU’s Monsoon cluster hosts several research groups. To balance the demands of these groups, Slurm schedules jobs so that resources are shared as fairly as possible. Slurm also provides a set of commands that make it easy to start and manage large compute jobs.
Managing Jobs
Listing Jobs
squeue -A professor # List by account
squeue -u abc123 # List by user
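As a rough sketch of how the listings above can be summarised, the helper below counts jobs per state. The function name count_job_states is hypothetical, and it assumes its input has one job state per line, which is what squeue prints with -h (no header) and -o "%T" (state only).

```shell
#!/bin/sh
# Count jobs per state. Reads one state per line on standard input
# and prints "STATE COUNT" lines, sorted by state name.
# count_job_states is a hypothetical helper, not a Slurm command.
count_job_states() {
    sort | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
}

# Query the live queue only where squeue is installed.
# -h drops the header line and -o "%T" prints just the job state.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -u "$(id -un)" -o "%T" | count_job_states
fi
```

To summarise a whole account instead of the current user, the same pipe should work with squeue -h -A professor -o "%T".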
Cancelling Jobs
scancel 12345678 # Cancel by job ID
scancel -u abc123 # Cancel all of a user's jobs
scancel -u abc123 --state=running # Cancel all of a user's RUNNING jobs
scancel -u abc123 --state=pending # Cancel all of a user's PENDING jobs
scancel -A professor # Cancel an entire account's jobs
Holding and Releasing Jobs
scontrol hold 12345678 # Hold by job ID
scontrol release 12345678 # Release the hold
scontrol uhold 12345678 # Hold job 12345678 but allow the job's owner to
# release it
Limiting Users
Check the Current Limits
sacctmgr list assoc account=professor
sacctmgr list assoc user=abc123 format=account,user,grpcpurunmins
sacctmgr list assoc user=abc123
Limiting CPU Time
sacctmgr modify user abc123 set GrpCPURunMins=1440 # Limit the total CPU time
# remaining in a user's
# running jobs to 1440
# CPU-minutes (e.g. 24 hours
# on 1 core, 12 hours on
# 2 cores, etc.)
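The arithmetic behind that limit can be sketched as follows. The helper name max_walltime_minutes is hypothetical, and the sketch assumes the user has no other jobs running, so the whole budget is available to a single job.

```shell
#!/bin/sh
# Longest time limit (in whole minutes) that one job can request on a
# given number of cores without exceeding a CPU-minute budget.
# max_walltime_minutes is a hypothetical helper, not a Slurm command.
max_walltime_minutes() {
    budget=$1   # GrpCPURunMins value, in CPU-minutes
    cores=$2    # number of CPU cores the job requests
    echo $(( budget / cores ))
}

max_walltime_minutes 1440 1   # prints 1440 (24 hours)
max_walltime_minutes 1440 2   # prints 720  (12 hours)
```

If the user already has jobs running, their remaining CPU-minutes would need to be subtracted from the budget first.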
Limiting Usable CPUs
sacctmgr modify user abc123 set GrpCPUs=2 # The user can only have 2 CPUs
# allocated at a time
Checking the Current Settings and Status
Check Account Limits and Fairshare
sacctmgr list assoc account=professor
Show Historical Fairshare and Usage Information
sshare -a -l -A professor
Adjusting Priority
A job's Slurm priority is the sum of several factors, each a value in the range 0.0 to 1.0 multiplied by an integer weight set in the cluster configuration. The available factors include:
- Job size
- Queue time
- Fairshare
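The weighted sum described above can be sketched like this. The helper name job_priority and the weights in the example are made up for illustration; the weights actually configured on a cluster can be displayed with sprio -w.

```shell
#!/bin/sh
# Sum weight * factor over any number of (weight, factor) pairs.
# Arguments alternate: weight1 factor1 weight2 factor2 ...
# job_priority is a hypothetical helper, not a Slurm command.
job_priority() {
    awk 'BEGIN {
        total = 0
        for (i = 1; i + 1 < ARGC; i += 2)
            total += ARGV[i] * ARGV[i + 1]
        printf "%d\n", total   # truncate to a whole number
    }' "$@"
}

# Hypothetical weights: job size 1000, queue time 1000, fairshare 10000.
# Hypothetical factors: 0.25, 0.5 and 0.5.
job_priority 1000 0.25 1000 0.5 10000 0.5   # prints 5750
```

With weights like these, the fairshare factor dominates: halving it costs the job more priority than the other two factors can contribute together.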
Calculating Fairshare
Fairshare (FS) is calculated with the following equation, taking its values from the columns printed by
sshare -laA youraccount
FS = Norm Shares / Effectv Usage
where
Norm Shares = Raw Shares / sum(self + siblings' Raw Shares)
Effectv Usage = Raw Usage / account's Raw Usage
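A worked example of the equation, as a minimal sketch. The helper name fairshare and the input numbers are hypothetical; in practice the four values would be read from the sshare output. Printing "inf" when there is no recorded usage is a choice made for this sketch to avoid dividing by zero.

```shell
#!/bin/sh
# FS = Norm Shares / Effectv Usage, using the definitions above.
# Arguments:
#   $1  user's Raw Shares
#   $2  sum of Raw Shares for the user and all siblings
#   $3  user's Raw Usage
#   $4  account's Raw Usage
# fairshare is a hypothetical helper, not a Slurm command.
fairshare() {
    awk -v rs="$1" -v total="$2" -v ru="$3" -v au="$4" 'BEGIN {
        # No usage recorded yet: report "inf" instead of dividing by zero.
        if (ru == 0 || au == 0) { print "inf"; exit }
        norm_shares = rs / total
        effectv_usage = ru / au
        printf "%.2f\n", norm_shares / effectv_usage
    }'
}

# 64 of 256 shares (0.25) and 1000 of 8000 usage (0.125).
fairshare 64 256 1000 8000   # prints 2.00
```

A value above 1 means the user has consumed less than their share of the account's usage; a value below 1 means they have consumed more.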
Modifying User Fairshare
sacctmgr modify user abc123 set fairshare=64