Advanced Research Computing

Using the cluster: introduction

  • Obtain account
  • Connecting to Monsoon
  • File management
  • Linux/Bash shell

We use Slurm, a workload manager, for resource management and job scheduling. Job priority is determined by several factors: fairshare (the most significant), job age, partition, and job size.
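Assuming Monsoon uses Slurm's multifactor priority plugin, a job's priority is a weighted sum of factors that are each normalized to the range 0.0 to 1.0. The sketch below keeps only the four factors named above and uses hypothetical weights (the real `PriorityWeight*` values are not given here). Fairshare gets the largest weight so that it dominates. Slurm also supports other factors, such as QOS and nice values, which are omitted here. On the cluster, `sprio` lists the actual priority components of pending jobs.

```python
"""Sketch of Slurm's multifactor job priority as a weighted sum.

Assumption: the weights are hypothetical, picked only so that fairshare
dominates as described above; they are not Monsoon's configuration.
"""

# Hypothetical PriorityWeight* values; each factor is normalized to [0.0, 1.0].
HYPOTHETICAL_WEIGHTS = {
    "fairshare": 10000,
    "age": 1000,
    "partition": 1000,
    "jobsize": 500,
}


def job_priority(factors, weights=HYPOTHETICAL_WEIGHTS):
    """Return sum(weight * factor); missing factors count as 0.0."""
    total = 0.0
    for name, weight in weights.items():
        value = factors.get(name, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must be in [0, 1], got {value}")
        total += weight * value
    # Slurm reports priority as an integer; this sketch simply truncates.
    return int(total)


# A light user's new job versus a heavy user's older job (made-up factors).
light_user = job_priority({"fairshare": 0.9, "age": 0.1, "partition": 1.0, "jobsize": 0.2})
heavy_user = job_priority({"fairshare": 0.2, "age": 0.8, "partition": 1.0, "jobsize": 0.2})
```

With these made-up numbers, `light_user` comes out at 10200 and `heavy_user` at 3900. The older job from the heavy user still ranks lower because fairshare carries the most weight.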

If you have used other cluster schedulers but not Slurm, see this document for quick command conversions.

Checking the status of the cluster

While logged in to one of Monsoon’s login nodes (wind or rain), you can inspect the state of the queues with the “squeue” command:

By default, “squeue” lists both running (R) and pending (PD) jobs. Jobs with “R” in the “ST” (state) column are running, and jobs with “PD” are pending.
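You can summarize the queue instead of reading it line by line. This sketch tallies jobs by state and groups pending jobs by the reason code Slurm records for them. It assumes the input comes from `squeue -h -o "%t"`, which prints one compact state code per line, and `squeue -h -t PD -o "%i %r"`, which prints one job ID and reason per pending job. The helper names are hypothetical.

```python
"""Summarize squeue output by job state and by pending reason.

Assumed input (run on a login node):
    squeue -h -o "%t"          -> one compact state code per line (R, PD, ...)
    squeue -h -t PD -o "%i %r" -> "<jobid> <reason>" for each pending job
"""
import subprocess
from collections import Counter, defaultdict


def count_states(state_lines):
    """Tally state codes such as R (running) and PD (pending)."""
    return Counter(line.strip() for line in state_lines if line.strip())


def group_by_reason(pending_lines):
    """Map each pending reason (e.g. Resources, Priority) to its job IDs."""
    groups = defaultdict(list)
    for line in pending_lines:
        fields = line.split(maxsplit=1)
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        job_id, reason = fields
        groups[reason.strip()].append(job_id)
    return dict(groups)


def run_squeue(*args):
    """Run squeue with the given arguments and return its output lines."""
    result = subprocess.run(
        ["squeue", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

On a login node, `count_states(run_squeue("-h", "-o", "%t"))` gives the running and pending totals. `group_by_reason(run_squeue("-h", "-t", "PD", "-o", "%i %r"))` lists pending jobs by reason. A reason of `Resources` means the job is waiting for resources to become available. `Priority` means higher-priority jobs are ahead of it.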

The “TIME” column shows how long each job has been running. Here, four jobs have been running for almost 13 days.
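To sort or filter jobs by elapsed time, convert the TIME values to a numeric form first. squeue prints elapsed time as days-hours:minutes:seconds and shows the days and hours fields only when they are needed. `squeue -h -o "%M"` prints just that column. The sketch below assumes that format. The function name is hypothetical.

```python
from datetime import timedelta


def parse_squeue_time(value: str) -> timedelta:
    """Convert a squeue TIME value such as "12-22:30:00", "1:02:03" or "5:03".

    The optional "days-" prefix and the hours field appear only when needed.
    """
    days = 0
    clock = value.strip()
    if "-" in clock:
        day_part, clock = clock.split("-", 1)
        days = int(day_part)
    parts = [int(part) for part in clock.split(":")]
    if len(parts) == 3:
        hours, minutes, seconds = parts
    elif len(parts) == 2:
        hours = 0
        minutes, seconds = parts
    else:
        raise ValueError(f"unrecognized squeue time: {value!r}")
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```

For example, `parse_squeue_time("12-22:30:00")` returns a timedelta just under 13 days. This makes it easy to sort jobs or compare them against a partition's time limit.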

Pending jobs might suggest that the cluster’s resources are almost fully allocated, but this is not necessarily the case. The jobs in the PD state may be requesting more resources than are currently free, so they wait even though some cores are idle. To see the state of the cluster’s nodes, use the “sinfo” command.

[ abc123@wind ~ ]$ sinfo
PARTITION AVAIL   TIMELIMIT  NODES  STATE NODELIST
core         up  14-00:00:0      7    mix cn[4,7-8,11-14]
core         up  14-00:00:0      7  alloc cn[1-3,5-6,9-10,15]

This output lists the partitions defined in Slurm. Here there is only one, “core”. Because some nodes are in the “mix” state, free cores (CPUs) are available. A node in the mix state has only some of its cores allocated, whereas a node with all of its cores allocated is in the “alloc” state.
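Slurm prints node lists in a compressed hostlist notation such as `cn[4,7-8,11-14]`. On the cluster, `scontrol show hostnames` expands such a list into one host name per line, and `sinfo -h -o "%C"` reports CPUs as allocated/idle/other/total. The sketch below does the hostlist expansion in Python and totals nodes per state from default `sinfo` output. This shows at a glance how many nodes are partly free ("mix") versus fully allocated ("alloc"). It assumes at most one bracket group per host name. Nested or multiple groups are not handled, and the function names are hypothetical.

```python
import re


def expand_hostlist(hostlist: str) -> list[str]:
    """Expand a single-bracket Slurm hostlist, e.g. "cn[1-3,5]" -> cn1, cn2, cn3, cn5.

    Assumption: at most one bracket group per name; forms such as
    "rack[1-2]-cn[1-4]" or "cn[1-2],gpu[1-2]" are not supported.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", hostlist)
    if match is None:
        return [hostlist]  # a plain host name such as "cn4"
    prefix, ranges, suffix = match.groups()
    hosts = []
    for item in ranges.split(","):
        start, _, end = item.partition("-")
        end = end or start
        width = len(start) if start.startswith("0") else 0  # keep zero padding
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


def nodes_by_state(sinfo_text: str) -> dict[str, int]:
    """Total nodes per STATE from default sinfo output (header row first).

    Default columns: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
    """
    counts: dict[str, int] = {}
    lines = [line for line in sinfo_text.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        state, nodelist = fields[4], fields[5]
        counts[state] = counts.get(state, 0) + len(expand_hostlist(nodelist))
    return counts
```

Passing the text printed by `sinfo` to `nodes_by_state` returns a dictionary such as `{"mix": ..., "alloc": ...}`. Any count under "mix" means some cores are still free.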