Advanced Research Computing

Contact Advanced Research Computing

Email:
ask-arc@nau.edu

Using the Monsoon Cluster: Introduction

We use Slurm for resource management and job scheduling. Job priority is determined by several factors: fairshare (the most heavily weighted), along with the job's age, partition, and size.
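
As a rough illustration of how such factors can combine: Slurm's multifactor priority plugin computes a job's priority as a weighted sum of factors, each normalized to the range 0.0 to 1.0. The sketch below assumes that model. The weight values and the `job_priority` helper are hypothetical and are not Monsoon's actual configuration.

```python
# Hypothetical weights, chosen only so that fairshare dominates.
# These are NOT Monsoon's real settings.
WEIGHTS = {"fairshare": 10000, "age": 1000, "partition": 1000, "job_size": 500}


def job_priority(factors, weights=WEIGHTS):
    """Return the weighted sum of priority factors, each expected in [0.0, 1.0]."""
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} factor must be between 0 and 1")
    # Factors that are not supplied contribute nothing to the total.
    return sum(weights[name] * factors.get(name, 0.0) for name in weights)


# A user with light recent usage (high fairshare) whose job was just submitted.
light_user = job_priority(
    {"fairshare": 0.9, "age": 0.0, "partition": 1.0, "job_size": 0.1}
)
# A heavy user (low fairshare) whose job has waited long enough to max out age.
heavy_user = job_priority(
    {"fairshare": 0.2, "age": 1.0, "partition": 1.0, "job_size": 0.1}
)

print(light_user > heavy_user)
```

With these made-up weights, the newly submitted job from the light user still outranks the long-waiting job from the heavy user, which is what "fairshare is the dominant factor" means in practice.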

A Slurm cheat sheet is available if you have used Slurm before.

Viewing the Cluster Status

While logged in to one of Monsoon’s login nodes (wind or rain), you can inspect the state of the queues with the squeue command:

[abc123@wind ~]$ squeue
   JOBID PARTITION     NAME   USER ST        TIME NODES NODELIST(REASON)
12345678      core demo_job def456 PD        0:00     1      (Resources)
18765433      core demo_job ghi789 PD        0:00     1       (Priority)
12398234      core demo_job jkl987  R 12-20:14:42     1              cn4

By default, squeue lists both running and pending jobs. Jobs with R in the ST (state) column are running; jobs with PD in the ST column are pending.

The TIME column shows how long each job has been running, in days-hours:minutes:seconds format. In this example, one job has been running for almost 13 days (12 days and 20 hours).
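
If you want to summarize this output in a script, the following minimal sketch parses the sample listing above, counts jobs per state, and converts the TIME field into a duration. The helper names `parse_squeue` and `parse_elapsed` are illustrative, and the parser assumes the default column layout with no spaces inside any field.

```python
from collections import Counter
from datetime import timedelta

# Sample text copied from the squeue listing above.
SQUEUE_OUTPUT = """\
   JOBID PARTITION     NAME   USER ST        TIME NODES NODELIST(REASON)
12345678      core demo_job def456 PD        0:00     1      (Resources)
18765433      core demo_job ghi789 PD        0:00     1       (Priority)
12398234      core demo_job jkl987  R 12-20:14:42     1              cn4
"""


def parse_elapsed(text):
    """Convert a [days-][hours:]minutes:seconds string to a timedelta."""
    days, _, clock = text.rpartition("-")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:  # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(days=int(days or 0), hours=hours, minutes=minutes, seconds=seconds)


def parse_squeue(output):
    """Turn whitespace-separated squeue output into a list of dicts keyed by header."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


jobs = parse_squeue(SQUEUE_OUTPUT)
states = Counter(job["ST"] for job in jobs)
longest = max(parse_elapsed(job["TIME"]) for job in jobs)

print(f"running: {states['R']}, pending: {states['PD']}")
print(f"longest running job: {longest.days} days, {longest.seconds // 3600} hours")
```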

Because there are jobs in the pending state, it might appear that nearly all of the cluster’s resources are allocated, but this is not necessarily the case. The pending jobs may simply be requesting more resources than are currently available on the cluster. To find out more about the cluster state, use the sinfo command:

[abc123@wind ~]$ sinfo
PARTITION AVAIL   TIMELIMIT  NODES  STATE NODELIST
core         up  14-00:00:0      7    mix cn[4,7-8,11-14]
core         up  14-00:00:0      7  alloc cn[1-3,5-6,9-10,15]

This shows the partitions defined in Slurm, of which there is only one: core. Nodes in the mix state have only some of their cores (CPUs) allocated, so the presence of mix nodes tells us that free cores are available. Nodes that have all of their cores allocated are in the alloc state.
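
The NODELIST column uses a compact bracket notation. As a minimal sketch, the hypothetical helper `expand_nodelist` below expands the mix-state entry from the listing above into individual node names. It assumes a single bracket group and no zero-padded numbers, so it is not a full implementation of Slurm's host list syntax.

```python
import re


def expand_nodelist(nodelist):
    """Expand a simple node list such as 'cn[4,7-8,11-14]' into node names."""
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", nodelist)
    if match is None:
        return [nodelist]  # already a single node name such as 'cn4'
    prefix, ranges = match.groups()
    nodes = []
    for part in ranges.split(","):
        first, _, last = part.partition("-")
        last = last or first  # a lone number is a range of length one
        nodes.extend(f"{prefix}{n}" for n in range(int(first), int(last) + 1))
    return nodes


mix_nodes = expand_nodelist("cn[4,7-8,11-14]")
print(len(mix_nodes), "nodes with free cores:", ", ".join(mix_nodes))
```

On the cluster itself, `scontrol show hostnames 'cn[4,7-8,11-14]'` performs the same expansion using Slurm's own parser.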
