{"id":1014,"date":"2019-01-04T11:26:30","date_gmt":"2019-01-04T18:26:30","guid":{"rendered":"https:\/\/in.nau.edu\/hpc\/?page_id=1014"},"modified":"2024-08-13T14:32:31","modified_gmt":"2024-08-13T21:32:31","slug":"slurm-coord","status":"publish","type":"page","link":"https:\/\/in.nau.edu\/arc\/slurm-coord\/","title":{"rendered":"Using Slurm"},"content":{"rendered":"<h1 id=\"coordinating-accounts-with-slurm\">Coordinating Accounts with Slurm<\/h1>\n<p>NAU\u2019s Monsoon cluster is shared by several research groups. To balance the demands of these groups, Monsoon uses Slurm to schedule jobs as fairly as possible. Slurm also makes it straightforward to submit and manage large compute jobs.<\/p>\n<h2 id=\"managing-jobs\">Managing Jobs<\/h2>\n<h3 id=\"listing-jobs\">Listing Jobs<\/h3>\n<pre><code>$ squeue -A professor # List by account\r\n$ squeue -u abc123    # List by user<\/code><\/pre>\n<h3 id=\"cancelling-jobs\">Cancelling Jobs<\/h3>\n<pre><code>$ scancel 12345678                   # Cancel by job ID\r\n$ scancel -u abc123                  # Cancel all of a user's jobs\r\n$ scancel -u abc123 --state=running  # Cancel all of a user's RUNNING jobs\r\n$ scancel -u abc123 --state=pending  # Cancel all of a user's PENDING jobs\r\n$ scancel -A professor               # Cancel an entire account's jobs<\/code><\/pre>\n<h3 id=\"holding-and-releasing-jobs\">Holding and Releasing Jobs<\/h3>\n<pre><code>$ scontrol hold 12345678      # Hold by job ID\r\n$ scontrol release 12345678   # Release the hold\r\n$ scontrol uhold 12345678     # Hold job 12345678 but allow the job's owner to\r\n                              # release it<\/code><\/pre>\n<h2 id=\"limiting-users\">Limiting Users<\/h2>\n<h3 id=\"check-the-current-limits\">Check the Current Limits<\/h3>\n<pre><code>$ sacctmgr list assoc account=professor\r\n$ sacctmgr list assoc user=abc123 format=account,user,grpcpurunmins\r\n$ sacctmgr list assoc user=abc123<\/code><\/pre>\n<h3 id=\"limiting-cpu-time\">Limiting 
CPU Time<\/h3>\n<pre><code>$ sacctmgr modify user abc123 set GrpCPURunMins=1440  # Limit a user's running jobs to a\r\n                                                      # total of 1440 CPU-minutes\r\n                                                      # (CPUs x time limit, e.g.\r\n                                                      #  24 hours on 1 core,\r\n                                                      #  12 hours on 2 cores, etc.)<\/code><\/pre>\n<h3 id=\"limiting-usable-cpus\">Limiting Usable CPUs<\/h3>\n<pre><code>$ sacctmgr modify user abc123 set GrpCPUs=2 # The user can only have 2 CPUs\r\n                                            # allocated at a time<\/code><\/pre>\n<h2 id=\"checking-the-current-settings-and-status\">Checking the Current Settings and Status<\/h2>\n<h3 id=\"check-account-limits-and-fairshare\">Adding a Student to a Slurm Account<\/h3>\n<pre><code>$ sacctmgr add user name=abc123 account=professor                       # Add user to account\r\n$ sacctmgr modify user where name=abc123 set defaultaccount=professor   # Set user's default account\r\n$ sacctmgr modify user where name=abc123 set defaultqos=professor       # Set user's default Quality of Service (QoS)\r\n$ sacctmgr update user name=abc123 account=professor set fairshare=128  # Set user's fairshare value<\/code><\/pre>\n<h3 id=\"check-account-limits-and-fairshare\">Check Account Limits and Fairshare<\/h3>\n<pre><code>$ sacctmgr list assoc account=professor<\/code><\/pre>\n<h3 id=\"show-historical-fairshare-and-usage-information\">Show Historical Fairshare and Usage Information<\/h3>\n<pre><code>$ sshare -a -l -A professor<\/code><\/pre>\n<h2 id=\"adjusting-priority\">Adjusting Priority<\/h2>\n<p>Slurm calculates a job\u2019s priority as a weighted sum of several factors: each factor is a number between 0.0 and 1.0, multiplied by an integer weight set by the cluster administrators. 
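<\/p>\n<p>As a rough worked example, suppose the weights had been set to the made-up values <code>PriorityWeightAge=1000<\/code>, <code>PriorityWeightFairshare=10000<\/code> and <code>PriorityWeightJobSize=500<\/code> (hypothetical, not Monsoon\u2019s actual configuration), and that a pending job currently scores 0.5 for queue time (age), 0.25 for fairshare and 0.1 for job size. Ignoring every other factor, its priority would be:<\/p>\n<pre><code>priority = 1000  * 0.5    # age (queue time)\r\n         + 10000 * 0.25   # fairshare\r\n         + 500   * 0.1    # job size\r\n         = 500 + 2500 + 50\r\n         = 3050<\/code><\/pre>\n<p>The weights actually in effect can be listed with <code>sprio -w<\/code>, and <code>sprio -l<\/code> shows the per-factor breakdown for each pending job.<\/p>\n<p>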
Some available factors include:<\/p>\n<ul>\n<li>Job size<\/li>\n<li>Queue time (age)<\/li>\n<li>Fairshare<\/li>\n<\/ul>\n<h3 id=\"calculating-fairshare\">Calculating Fairshare<\/h3>\n<p>Fairshare is calculated from the values reported by the following command (replace <code>youraccount<\/code> with your account name):<\/p>\n<p><code>$ sshare -laA youraccount<\/code><br \/>\nUsing the columns of its output, perform the following calculations:<\/p>\n<pre><code>Norm Shares = Raw Shares \/ sum(self + siblings' Raw Shares)\r\nEffectv Usage = Raw Usage \/ account's Raw Usage\r\nFairShare = Norm Shares \/ Effectv Usage\r\n<\/code><\/pre>\n<h3 id=\"modifying-user-fairshare\">Modifying User Fairshare<\/h3>\n<pre><code>$ sacctmgr modify user abc123 set fairshare=64<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Coordinating Accounts with Slurm NAU\u2019s Monsoon cluster is shared by several research groups. To balance the demands of these groups, Monsoon uses Slurm to schedule jobs as fairly as possible. Slurm also makes it straightforward to submit and manage large compute jobs. 
Managing Jobs Listing jobs $ squeue -A professor # [&hellip;]<\/p>\n","protected":false},"author":465,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","ring_central_script_selection":"","footnotes":""},"class_list":["post-1014","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/1014","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/users\/465"}],"replies":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/comments?post=1014"}],"version-history":[{"count":4,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/1014\/revisions"}],"predecessor-version":[{"id":3571,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/1014\/revisions\/3571"}],"wp:attachment":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/media?parent=1014"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}