{"id":59,"date":"2018-01-16T19:48:14","date_gmt":"2018-01-16T19:48:14","guid":{"rendered":"https:\/\/in.nau.edu\/hpc\/?page_id=59"},"modified":"2024-08-13T14:47:48","modified_gmt":"2024-08-13T21:47:48","slug":"using-the-cluster-introduction","status":"publish","type":"page","link":"https:\/\/in.nau.edu\/arc\/overview\/using-the-cluster-introduction\/","title":{"rendered":"Using the Monsoon Cluster: Introduction"},"content":{"rendered":"<!-- shortcode-right-column -->\n<div class=\"shortcode-right-column\" >\n    <div class=\"shortcode-right-column__container\"><\/p>\n<p><!-- shortcode-contact -->\n<div class=\"shortcode-contact\">\n    <div class=\"contact-header\">\n        <h3>Contact Advanced Research Computing<\/h3>\n    <\/div>\n    <div class=\"contact-body\">\n                <a href=\"mailto:ask-arc@nau.edu\" aria-label=\"Contact Advanced Research Computing: Email Address\" title=\"Email Address\">\n            <div class=\"contact-icon-container\">\n                <i class=\"fas fa-envelope\" aria-hidden=\"true\"><\/i>\n                <span class=\"sr-only\">Email:<\/span>\n            <\/div>\n            <div class=\"contact-email\">ask-arc&#8203;@nau.edu<\/div>\n        <\/a>\n                    <\/div>\n<\/div>\n\n<\/p>\n<hr \/>\n<h2>Quick Links<\/h2>\n<ul>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/obtaining-an-account\/\">Request an Account<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/overview\/submitting-your-first-job\/\">Submitting your first job<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/request-storage\/\">Request storage<\/a><\/li>\n<li><a href=\"https:\/\/slurm.schedmd.com\/\">Slurm Documentation<\/a><\/li>\n<li><a href=\"https:\/\/slurm.schedmd.com\/rosetta.pdf\">Slurm Cheat Sheet<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/overview\/connecting-to-monsoon\/\">Connecting to Monsoon<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/overview\/file-management\/\">File Management<\/a><\/li>\n<li><a 
href=\"https:\/\/in.nau.edu\/hpc\/overview\/linux-bash-basics\/\">Linux\/Bash shell<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/overview\/using-the-cluster-advanced\/\">Using the Cluster: Advanced<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/faqs\/\">FAQs<\/a><\/li>\n<\/ul>\n<hr \/>\n<p><\/div>\n<\/div>\n\n<h1>Using the Monsoon Cluster: Introduction<\/h1>\n<p>We use <a href=\"https:\/\/slurm.schedmd.com\/\">Slurm<\/a> for resource management and job scheduling. Job priority is determined by several factors: fairshare (the dominant factor), as well as job age, partition, and job size.<\/p>\n<p><em>A <a href=\"https:\/\/slurm.schedmd.com\/rosetta.pdf\">Slurm cheat sheet<\/a> is available if you have used Slurm before<\/em>.<\/p>\n<h2>Viewing the Cluster Status<\/h2>\n<p>While logged in to one of Monsoon&#8217;s login nodes (wind or rain), you can inspect the state of the queues with the <span style=\"font-family: monospace;\">squeue<\/span> command:<\/p>\n<pre><code>[abc123@wind ~]$ squeue\r\n   JOBID PARTITION     NAME   USER ST        TIME NODES NODELIST(REASON)\r\n12345678      core demo_job def456 PD        0:00     1      (Resources)\r\n18765433      core demo_job ghi789 PD        0:00     1       (Priority)\r\n12398234      core demo_job jkl987  R 12-20:14:42     1              cn4<\/code><\/pre>\n<p>By default, <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">squeue<\/span> lists both running (<span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">R<\/span>) and pending (<span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">PD<\/span>) jobs.\u00a0 The jobs with an <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; 
border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">R<\/span> in the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">ST<\/span> column are in the running state. The jobs with a <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">PD<\/span> in the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">ST<\/span> column are in the pending state.<\/p>\n<p>The <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">TIME<\/span> column lists how long each job has been running. In this example, one job has been running for almost 13 days.<\/p>\n<p>It might appear that nearly all of the cluster&#8217;s resources are allocated because there are jobs in the pending state, but this is not necessarily the case. The jobs in the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">PD<\/span> state may be requesting more resources than are currently available on the cluster. 
To find out more about the cluster state, use the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">sinfo<\/span> command.<\/p>\n<pre><code>[abc123@wind ~]$ sinfo\r\nPARTITION AVAIL   TIMELIMIT  NODES  STATE NODELIST\r\ncore         up  14-00:00:0      7    mix cn[4,7-8,11-14]\r\ncore         up  14-00:00:0      7  alloc cn[1-3,5-6,9-10,15]<\/code><\/pre>\n<p>This shows the partitions defined in Slurm, of which there is only one: <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">core<\/span>.\u00a0 Note that free cores (CPUs) are available, because some nodes are in the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">mix<\/span> state. Nodes in the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">mix<\/span> state have only some of their cores allocated, whereas nodes with all cores allocated are in the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\">alloc<\/span> state.<\/p>\n<!-- shortcode-button -->\n<div class=\"shortcode-button shortcode-button--center\">\n      <a class=\"main-button\" href=\"https:\/\/in.nau.edu\/arc\/overview\/using-the-cluster-advanced\/\">View Advanced Topics<\/a>\n  <\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Using the Monsoon Cluster: Introduction We use a piece of software called\u00a0Slurm for resource management and scheduling. Job priorities are determined by a number of factors, fairshare (most predominant) as well as age, partition, and size of the job. A Slurm cheat sheet is available if you have used Slurm before. 
Viewing the Cluster Status [&hellip;]<\/p>\n","protected":false},"author":76,"featured_media":145,"parent":49,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","ring_central_script_selection":"","footnotes":""},"class_list":["post-59","page","type-page","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/59","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/users\/76"}],"replies":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/comments?post=59"}],"version-history":[{"count":11,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/59\/revisions"}],"predecessor-version":[{"id":3575,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/59\/revisions\/3575"}],"up":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/49"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/media\/145"}],"wp:attachment":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/media?parent=59"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}