{"id":3680,"date":"2024-10-16T12:07:29","date_gmt":"2024-10-16T19:07:29","guid":{"rendered":"https:\/\/in.nau.edu\/arc\/?page_id=3680"},"modified":"2025-01-03T15:10:32","modified_gmt":"2025-01-03T22:10:32","slug":"gpus","status":"publish","type":"page","link":"https:\/\/in.nau.edu\/arc\/gpus\/","title":{"rendered":"GPUs on Monsoon"},"content":{"rendered":"<!-- shortcode-right-column -->\n<div class=\"shortcode-right-column\" >\n    <div class=\"shortcode-right-column__container\"><\/p>\n<p><!-- shortcode-contact -->\n<div class=\"shortcode-contact\">\n    <div class=\"contact-header\">\n        <h3>Contact Advanced Research Computing<\/h3>\n    <\/div>\n    <div class=\"contact-body\">\n                <a href=\"mailto:ask-arc@nau.edu\" aria-label=\"Contact Advanced Research Computing: Email Address\" title=\"Email Address\">\n            <div class=\"contact-icon-container\">\n                <i class=\"fas fa-envelope\" aria-hidden=\"true\"><\/i>\n                <span class=\"sr-only\">Email:<\/span>\n            <\/div>\n            <div class=\"contact-email\">ask-arc&#8203;@nau.edu<\/div>\n        <\/a>\n                    <\/div>\n<\/div>\n\n<\/p>\n<p><!-- shortcode-button -->\n<div class=\"shortcode-button shortcode-button--center\">\n      <a class=\"main-button\" href=\"https:\/\/in.nau.edu\/arc\/meeting-room\/\">View Microsoft Teams Meeting Information<\/a>\n  <\/div>\n<\/p>\n<p><\/div>\n<\/div>\n\n<h1 id=\"gpus-on-monsoon\">GPUs on Monsoon<\/h1>\n<p>For some of your jobs on Monsoon, you might require one or more Graphics Processing Units (GPUs) to accelerate your work. 
By using Slurm on Monsoon, you can easily request a GPU for your job.<\/p>\n<div>\n<p dir=\"auto\"><em>Note: At the time of writing, Monsoon contains only NVIDIA GPUs.<\/em><\/p>\n<h2 id=\"checking-available-gpus\" dir=\"auto\" data-heading=\"Checking Available GPUs\">Checking Available GPUs<\/h2>\n<p dir=\"auto\">To list all GPUs on Monsoon and see which of them are currently available, use the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">gpu_status<\/span> command:<\/p>\n<pre class=\"language-bash\" tabindex=\"0\"><code class=\"language-bash is-loaded\">$ gpu_status\r\nAvailable GPUs: <span class=\"token number\">10<\/span>\/24\r\nk80: <span class=\"token number\">7<\/span>\/12\r\np100: <span class=\"token number\">0<\/span>\/4\r\nv100: <span class=\"token number\">3<\/span>\/4\r\na100: <span class=\"token number\">0<\/span>\/4\r\nPending GPU jobs: <span class=\"token number\">57<\/span>\r\nRunning GPU jobs: <span class=\"token number\">5<\/span>\r\n<\/code><\/pre>\n<p dir=\"auto\"><em>Note: If you are running deep learning software, we recommend using the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">a100<\/span> GPUs. 
If those are unavailable, try using either the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">p100<\/span> or the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">v100<\/span> GPUs instead.<\/em><\/p>\n<p><em>Note: Some GPUs may be shown as available but cannot be requested, because some nodes are dedicated to specific research groups.<\/em><\/p>\n<h2 id=\"submitting-a-gpu-job\" dir=\"auto\" data-heading=\"Submitting a GPU Job\">Submitting a GPU Job<\/h2>\n<p dir=\"auto\">To quickly request a GPU of any model, you can use either the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">-G<\/span>\u00a0or <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">\u2010\u2010gpus=<\/span> flag in your <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" 
data-darkreader-inline-border-left=\"\">srun<\/span> or <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">salloc<\/span> commands:<\/p>\n<pre class=\"language-bash\" tabindex=\"0\"><code class=\"language-bash is-loaded\"><span class=\"token comment\"># Both of these do the same thing!<\/span>\r\n$ srun -G <span class=\"token number\">1<\/span> nvidia-smi\r\n$ srun --gpus<span class=\"token operator\">=<\/span><span class=\"token number\">1<\/span> nvidia-smi\r\n<\/code><\/pre>\n<pre class=\"language-bash\" tabindex=\"0\"><code class=\"language-bash is-loaded\">$ salloc --gpus<span class=\"token operator\">=<\/span><span class=\"token number\">1<\/span>\r\n$ srun nvidia-smi\r\n<\/code><\/pre>\n<p dir=\"auto\">Or, to request a GPU from an SBATCH script, add the following line alongside the rest of your SBATCH parameters:<\/p>\n<pre class=\"language-bash\" tabindex=\"0\"><code class=\"language-bash is-loaded\"><span class=\"token comment\">#SBATCH --gpus=1<\/span>\r\nnvidia-smi\r\n<\/code><\/pre>\n<p dir=\"auto\">If you want to request a specific GPU model, you can specify it in the same argument, using the form model:count:<\/p>\n<pre class=\"language-bash\" tabindex=\"0\"><code class=\"language-bash is-loaded\">$ srun -G k80:1 nvidia-smi    <span class=\"token comment\"># requests one k80 GPU<\/span>\r\n$ srun -G k80:4 nvidia-smi    <span class=\"token comment\"># requests four k80 GPUs<\/span>\r\n<\/code><\/pre>\n<pre class=\"language-bash\" tabindex=\"0\"><code class=\"language-bash is-loaded\"><span class=\"token comment\">#SBATCH --gpus=a100:1<\/span>\r\nnvidia-smi                    <span class=\"token comment\"># the program you want to run<\/span>\r\n<\/code><\/pre>\n<p>Another way to request a specific GPU model is to use the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; 
padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">-C<\/span> flag:<\/p>\n<pre><code>$ srun -C k80 -G 4 nvidia-smi<\/code><\/pre>\n<h3 id=\"relevant-flags\" dir=\"auto\" data-heading=\"Relevant Flags\">Relevant Flags<\/h3>\n<p dir=\"auto\">There are additional flags you can provide Slurm to fine-tune your jobs, but they may not be useful for everyone.<\/p>\n<ul>\n<li dir=\"auto\"><span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">\u2010\u2010cpus-per-gpu=[int]<\/span>: Specify how many CPUs to allocate for every GPU requested.<\/li>\n<li dir=\"auto\"><span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">\u2010\u2010mem-per-gpu=[memory]<\/span>: Specify how much memory to allocate for every GPU requested. This is most often used when using multiple GPUs.<\/li>\n<li dir=\"auto\"><span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">\u2010\u2010ntasks-per-gpu=[int]<\/span>: Specify how many tasks to run for every GPU requested. 
This option should be used by advanced users only.<\/li>\n<li dir=\"auto\"><span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">\u2010\u2010gpus-per-task=[model]:[int]<\/span>: Specify how many GPUs to allocate for every task being run. This option should be used by advanced users only.<\/li>\n<li dir=\"auto\"><span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">\u2010\u2010gpus-per-node=[model]:[int]<\/span>: Specify how many GPUs to allocate for each node being requested. This option should be used by advanced users only.<\/li>\n<li dir=\"auto\"><span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">\u2010\u2010gpus-per-socket=[model]:[int]<\/span>: Specify how many GPUs to allocate for each CPU socket being used. This option should be used by advanced users only.<\/li>\n<\/ul>\n<\/div>\n<h2 id=\"checking-gpu-usage\">Checking GPU Usage<\/h2>\n<p>To ensure that you are getting the most out of your allocated GPU(s), it is useful to check GPU usage. 
This can be monitored on our <a href=\"https:\/\/metrics.hpc.nau.edu\/\">Monsoon Metrics<\/a> page (requires a connection to NAU WiFi or <a href=\"https:\/\/in.nau.edu\/its\/remote-services\/\">VPN<\/a>) or via CLI with the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">nvidia-smi dmon<\/span> command <strong>while an SSH connection to the GPU node is active<\/strong>:<\/p>\n<pre><code>$ nvidia-smi dmon<\/code><\/pre>\n<p><em>Note: This command continuously monitors GPU usage statistics, and will not stop until you press <strong>Control + C<\/strong>.<\/em><\/p>\n<h2 id=\"using-cuda\">Using CUDA<\/h2>\n<p>Some of your jobs on Monsoon may require NVIDIA&#8217;s Compute Unified Device Architecture (CUDA) toolkit. To use the CUDA toolkit libraries, load the <span style=\"font-size: 16px; font-family: monospace; border: 1px solid; border-radius: 4px; padding: 0px 4px 0px; border-color: #BBBBBB;\" data-darkreader-inline-border-top=\"\" data-darkreader-inline-border-right=\"\" data-darkreader-inline-border-bottom=\"\" data-darkreader-inline-border-left=\"\">cuda<\/span> module:<\/p>\n<pre><code>$ module load cuda<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>GPUs on Monsoon For some of your jobs on Monsoon, you might require the use of one or more Graphics Processing Units (GPUs) in order to accelerate your jobs. By using Slurm on Monsoon, you can easily request a GPU for your job. 
Note: At the time of writing this article, Monsoon only contains NVIDIA [&hellip;]<\/p>\n","protected":false},"author":2758,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","ring_central_script_selection":"","footnotes":""},"class_list":["post-3680","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/3680","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/users\/2758"}],"replies":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/comments?post=3680"}],"version-history":[{"count":16,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/3680\/revisions"}],"predecessor-version":[{"id":3743,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/3680\/revisions\/3743"}],"wp:attachment":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/media?parent=3680"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}