{"id":49,"date":"2018-01-16T19:43:40","date_gmt":"2018-01-16T19:43:40","guid":{"rendered":"https:\/\/in.nau.edu\/hpc\/?page_id=49"},"modified":"2024-02-21T14:37:56","modified_gmt":"2024-02-21T21:37:56","slug":"overview","status":"publish","type":"page","link":"https:\/\/in.nau.edu\/arc\/overview\/","title":{"rendered":"Overview"},"content":{"rendered":"<h1>Overview<\/h1>\n<!-- shortcode-right-column -->\n<div class=\"shortcode-right-column\" >\n    <div class=\"shortcode-right-column__container\"><\/p>\n<p><!-- shortcode-contact -->\n<div class=\"shortcode-contact\">\n    <div class=\"contact-header\">\n        <h3>Contact Advanced Research Computing<\/h3>\n    <\/div>\n    <div class=\"contact-body\">\n                <a href=\"mailto:ask-arc@nau.edu\" aria-label=\"Contact Advanced Research Computing: Email Address\" title=\"Email Address\">\n            <div class=\"contact-icon-container\">\n                <i class=\"fas fa-envelope\" aria-hidden=\"true\"><\/i>\n                <span class=\"sr-only\">Email:<\/span>\n            <\/div>\n            <div class=\"contact-email\">ask-arc&#8203;@nau.edu<\/div>\n        <\/a>\n                    <\/div>\n<\/div>\n\n<\/p>\n<hr \/>\n<h2>Quick Links<\/h2>\n<ul>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/overview\/linux-resources\/\">Linux Resources<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/overview\/policies\/\">Policies<\/a><\/li>\n<li><a href=\"https:\/\/onbase.nau.edu\/appnet\/UnityForm.aspx?d1=AbEuP34v1%2f2XupEAyJdgZflsROKLoxs0VmNRA3DfNV4MnvnQqYsqW3i%2bre%2fztGYiGmQ7UqbBxiZMz7qUubCISJ2OgpXScXxekuVdICqUeI%2biM5FVq37TOsjvY80cqXSYIhBtlbOxpSa53L%2f8%2fGql6KmySIlDonQIbmY%2bWAQju8cW2U%2fqWMqs9SpIsj6RpCTJ7Kg79%2f9tgpQ19d%2bZKEcBLUk%3d\">Monsoon User Creation Form<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/arc\/documentation-page\/\">Documentation and Links<\/a><\/li>\n<li><a href=\"https:\/\/in.nau.edu\/hpc\/overview\/connecting-to-monsoon\/\">Connecting to Monsoon<\/a><\/li>\n<\/ul>\n<hr 
\/>\n<h4><\/h4>\n<p><\/div>\n<\/div>\n\n<p>The Advanced Research Computing (ARC) group provides access to and support for High-Performance Computing (HPC) resources in general and the Monsoon cluster in particular. Cluster resources are available to NAU faculty and staff for use in their research projects, and to students who are sponsored by a research faculty member. Sponsorship means that the sponsor is responsible for ensuring acceptable use of cluster resources by the sponsored student.<\/p>\n<p>Access to Monsoon can be obtained by submitting a <a href=\"https:\/\/in.nau.edu\/arc\/obtaining-an-account\/\">New Monsoon User Request form<\/a>. Once you have an account, you may <a href=\"https:\/\/in.nau.edu\/hpc\/overview\/connecting-to-monsoon\/\">log in using SSH<\/a> with your NAU credentials.<\/p>\n<!-- shortcode-accordion -->\n<div class=\"shortcode-accordion shortcode-accordion--closed\" style=\"position: relative;\" >\n        <a class=\"shortcode-accordion__trigger\" data-header=\"Scheduler and resource manager_0\" href=\"#\">\n      <div class=\"shortcode-accordion__header\">\n          <h4>Scheduler and resource manager <span class=\"screen-reader-text\">Accordion Closed<\/span><\/h4>\n          <span class=\"shortcode-accordion__header__arrow\"><\/span>\n      <\/div>\n    <\/a>\n    <div class=\"shortcode-accordion__body\">\n        <!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD HTML 4.0 Transitional\/\/EN\" \"http:\/\/www.w3.org\/TR\/REC-html40\/loose.dtd\">\n<html><body>\n<p>Slurm handles scheduling and resource management on Monsoon. Job priorities are determined by a fairshare policy along with other factors such as job age and job size. 
Here are the essential Slurm commands you&rsquo;ll want to know:<\/p>\n<style type=\"text\/css\">ul li pre{display:inline !important; color:gray;}<\/style>\n<ul>\n<li>squeue &ndash; inspect the queue, including running and pending jobs<br>\nex: <i>squeue -t PD<\/i><br>\nex: <i>squeue -t R<\/i><\/li>\n<li>sinfo &ndash; inspect the cluster state, including queues (partitions) and nodes<br>\nex: <i>sinfo -l -N<\/i><\/li>\n<li>sbatch &ndash; submit batch jobs<br>\nex: <i>sbatch &lt;job-script-file&gt;<\/i><\/li>\n<li>srun &ndash; run parallel jobs<br>\nex: <i>srun &lt;job-command&gt;<\/i><\/li>\n<li>salloc &ndash; allocate resources for an interactive session<br>\nex: <i>salloc -N 2<\/i><\/li>\n<li>scontrol &ndash; inspect and modify the state of your jobs<br>\nex: <i>scontrol show job &lt;jobid#&gt;<\/i><\/li>\n<li>scancel &ndash; cancel your jobs<br>\nex: <i>scancel &lt;jobid#&gt;<\/i><\/li>\n<\/ul>\n<p>The man pages are very well done. Check them out for more info, for example &ldquo;man sbatch&rdquo;.<\/p>\n<\/body><\/html>\n\n    <\/div>\n<\/div>\n\n<!-- shortcode-accordion -->\n<div class=\"shortcode-accordion shortcode-accordion--closed\" style=\"position: relative;\" >\n        <a class=\"shortcode-accordion__trigger\" data-header=\"Partitions_0\" href=\"#\">\n      <div class=\"shortcode-accordion__header\">\n          <h4>Partitions <span class=\"screen-reader-text\">Accordion Closed<\/span><\/h4>\n          <span class=\"shortcode-accordion__header__arrow\"><\/span>\n      <\/div>\n    <\/a>\n    <div class=\"shortcode-accordion__body\">\n        <!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD HTML 4.0 Transitional\/\/EN\" \"http:\/\/www.w3.org\/TR\/REC-html40\/loose.dtd\">\n<html><body>\n<p>core &ndash; includes all of the nodes, 14-day run limit<\/p>\n<\/body><\/html>\n\n    <\/div>\n<\/div>\n\n<!-- shortcode-accordion -->\n<div class=\"shortcode-accordion shortcode-accordion--closed\" style=\"position: relative;\" >\n        <a class=\"shortcode-accordion__trigger\" 
data-header=\"TRESRunMins_0\" href=\"#\">\n      <div class=\"shortcode-accordion__header\">\n          <h4>TRESRunMins <span class=\"screen-reader-text\">Accordion Closed<\/span><\/h4>\n          <span class=\"shortcode-accordion__header__arrow\"><\/span>\n      <\/div>\n    <\/a>\n    <div class=\"shortcode-accordion__body\">\n        <!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD HTML 4.0 Transitional\/\/EN\" \"http:\/\/www.w3.org\/TR\/REC-html40\/loose.dtd\">\n<html><body>\n<p>TRESRunMins is a Slurm limit assigned to an account (or QOS) that caps the total number of remaining CPU minutes that your running jobs can occupy. Having this limit enabled on Monsoon helps with:<\/p>\n<ul>\n<li>Flexible resource limiting<\/li>\n<li>Staggering jobs<\/li>\n<li>Increasing cluster utilization<\/li>\n<li>More accurate resource requests<\/li>\n<\/ul>\n<p>The current value of the limit is 3000000 CPU minutes. This value is sometimes increased when cluster utilization drops, so that idle cores can be used. To calculate the TRESRunMins for your jobs, multiply the number of CPUs each running job uses by the minutes remaining on its time limit, then add up the results across all of your running jobs. For a set of identical jobs, this is the same as multiplying one job&rsquo;s value by the number of jobs.<\/p>\n<p>tresrunmins = sumofjobs( cpus * time remaining in minutes )<\/p>\n<p>Examples (each set of jobs occupies approximately 500000 CPU minutes when the jobs start):<\/p>\n<p>500000 = 24, 1 cpu, 2 week jobs<\/p>\n<p>500000 = 49, 1 cpu, 1 week jobs<\/p>\n<p>500000 = 347, 1 cpu, 1 day jobs<\/p>\n<p>500000 = 21, 16 cpu, 1 day jobs<\/p>\n<p>500000 = 130, 16 cpu, 4 hr jobs<\/p>\n<p>500000 = 520, 4 cpu, 4 hr jobs<\/p>\n<p>To see the current TRESRunMins for a single account or all accounts, use:<\/p>\n<p style=\"padding-left: 30px;\">sshare -l -A &lt;account name&gt; # single account<br>\nsshare -l # all accounts<\/p>\n<p>In the output, the pertinent column is labeled CPURunMins and is the rightmost column. 
This number changes dynamically as jobs change state.<\/p>\n<\/body><\/html>\n\n    <\/div>\n<\/div>\n\n<!-- shortcode-accordion -->\n<div class=\"shortcode-accordion shortcode-accordion--closed\" style=\"position: relative;\" >\n        <a class=\"shortcode-accordion__trigger\" data-header=\"Storage_0\" href=\"#\">\n      <div class=\"shortcode-accordion__header\">\n          <h4>Storage <span class=\"screen-reader-text\">Accordion Closed<\/span><\/h4>\n          <span class=\"shortcode-accordion__header__arrow\"><\/span>\n      <\/div>\n    <\/a>\n    <div class=\"shortcode-accordion__body\">\n        <!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD HTML 4.0 Transitional\/\/EN\" \"http:\/\/www.w3.org\/TR\/REC-html40\/loose.dtd\">\n<html><body>\n<p>\/common<\/p>\n<ul>\n<li>This is the cluster &ldquo;common&rdquo; area.<\/li>\n<li>Cluster dependencies reside here, as well as areas for users to share contributions (contrib)<\/li>\n<\/ul>\n<p>\/scratch<\/p>\n<ul>\n<li>450TB<\/li>\n<li>This is the primary shared working storage<\/li>\n<li>Read and write your temporary files, logs, and final products here<\/li>\n<li>30-day retention period on files; warning emails are sent at 28 days. No quotas.<\/li>\n<\/ul>\n<p>\/projects<\/p>\n<ul>\n<li>500TB<\/li>\n<li>5TB of free storage per faculty member (more can be purchased)<\/li>\n<li>This is a long-term storage solution<\/li>\n<li>Built on ZFS for enterprise-grade data integrity, scale, and performance<\/li>\n<li>30 Gbps throughput<\/li>\n<\/ul>\n<p>\/packages<\/p>\n<ul>\n<li>Packages and modules<\/li>\n<\/ul>\n<p>\/home<\/p>\n<ul>\n<li>Keep scripts and other small files here<\/li>\n<li>This area is not meant for heavy writes like temp files, logs, and checkpoint files<\/li>\n<li>This area is snapshotted twice a day, and a total of 4 snapshots are kept here: \/home\/.snapshot<\/li>\n<li>10GB quota<\/li>\n<\/ul>\n<p>\/tmp<\/p>\n<ul>\n<li>120GB<\/li>\n<li>Local node storage<\/li>\n<\/ul>\n<p>All storage areas are also available around campus, outside of Monsoon, via 
SMB by visiting \\\\shares.hpc.nau.edu\\cirrus.<\/p>\n<\/body><\/html>\n\n    <\/div>\n<\/div>\n\n<!-- shortcode-accordion -->\n<div class=\"shortcode-accordion shortcode-accordion--closed\" style=\"position: relative;\" >\n        <a class=\"shortcode-accordion__trigger\" data-header=\"Configured modules_0\" href=\"#\">\n      <div class=\"shortcode-accordion__header\">\n          <h4>Configured modules <span class=\"screen-reader-text\">Accordion Closed<\/span><\/h4>\n          <span class=\"shortcode-accordion__header__arrow\"><\/span>\n      <\/div>\n    <\/a>\n    <div class=\"shortcode-accordion__body\">\n        <!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD HTML 4.0 Transitional\/\/EN\" \"http:\/\/www.w3.org\/TR\/REC-html40\/loose.dtd\">\n<html><body>\n<ul>\n<li>List the available modules on Monsoon: &ldquo;module avail&rdquo;<\/li>\n<li>List the currently loaded modules in your login session: &ldquo;module list&rdquo;<\/li>\n<li>Load a module: &ldquo;module load &lt;module&gt;&rdquo;<\/li>\n<\/ul>\n<\/body><\/html>\n\n    <\/div>\n<\/div>\n\n<!-- shortcode-button -->\n<div class=\"shortcode-button shortcode-button--center\">\n      <a class=\"main-button\" href=\"https:\/\/in.nau.edu\/arc\/documentation-page\/\">View Additional Documentation<\/a>\n  <\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Overview The Advanced Research Computing (ARC) group provides access to and support for High-Performance Computing (HPC) resources in general and the Monsoon cluster in particular. Cluster resources are available to NAU faculty and staff for use in their research projects, and to students who are sponsored by a research faculty member. 
Sponsorship of students implies that the sponsor is responsible [&hellip;]<\/p>\n","protected":false},"author":465,"featured_media":145,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","ring_central_script_selection":"","footnotes":""},"class_list":["post-49","page","type-page","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/49","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/users\/465"}],"replies":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/comments?post=49"}],"version-history":[{"count":25,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/49\/revisions"}],"predecessor-version":[{"id":3352,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/pages\/49\/revisions\/3352"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/media\/145"}],"wp:attachment":[{"href":"https:\/\/in.nau.edu\/arc\/wp-json\/wp\/v2\/media?parent=49"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}