Job Arrays
Job arrays let you use SLURM’s ability to create multiple jobs from one script. They are useful in situations such as:
- Establishing a list of commands to run and having a job created from each command in the list.
- Running many parameters against one set of data or analysis program.
- Running the same program multiple times with different sets of data.
Without job arrays, each of these scenarios would require manually rerunning the sbatch command many times. SLURM automates this procedure with job arrays. Each array is considered to be one “array job” that has a specific ID. Each element of the array is one “array task”, which has its own sub-ID. For example, if your “array job” ID was 1212985, the “array task” with index 0 would have an ID of 1212985_0.
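The relationship between the array job ID and the array task IDs can be seen from inside a running task. The following is a minimal sketch, assuming a three-task array. The job name array_demo and the helper array_task_label are illustrative names, not part of the examples on this page. SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID are environment variables that SLURM sets for each array task.

```shell
#!/bin/bash
#SBATCH --job-name=array_demo   # illustrative name (assumption)
#SBATCH --time=00:01:00
#SBATCH --array=0-2             # three array tasks: indices 0, 1, 2

# Hypothetical helper: print "<array job ID>_<array task index>".
# Falls back to "unset" when run outside of a SLURM array job.
array_task_label() {
    printf '%s_%s\n' "${SLURM_ARRAY_JOB_ID:-unset}" "${SLURM_ARRAY_TASK_ID:-unset}"
}

echo "This is array task $(array_task_label)"
```

Submitted once with sbatch, a script like this would run once per index, and each run would report the same array job ID with a different task index.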
NOTE:
If you use these examples, you will need to replace NAUID with your own ID and remove any line numbers shown with the code. The examples also use a directory called playground, which you will need to create before the scripts will run.
Examples
Iteratively named input/reference files
If your input files are numerically sequenced, each individual (sub-)job in the array can read a single, unique input file: one array (sub-)job for each input file. (E.g.: a job-array set to range from 1-4 would use one file per sub-job, file-1 through file-4, over its total of 4 sub-jobs.)
To set up this type of job-array, a researcher could set the start-index/end-index of the job-array to match the iterating part of the input filenames, or they could sequentially rename existing input files. Then, by using $SLURM_ARRAY_TASK_ID as part of a filename, each sub-job could operate on its own separate input file. For the 4-files example:
$ ls -1 ~/data
sample-1.dat
sample-2.dat
sample-3.dat
sample-4.dat
The job-script for the overall array-job would then just run whatever number-crunching tasks were necessary on a derived filename:
#SBATCH --array=1-4
# Use $HOME rather than ~ because a tilde is not expanded inside double quotes
taskfile="$HOME/data/sample-${SLURM_ARRAY_TASK_ID}.dat"
srun analyze_datafile "$taskfile"
srun process_datafile "$taskfile"
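Before submitting, the filename derivation can be checked outside SLURM by looping over the same indices by hand. This is a minimal sketch under the assumption that the files live in ~/data as listed above; task_input_path is a hypothetical helper name.

```shell
#!/bin/bash

# Hypothetical helper: build the input path for a given array index.
task_input_path() {
    local index="$1"
    printf '%s/data/sample-%s.dat\n' "$HOME" "$index"
}

# Dry run over the same indices as "--array=1-4", reporting missing files.
for i in 1 2 3 4; do
    path=$(task_input_path "$i")
    if [ -f "$path" ]; then
        echo "task $i -> $path"
    else
        echo "task $i -> $path (missing)"
    fi
done
```

Inside the job script, the same helper could be called with "$SLURM_ARRAY_TASK_ID" as its argument.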
Irregularly named input/reference files
If the necessary input files aren’t sequentially named, it’s still quite easy to set up a job-array to iterate through the entire set, with just a bit of basic shell-scripting.
Helpful BASH scripting tips:
- To save the output of a command to a BASH variable, use this command substitution construction:
variable_name=$( <command> )
- To use one command’s output as the input of a second command, use this pipe redirect construction:
command_1 | command_2
- To “capture” the N’th line of input using awk, use awk’s number-of-row selector:
<piped_input> | awk "NR==#"
or
awk "NR==#" <input_file>
For example:
$ ls -1 ~/data/
sample_2021_01.dat
sample_2021_07.dat
sample_2022_01.dat
sample_2022_07.dat
$ filelist=$( ls -1 ~/data/ )
$ echo "$filelist" | awk "NR==1"
sample_2021_01.dat
$ echo "$filelist" | awk "NR==2"
sample_2021_07.dat
$ echo "$filelist" | awk "NR==3"
sample_2022_01.dat
$ echo "$filelist" | awk "NR==4"
sample_2022_07.dat
Again, the job-script for the overall array-job would then just run whatever number-crunching tasks were necessary on a derived filename:
#SBATCH --array=1-4
filelist=$( ls -1 ~/data/ )
# ls prints bare filenames, so prepend the directory to get a usable path
taskfile="$HOME/data/$( echo "$filelist" | awk "NR==$SLURM_ARRAY_TASK_ID" )"
srun analyze_datafile "$taskfile"
srun process_datafile "$taskfile"
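As an alternative to parsing the output of ls, the file list can be held in a BASH array built from a glob. This is a swapped-in technique rather than the method shown above. It is a minimal sketch, assuming the .dat files live in ~/data and the array indices start at 1. The helper name nth_datafile is hypothetical.

```shell
#!/bin/bash
#SBATCH --array=1-4

# Hypothetical helper: print the N'th (1-based) .dat file in a directory.
# Globs expand in sorted order, so every array task sees the same ordering.
nth_datafile() {
    local dir="$1" n="$2"
    local files=( "$dir"/*.dat )
    # BASH arrays are 0-based, while the array task IDs here are 1-based.
    local index=$(( n - 1 ))
    if [ "$index" -lt 0 ] || [ "$index" -ge "${#files[@]}" ] || [ ! -e "${files[$index]}" ]; then
        echo "no input file for task $n in $dir" >&2
        return 1
    fi
    printf '%s\n' "${files[$index]}"
}

# Outside SLURM the task ID is unset, so default to 1 for a dry run.
if taskfile=$( nth_datafile "$HOME/data" "${SLURM_ARRAY_TASK_ID:-1}" ); then
    echo "Task input: $taskfile"
fi
```

The helper returns a full path, and it fails with a message instead of handing an empty filename to the analysis programs when the array range is larger than the number of files.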
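Single input source with varying parameters
If one program needs to be run many times with different parameters, the parameters can be stored one set per line in a text file, and each sub-job can select its own line by array index in the same way: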
$ cat ~/coords.txt
35.19N,111.6W
33.57N,112.1W
32.15N,110.9W
$ cat ~/coords.txt | awk "NR==1"
35.19N,111.6W
$ cat ~/coords.txt | awk "NR==2"
33.57N,112.1W
$ cat ~/coords.txt | awk "NR==3"
32.15N,110.9W
$ taskcoord=$(cat ~/coords.txt | awk "NR==$SLURM_ARRAY_TASK_ID")
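Put together, a job script for this case could read one line of parameters per array task and pass it to the same program each time. This is a minimal sketch, assuming the three-line ~/coords.txt shown above; process_coord and the helper coord_for_task are hypothetical names used for illustration only.

```shell
#!/bin/bash
#SBATCH --array=1-3    # one array task per line of coords.txt

# Hypothetical helper: print line N of a parameter file.
coord_for_task() {
    local file="$1" n="$2"
    awk -v n="$n" 'NR == n' "$file"
}

coordfile="$HOME/coords.txt"
if [ -r "$coordfile" ]; then
    # Outside SLURM the task ID is unset, so default to 1 for a dry run.
    taskcoord=$( coord_for_task "$coordfile" "${SLURM_ARRAY_TASK_ID:-1}" )
    # Split a line such as "35.19N,111.6W" into two fields on the comma.
    IFS=, read -r lat lon <<< "$taskcoord"
    echo "Task ${SLURM_ARRAY_TASK_ID:-1}: latitude=$lat longitude=$lon"
    # A real job would now run its analysis, for example (hypothetical program):
    # srun process_coord "$lat" "$lon"
fi
```

Passing the index to awk with -v keeps the awk program in single quotes, so the shell does not interpret any part of it.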
Old Examples
Exercise 1: Numbered Files
There may be times that you would like to send many files as input to a program. Instead of having to do this one at a time, you can set up a job array to do this automatically. In this next example, we will be using a shell script called exercise1.sh that takes input files that are numbered and puts the data into an output file.
Exercise1.sh
#!/bin/bash
#SBATCH --job-name=exercise1
#SBATCH --time=00:01:00
#SBATCH --mem=1
#SBATCH --input=/scratch/NAUID/playground/files/input_%a.csv
#SBATCH --output=/scratch/NAUID/playground/output_%a.txt
#SBATCH --chdir=/scratch/NAUID/playground/files
#SBATCH --array=1-3
filename="input_${SLURM_ARRAY_TASK_ID}.csv"
srun echo "File: $filename"
srun cat "$filename"
Two lines in this script are essential for accessing the files. The first is the #SBATCH --input directive, which specifies the input file. Since our files are named in the format input_x.csv, we can specify them as input_%a.csv, where %a is replaced by the job array ID (index) number. The second is the line that sets filename, which uses the environment variable SLURM_ARRAY_TASK_ID. This variable holds the same job array index as %a, but %a is only recognized in the filename patterns of directives such as --input and --output, while SLURM_ARRAY_TASK_ID is available to the commands in the body of the script.
Exercise 2: Non-numbered Files
As in Exercise 1, in this example we will be taking in different input files from a specified directory. Instead of reading files that are labeled with sequential numbers, we will be reading files with non-numerical names by using a file list. In this next example, we will be using a shell script called exercise2.sh that takes non-numerically named input files from a directory and puts the data into an output file.
File_list.txt
avondale.csv
flagstaff.csv
gilbert.csv
goodyear.csv
phoenix.csv
williams.csv
Exercise2.sh
#!/bin/bash
#SBATCH --job-name=exercise2
#SBATCH --time=00:01:00
#SBATCH --mem=1
#SBATCH --output=/scratch/NAUID/playground/output_%a
#SBATCH --chdir=/scratch/NAUID/playground/
#SBATCH --array=1-6
file=$(awk "NR==${SLURM_ARRAY_TASK_ID}" file_list.txt)
srun echo "Data from $file: "
srun cat ./cityfiles/$file
Only one line makes this script different from Exercise 1: the line that sets file uses the awk command. awk reads file_list.txt line by line, and NR is the number of records seen so far (the line number). The condition NR==${SLURM_ARRAY_TASK_ID} therefore selects the line whose number equals the job array ID (index) number. The final line of the script then uses that file name to write the file's data to our output file.
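The array range has to match the number of lines in file_list.txt, otherwise some tasks select an empty line. One way to keep the two in sync is to compute the range at submission time, since options given on the sbatch command line take precedence over #SBATCH directives in the script. This is a minimal sketch, where array_range_for is a hypothetical helper name.

```shell
#!/bin/bash

# Hypothetical helper: print an --array range "1-N" for an N-line list file.
array_range_for() {
    local listfile="$1"
    local count
    # Counting records with awk also counts a last line that lacks a newline.
    count=$(awk 'END { print NR }' "$listfile")
    printf '1-%s\n' "$count"
}

# Print the submission command rather than running it, so it can be checked first.
if [ -r file_list.txt ]; then
    echo "sbatch --array=$(array_range_for file_list.txt) exercise2.sh"
fi
```

Run from the playground directory with the six-line file list above, this would print a command using --array=1-6.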
Exercise 3: Running Multiple Programs Against Data
Taking what we learned in Exercise 2, we can run our input files against multiple programs by using a command list. In the example below, we are taking our input files and running them against the following scripts: add_numbers, multiply_numbers, and subtract_numbers. Each program takes one line of three comma-separated numbers and computes a result from them.
command_list
add_numbers
multiply_numbers
subtract_numbers
add_numbers
#!/bin/env python3
import sys
data=sys.argv[1]
data=data.split(",")
num1=data[0]
num2=data[1]
num3=data[2]
total=int(num1)+int(num2)+int(num3)
print(total)
multiply_numbers
#!/bin/env python3
import sys
data=sys.argv[1]
data=data.split(",")
num1=data[0]
num2=data[1]
num3=data[2]
total=int(num1)*int(num2)*int(num3)
print(total)
subtract_numbers
#!/bin/env python3
import sys
data=sys.argv[1]
data=data.split(",")
num1=data[0]
num2=data[1]
num3=data[2]
total=int(num1)-int(num2)-int(num3)
print(total)
Exercise3.sh
#!/bin/bash
#SBATCH --job-name=exercise3
#SBATCH --time=00:01:00
#SBATCH --mem=1
#SBATCH --output=/scratch/NAUID/playground/output_%a
#SBATCH --chdir=/scratch/NAUID/playground
#SBATCH --array=1-6
module load python/3.latest
file=$(awk "NR==${SLURM_ARRAY_TASK_ID}" file_list.txt)
srun echo "File: $file"
for line in `cat command_list`; do
    echo "Script: $line"
    for data in `cat ./cityfiles/$file`; do
        srun python $line $data
    done
done
In the example above, as in Exercise 2, the line that sets file obtains the file name from file_list.txt. The outer for loop then reads each line of command_list to get the program to run. The inner for loop reads each line of data from the input file and passes it to that program with srun python. Once the job runs, you will see output files like the one below.
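The backtick-based for loops split on any whitespace, which works here because neither the command names nor the data lines contain spaces. A more defensive variant reads each file line by line with while read. This is a swapped-in technique, sketched under the assumption that the same command_list and cityfiles layout is used. run_all is a hypothetical helper, and its trailing arguments stand in for the command prefix (srun python in this exercise).

```shell
#!/bin/bash

# Hypothetical helper: run every command in a list against every line of a data file.
# Usage: run_all <command_list> <data_file> <runner...>
run_all() {
    local cmdlist="$1" datafile="$2"
    shift 2
    local script data
    # File descriptors 3 and 4 keep stdin free for the programs being run.
    while IFS= read -r script <&3; do
        [ -n "$script" ] || continue     # skip blank lines
        echo "Script: $script"
        while IFS= read -r data <&4; do
            [ -n "$data" ] || continue
            "$@" "$script" "$data"
        done 4< "$datafile"
    done 3< "$cmdlist"
}

# Inside the job script this could be called as:
# run_all command_list "./cityfiles/$file" srun python
```

Reading through separate file descriptors matters with srun in particular, because a command that reads standard input inside a plain while read loop can consume the remaining lines of the list.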
Output_1
File: avondale.csv
Script: add_numbers
480
570
1138
Script: multiply_numbers
509220
6221730
37413252
Script: subtract_numbers
-234
-144
124
Referenced Files
Input_1.csv
x1,x2,x3
123,345,12
213,230,127
543,345,209
631,183,324
345,789,901
Avondale.csv
123,345,12
213,230,127
631,183,324