Parallelism in Python
Because of Python’s Global Interpreter Lock (GIL), the threads within a single Python process cannot truly run in parallel, unlike threads in other programming languages such as Java, C/C++, and Go. For parallelism you have to create multiple processes, and for this Python ships with the multiprocessing module. Also note that many Python libraries, such as numpy, are largely written in C and can multithread efficiently on their own.
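The GIL's effect can be seen with a small timing sketch (the function and variable names here are illustrative, not part of the examples below): a CPU-bound pure-Python loop gains nothing from threads, but does speed up with processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_down(n):
    # Pure-Python loop: holds the GIL the entire time it runs.
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    work = [5_000_000] * 4

    start = time.perf_counter()
    with ThreadPoolExecutor(4) as ex:       # threads share one GIL
        list(ex.map(count_down, work))
    threaded = time.perf_counter() - start

    start = time.perf_counter()
    with ProcessPoolExecutor(4) as ex:      # each process has its own GIL
        list(ex.map(count_down, work))
    processed = time.perf_counter() - start

    print(f"threads: {threaded:.2f}s  processes: {processed:.2f}s")
```

On a machine with 4 or more free cores, the process pool should finish noticeably faster than the thread pool.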
Multiprocessing Example:
Note: To follow along with these examples, you will need both the job script and the Python script below.
Sample jobscript:
#!/bin/bash
#SBATCH --job-name=multiproc-test
#SBATCH --output=multiproc-output.txt
#SBATCH -c 4
#SBATCH --time=15:00
#SBATCH --mem=100M
srun ./multi_proc.py 40 1
Sample Python script:
#!/usr/bin/env python3
import multiprocessing
import os
import sys

def call_stress(mins):
    os.system(f"timeout {mins}m stress -c 1")

if __name__ == '__main__':
    sub_tasks = int(sys.argv[1])
    mins = int(sys.argv[2])
    cores = int(os.environ['SLURM_CPUS_PER_TASK'])
    with multiprocessing.Pool(cores) as pool:
        pool.map(call_stress,
                 [mins for i in range(sub_tasks)])
To execute this code, copy the job script into a file named multi_proc.sh and the Python script into multi_proc.py, make the Python script executable with chmod +x multi_proc.py, then run:
[abc123@wind ~]$ sbatch ./multi_proc.sh
In most situations you need to pass the number of CPUs per task to the Pool constructor. This argument tells the Pool object how many "worker processes" to create. If you create more processes than you have allocated CPUs, the processes will compete for resources. By default, the Pool object creates as many processes as there are CPUs on the system, which may not match your job allocation.
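One way to make the pool size follow the allocation is to read SLURM_CPUS_PER_TASK and fall back to the machine's CPU count when the variable is absent (for example, when testing outside a job). This is a sketch; the helper name pool_size is our own, not part of any library:

```python
import multiprocessing
import os

def pool_size():
    # Prefer the Slurm allocation; fall back to the machine's CPU
    # count when running outside a job (e.g. on a login node).
    return int(os.environ.get("SLURM_CPUS_PER_TASK", os.cpu_count()))

if __name__ == "__main__":
    with multiprocessing.Pool(pool_size()) as pool:
        print(pool.map(abs, [-3, 1, -2]))  # prints [3, 1, 2]
```

Inside a job, Pool then sizes itself to exactly the CPUs that Slurm granted, rather than to everything the node has.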
You’ll also want to prevent these processes from spawning large numbers of threads. The numpy library uses multithreading by default, so parallelizing a Python function that uses numpy may create a huge number of threads. If the number of running threads exceeds the number of cores, this could bottleneck important system processes on our compute nodes. You should set these four environment variables in your job script, before executing any Python code that uses the multiprocessing module.
[abc123@wind ~]$ export OMP_NUM_THREADS=1
[abc123@wind ~]$ export MPI_NUM_THREADS=1
[abc123@wind ~]$ export MKL_NUM_THREADS=1
[abc123@wind ~]$ export OPENBLAS_NUM_THREADS=1
The first two environment variables tell OpenMP and MPI code to use one thread (per process). The last two tell numpy's math backends (MKL and OpenBLAS) to use one thread (per process).
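If you cannot edit the job script, the same limits can be applied from inside Python, provided they are set before numpy is imported; a library that has already loaded will not re-read them. A minimal sketch:

```python
import os

# Must run before numpy (and its BLAS backend) is first imported;
# an already-loaded library will not pick up these values.
for var in ("OMP_NUM_THREADS", "MPI_NUM_THREADS",
            "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"

# import numpy as np  # import only after the limits are in place
```

Setting the variables in the job script remains the safer option, since it guarantees no import happens first.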
The Python multiprocessing library has a wide range of features; to learn more, refer to the official documentation.