Submitting Jobs
Introduction
Raad2 runs the SLURM workload manager for job scheduling. Functionally it is similar to the PBS workload manager that users may be familiar with from raad, but it uses a different syntax to formulate resource requests and to specify scheduling and placement needs. Note that whereas PBS referred to "queues", SLURM uses the term "partitions" for the same concept.
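As a rough illustration of the syntax change, the sketch below shows a generic PBS job header followed by an approximate SLURM equivalent for raad2. The job name, queue name, walltime, and core counts are placeholders, and the PBS lines use common Torque/PBS syntax rather than raad's exact configuration.
# A PBS header like this on raad ...
#PBS -N myjob
#PBS -q workq
#PBS -l walltime=02:00:00
#PBS -l nodes=1:ppn=8
# ... corresponds roughly to this SLURM header on raad2
# (partition and QOS names are covered in the table below):
#SBATCH -J myjob
#SBATCH -p s_short
#SBATCH --qos=ss
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8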
Partition configuration and resource limits
Partition | QOS | Per-Partition CPU Limit | Per-User CPU Limit | Per-Job CPU Limit | Max Walltime (hh:mm:ss) |
---|---|---|---|---|---|
s_short | ss | 144 | 24 | 8 | 08:00:00 |
s_long | sl | 456 | 48 | 16 | 168:00:00 |
s_debug | sd | 48 | 24 | -- | 04:00:00 |
l_short | ls | 144 | 48 | -- | 08:00:00 |
l_long | ll | 3000 | 360 | -- | 168:00:00 |
express | ex | 96 | 48 | -- | 01:00:00 |
The partitions fall into two broad categories: the "small" partitions carry an "s_" prefix in their names, and the "large" partitions carry an "l_" prefix. "Small" and "large" refer to the relative maximum size of jobs allowed to run within each partition, where size means the number of CPU cores requested by the job. The second part of each partition name indicates the relative maximum walltime for jobs running within it: the "short" partitions run the relatively shorter jobs and the "long" partitions run the relatively longer jobs. The s_debug partition is meant primarily for testing the viability and correctness of job files, and for running short test calculations before submitting workloads to the s_* production partitions. For jobs destined for the l_* partitions, the l_short partition may be used for this purpose, in addition to running production jobs.
The small partitions are meant for users running small jobs. We have defined small as "requiring anywhere from 1 to 8 cores" in the s_short partition or "requiring anywhere from 1 to 16 cores" in the s_long partition. In all cases, jobs submitted to the small partitions must fit *within* a single node, and can never span multiple nodes. Furthermore, all small jobs run on a set of nodes that are "sharable"; in other words, each of these nodes is able to run multiple jobs from multiple users simultaneously -- just like on raad.
The large partitions, on the other hand, only allocate whole nodes to users, and not individual cores. No two users running in one of the large partitions should be landing on the same node simultaneously; these nodes are for exclusive use by any given user and are NOT "sharable". However, a single user may submit multiple jobs to a single node if they so desire. Note that the set of nodes that service the large partitions is distinct from the set that services the small partitions, and there is no overlap between the two sets.
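For illustration, a minimal header for a job in the small partitions might look like the following sketch: it requests 8 cores, which is within the s_short per-job limit, and explicitly confines the job to one node. The job name and executable are placeholders.
#!/bin/sh
# Small job: 8 cores on a single (shared) node in the s_short partition
#SBATCH -J SmallDemo
#SBATCH -p s_short
#SBATCH --qos=ss
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
# ./a.out is a placeholder for your own executable
srun --ntasks=8 ./a.out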
How to view available partitions
In SLURM, queues are known as "partitions". You can issue the 'sinfo' command to list the available partitions and to query their current state. Refer to the sinfo help page (see man sinfo) for more details.
mustarif63@raad2b:~> sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
s_short up 8:00:00 0/14/0/14 nid00[058-063,200-207]
s_long up 7-00:00:00 0/14/0/14 nid00[058-063,200-207]
s_debug up 4:00:00 0/2/0/2 nid00[208-209]
l_short up 8:00:00 28/14/6/48 nid000[09-11,13-57]
l_long up 7-00:00:00 28/14/6/48 nid000[09-11,13-57]
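To inspect a single partition in more detail, sinfo can be restricted to one partition and switched to a node-oriented, long-format listing; the partition name below is just an example.
# Node-by-node, long-format listing for the s_short partition only
sinfo -p s_short -N -l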
Create a job file
A sample SLURM job file looks like the following. It is simply a text file with a particular syntax: a set of #SBATCH directives describing the resource request, followed by the commands to run.
#!/bin/sh
# Job name, target partition with its matching QOS, and a 30-minute walltime limit
#SBATCH -J DemoJob
#SBATCH -p l_long
#SBATCH --qos=ll
#SBATCH --time=00:30:00
# Request 24 tasks; %j in the output/error file names expands to the job ID
#SBATCH --ntasks=24
#SBATCH --output=DemoJob.o%j
#SBATCH --error=DemoJob.e%j
# Launch the executable across the 24 allocated tasks
srun --ntasks=24 ./a.out
Submitting a job file
mustarif63@raad2b:~> sbatch MySlurm.job
Submitted batch job 973
More information on sbatch can be found on the sbatch help page (see man sbatch).
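Most #SBATCH directives can also be given (or overridden) on the sbatch command line, which is handy for one-off changes without editing the job file; MySlurm.job is the same example file used above.
# Override the walltime and job name for this submission only
sbatch --time=02:00:00 -J ShortTest MySlurm.job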
List running jobs
squeue
More information on squeue can be found on the squeue help page (see man squeue).
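By default squeue lists every job on the system, so in practice it is usually filtered. The options below are standard squeue flags.
# Show only your own jobs
squeue -u $USER
# Show jobs in one partition, in long format
squeue -p l_long -l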
List running job info
scontrol show job <job_id>
Delete running job
scancel <job_id>
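scancel can also operate on groups of jobs rather than a single job ID; the examples below use standard scancel options, and DemoJob is the job name from the sample job file above.
# Cancel all of your own jobs
scancel -u $USER
# Cancel every job with a given name
scancel --name=DemoJob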
Running collections of similar jobs (job arrays)
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily. All tasks in an array must share the same initial options (e.g. size, time limit), which makes job arrays well suited to workloads that are repetitive or differ only in a parameter.
#SBATCH --array=<range>|<list>
Where <range> can be a range expression like 0-10, and <list> can be a comma-separated list like 1,3,5,7.
Job arrays are only supported for batch jobs, and the array index values are specified using the --array or -a option of the #SBATCH directive. The option argument can be specific array index values, a range of index values, or a range with a step size, as shown in the examples below.
# Submit a job array with index values between 0 and 20
#SBATCH --array=0-20
# Submit a job array with index values of 1, 3, 5 and 7
#SBATCH --array=1,3,5,7
# Submit a job array with index values between 1 and 7
# with a step size of 2 (i.e. 1, 3, 5 and 7)
#SBATCH --array=1-7:2
# Submit a job array combining a list and a range of index values
# (array indices must be non-negative integers)
#SBATCH --array=1,3,9-12
# A maximum number of simultaneously running tasks in the array may be specified
# using a "%" separator. For example, to allow at most 4 of the tasks with
# indices 0-15 to run at the same time:
#SBATCH --array=0-15%4
The following is an example of a job file that runs a Python script as a 15-task array (indices 1-15), passing each task its own index value:
#!/bin/bash
#SBATCH -J jobArray
#SBATCH -p l_long
#SBATCH --qos=ll
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# %A expands to the master job ID and %a to the array task index
#SBATCH --output=jobArray_%A_%a.out
#SBATCH --error=jobArray_%A_%a.err
#SBATCH --array=1-15
# Get the current task index from the SLURM array task ID
JOB_INDEX=${SLURM_ARRAY_TASK_ID}
srun python sample.py $JOB_INDEX
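A common variation on this pattern is to use the task index to select a different input file for each array task; the file naming scheme below (input_1.dat, input_2.dat, ...) is purely hypothetical.
# Alternative: map each array task to its own (hypothetical) input file
INPUT="input_${SLURM_ARRAY_TASK_ID}.dat"
srun python sample.py "$INPUT"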
To submit the array job above and check its status, use the following commands:
irtambo19@raad2a:~/scripts_test> sbatch run-array1.slurm
Submitted batch job 10672830
irtambo19@raad2a:~/scripts_test>
irtambo19@raad2a:~/scripts_test> squeue -u irtambo19
JOBID USER PARTITION QOS NAME ST REASON TIME_LEFT NODES CPUS THREADS_PER_CORE NODELIST
10672830_1 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00029
10672830_2 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00029
10672830_3 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00029
10672830_4 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00029
10672830_5 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00034
10672830_6 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00034
10672830_7 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00034
10672830_8 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00034
10672830_9 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00037
10672830_10 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00037
10672830_11 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00037
10672830_12 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00037
10672830_13 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00038
10672830_14 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00038
10672830_15 irtambo19 l_long ll jobArray R None 0:54 1 2 * nid00038
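Individual array tasks can be cancelled independently of the whole array; the job ID below is the one from the example output above.
# Cancel only array task 3 of the example job
scancel 10672830_3
# Cancel the entire job array
scancel 10672830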