Submitting Jobs
Interactive Jobs
Submitting an interactive job to the GPU cluster gives you direct terminal access to a GPU node, where you can test and develop your code before submitting it as a batch job.
To launch an interactive job, issue the sinteractive command:
abtakid91@raad2-gfx:~$ sinteractive
abtakid91@gfx1:~$
You will notice that the prompt has changed from raad2-gfx to gfx1. This means that you are now on a GPU node.
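On most Slurm systems, sinteractive is a thin wrapper around srun. If you need more control over the allocation, a roughly equivalent direct request would look like the following; the exact options shown (time limit, GPU type) are a sketch based on the batch examples later in this guide:
abtakid91@raad2-gfx:~$ srun --gres=gpu:v100:1 --time=01:00:00 --pty bash -i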
Interactive Python Job
Load the Python module:
abtakid91@gfx1:~$ module load python36
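If you are not sure of the exact module name, module avail filters the available modules by name:
abtakid91@gfx1:~$ module avail python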
Let us activate the sample dlproject virtual environment we created and start testing:
abtakid91@gfx1:~$ conda activate dlproject
(dlproject) abtakid91@gfx1:~$ python dl.py
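As an aside, you can confirm that the session actually sees a GPU with nvidia-smi on the GPU node; the -L flag lists the attached devices (the output below is illustrative):
(dlproject) abtakid91@gfx1:~$ nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-...)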
Once you have verified that the code runs correctly, deactivate your virtual environment and exit to the login node; you are then ready to make a batch submission.
To deactivate your virtual environment:
(dlproject) abtakid91@gfx1:~$ conda deactivate
To exit to the login node:
abtakid91@gfx1:~$ exit
Interactive CUDA Job
A sample CUDA code is placed at "/lustre/share/examples/gpu/add.cu". A very nice C++/CUDA tutorial covering this example can be found here.
1. Copy the sample CUDA code to your home directory
muarif092@raad2-gfx:~$ cp /lustre/share/examples/gpu/add.cu .
2. Submit an interactive job to compile the CUDA code
muarif092@raad2-gfx:~$ sinteractive
muarif092@gfx1:~$
3. Load the CUDA module
muarif092@gfx1:~$ module load cuda
4. Compile the sample code using the NVIDIA CUDA compiler (nvcc)
muarif092@gfx1:~$ which nvcc
/cm/shared/apps/cuda90/toolkit/9.0.176/bin/nvcc
muarif092@gfx1:~$ nvcc add.cu -o add_cuda
5. Run the executable
muarif092@gfx1:~$ ./add_cuda
Max error: 0
As in the linked tutorial, the program adds two arrays element-wise on the GPU and prints the largest deviation from the expected result, so an error of 0 means the kernel produced correct output.
6. Profile your code
muarif092@gfx1:~$ nvprof ./add_cuda
==369808== Profiling result:
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  4.9192ms      1  4.9192ms  4.9192ms  4.9192ms  add(int, float*, float*)
...
...
==369808== Unified Memory profiling result:
Device "Tesla V100-PCIE-16GB (0)"
Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
   48  170.67KB  4.0000KB  0.9961MB  8.000000MB  796.7360us  Host To Device
....
...
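If you need a per-launch trace rather than the summary above, this version of nvprof also supports GPU-trace mode:
muarif092@gfx1:~$ nvprof --print-gpu-trace ./add_cuda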
7. Exit from the interactive job
muarif092@gfx1:~$ exit
Batch Jobs
Sample Slurm Job file for Python
To run the sample job, you have to:
- Copy the sample job file to your working directory.
abtakid91@raad2-gfx:~$ cp /lustre/share/examples/gpu/gpu.job .
- In the job file, line 11, change <env_name> to the name of your virtual environment (e.g. dlproject).
- In the job file, line 15, change myapp.py to the name of your Python file (e.g. dl.py).
Your sample python job will then look like this:
#!/bin/bash
#SBATCH -J batch
#SBATCH --time=24:00:00
#SBATCH --ntasks=18
#SBATCH --gres=gpu:v100:1

# Load the CUDA toolkit and set up conda
module load cuda90/toolkit
source /cm/shared/apps/anaconda3/etc/profile.d/conda.sh

conda activate dlproject

# Use the 18 allocated cores for OpenMP regions, then run the script
export OMP_NUM_THREADS=18
srun --ntasks=1 python dl.py
Then you will be able to run it:
abtakid91@raad2-gfx:~$ sbatch gpu.job
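sbatch prints the ID of the submitted job. While the job is queued or running, you can inspect it with squeue and cancel it with scancel if needed:
abtakid91@raad2-gfx:~$ squeue -u abtakid91
abtakid91@raad2-gfx:~$ scancel <jobid>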
Sample Slurm Job file for CUDA
Below is a sample Slurm job file for a CUDA program. The source file "add.cu" can be found at "/lustre/share/examples/gpu/add.cu".
#!/bin/bash
#SBATCH -J batch
#SBATCH --time=24:00:00
#SBATCH --ntasks=18
#SBATCH --gres=gpu:v100:1
# Load the CUDA toolkit, then compile and run on the allocated GPU node
module load cuda90/toolkit
export OMP_NUM_THREADS=18
srun --ntasks=1 nvcc add.cu -o add_cuda
srun --ntasks=1 ./add_cuda
Now submit the batch job:
abtakid91@raad2-gfx:~$ sbatch gpu.job
The output of this job will be placed in the same directory.
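By default, Slurm writes the job's combined stdout and stderr to a file named slurm-<jobid>.out in the directory you submitted from. If you prefer explicit names, the standard output directives can be added to the job file; the file names below are just illustrative:
#SBATCH -o myjob.%j.out
#SBATCH -e myjob.%j.err
Here %j expands to the job ID, and -o/-e redirect stdout and stderr respectively.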