Submitting Jobs
Interactive Jobs
Submitting an interactive job to the GPU cluster gives you direct terminal access to a GPU node, where you can test and develop your code before submitting it as a batch job.
To launch an interactive job, run the "sinteractive" command:
abtakid91@raad2-gfx:~$ sinteractive
You will notice that the hostname in your prompt has changed from raad2-gfx to gfx1. This means that you are now on a GPU node.
Interactive Python Job
Load the Python module:
abtakid91@gfx1:~$ module load python36
Activate the sample dlproject virtual environment that we created earlier and start testing:
abtakid91@gfx1:~$ conda activate dlproject
(dlproject) abtakid91@gfx1:~$ python
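While testing interactively, it helps to confirm that the GPU allocated to your session is actually visible before you move to batch. The sketch below assumes that the site's sinteractive wrapper requests the GPU through Slurm's GRES mechanism, which typically exports CUDA_VISIBLE_DEVICES, and that nvidia-smi is on the PATH of the GPU node. The helper names and the file name gpu_check.py are hypothetical.

```python
import os
import shutil
import subprocess


def visible_gpus(env=None):
    """Return the GPU indices listed in CUDA_VISIBLE_DEVICES (empty list if unset or empty)."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return [dev.strip() for dev in value.split(",") if dev.strip()]


def nvidia_smi_gpus():
    """Return the lines printed by `nvidia-smi -L`, or None if nvidia-smi is not on the PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    # stdout/universal_newlines instead of capture_output/text keeps this Python 3.6 compatible
    result = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=False)
    return result.stdout.splitlines()


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES:", visible_gpus() or "not set")
    print("nvidia-smi -L:", nvidia_smi_gpus())
```

Save it as gpu_check.py and run it with python inside the activated environment; an empty device list on a GPU node usually means no GPU was requested for the session.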
Once you are sure that your code works, deactivate your virtual environment and exit to the login node. You are then ready to make a batch submission.
To deactivate your virtual environment:
(dlproject) abtakid91@gfx1:~$ conda deactivate
To exit to the login node:
abtakid91@gfx1:~$ exit
Interactive CUDA Job
Sample CUDA code is available in "/lustre/share/examples/gpu/". A very nice C++/CUDA tutorial on this code can be found here.
1. Copy the sample CUDA code to your home directory
muarif092@raad2-gfx:~$ cp /ddn/share/examples/gpu-tutorial/01_cuda/ .
2. Start an interactive job to compile the CUDA code
muarif092@raad2-gfx:~$ sinteractive
3. Load the CUDA module
muarif092@gfx1:~$ module load cuda
4. Compile the sample code using the NVIDIA CUDA Compiler (nvcc)
muarif092@gfx1:~$ which nvcc
muarif092@gfx1:~$ nvcc -o add_cuda
5. Run the executable
muarif092@gfx1:~$ ./add_cuda
Max error: 0
6. Profile your code
muarif092@gfx1:~$ nvprof ./add_cuda
==369808== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  4.9192ms         1  4.9192ms  4.9192ms  4.9192ms  add(int, float*, float*)
==369808== Unified Memory profiling result:
Device "Tesla V100-PCIE-16GB (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
      48  170.67KB  4.0000KB  0.9961MB  8.000000MB  796.7360us  Host To Device
7. Exit the interactive job
abtakid91@gfx1:~$ exit
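The numbers printed in steps 5 and 6 can be checked by hand. The sketch below assumes the sample follows the add example from NVIDIA's "An Even Easier Introduction to CUDA" (two float arrays of N = 2^20 elements, x initialised to 1.0 and y to 2.0, and a kernel computing y = x + y); the sample code itself is not shown here, so treat those values as assumptions. It reproduces the "Max error" check on the CPU and the host-to-device size that nvprof reports under Unified Memory.

```python
from array import array

N = 1 << 20  # assumed element count, as in NVIDIA's add example
FLOAT_BYTES = 4  # size of a C float


def reference_add(n):
    """CPU version of the kernel y[i] = x[i] + y[i]; returns the sample's 'Max error'."""
    x = array("f", [1.0]) * n  # x initialised to 1.0 (assumed)
    y = array("f", [2.0]) * n  # y initialised to 2.0 (assumed)
    for i in range(n):
        y[i] = x[i] + y[i]
    # Every element should equal 3.0, so the largest deviation is the error
    return max(abs(v - 3.0) for v in y)


def unified_memory_kib(n, arrays=2, bytes_per_elem=FLOAT_BYTES):
    """Data migrated host-to-device in KB (1 KB = 1024 bytes), if each array moves once."""
    return arrays * n * bytes_per_elem / 1024


if __name__ == "__main__":
    print("Max error:", reference_add(N))
    total_kib = unified_memory_kib(N)
    print("Total Size: %.6f MB" % (total_kib / 1024))
    # 48 is the migration count nvprof reported, not something this script derives
    print("Avg Size: %.2f KB" % (total_kib / 48))
```

Run as a script, it prints a max error of 0.0, a total of 8.000000 MB and an average of 170.67 KB, which matches the nvprof report: two arrays of 2^20 four-byte floats make 8 MB, split over the 48 page migrations nvprof counted.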
Batch Jobs
Sample Slurm Job file for Python
To run the sample job:
- Copy the sample job file to your working directory.
abtakid91@raad2-gfx:~$ cp /lustre/share/examples/gpu/gpu.job .
- In the job file, on line 11, change <env_name> to the name of your virtual environment (e.g. dlproject).
- In the job file, line 15, change to the name of your python file. (e.g.
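Your sample Python job file will then look like this:
#!/bin/bash
# Job name, wall-clock limit (hh:mm:ss), number of tasks and one V100 GPU
#SBATCH -J batch
#SBATCH --time=24:00:00
#SBATCH --ntasks=18
#SBATCH --gres=gpu:v100:1
# Load the CUDA toolkit and activate the conda environment
module load cuda90/toolkit
source /cm/shared/apps/anaconda3/etc/profile.d/
conda activate dlproject
# Run the Python program as a single task
srun --ntasks=1 python
You can then submit it: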
abtakid91@raad2-gfx:~$ sbatch gpu.job
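The Python file you name on line 15 is your own program. As a hypothetical starting point for it, the sketch below prints a one-line summary of the Slurm allocation at the top of the job's output, which makes results easier to trace later. SLURM_JOB_ID, SLURM_NTASKS and SLURM_SUBMIT_DIR are standard Slurm environment variables; the function name and message format are made up for illustration.

```python
import os
import socket


def job_summary(env=None, host=None):
    """One-line description of the Slurm job this process is running in."""
    env = os.environ if env is None else env
    host = socket.gethostname() if host is None else host
    job_id = env.get("SLURM_JOB_ID", "none")  # set by Slurm inside a job
    ntasks = env.get("SLURM_NTASKS", "?")  # tasks Slurm reports for this job or step
    submit_dir = env.get("SLURM_SUBMIT_DIR", "?")  # directory sbatch was run from
    return "job %s on %s: %s task(s), submitted from %s" % (job_id, host, ntasks, submit_dir)


if __name__ == "__main__":
    # Goes to the job's output file, recording where and how the job ran
    print(job_summary())
    # ... your own code would follow here ...
```

When the script is started with srun --ntasks=1 as in the job file, the task count shown may reflect that single-task step rather than the 18 tasks requested for the whole job.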
Sample Slurm Job file for CUDA
Below is a sample Slurm job file for a CUDA program. The source file "" can be found in "/lustre/share/examples/gpu/".
#!/bin/bash
#SBATCH -J batch
#SBATCH --time=24:00:00
#SBATCH --ntasks=18
#SBATCH --gres=gpu:v100:1
# Load the CUDA toolkit, which provides nvcc
module load cuda90/toolkit
# Compile the source, then run the executable on the allocated GPU
srun --ntasks=1 nvcc -o add_cuda
srun --ntasks=1 ./add_cuda
Now submit the batch job:
abtakid91@raad2-gfx:~$ sbatch gpu.job
The output of this job will be written to the directory from which you submitted it.
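sbatch reports a submission with a line such as "Submitted batch job <jobid>", and unless the job file sets its own --output, Slurm names the output file slurm-<jobid>.out. The sketch below, whose helper names are hypothetical, submits a job file from Python and returns the path of that default output file. It assumes it runs on the login node, where sbatch is on the PATH.

```python
import os
import re
import subprocess


def default_output_file(sbatch_message, submit_dir="."):
    """Map sbatch's 'Submitted batch job <id>' message to Slurm's default slurm-<id>.out path."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_message)
    if match is None:
        raise ValueError("unexpected sbatch output: %r" % sbatch_message)
    return os.path.join(submit_dir, "slurm-%s.out" % match.group(1))


def submit(job_file):
    """Run sbatch on job_file and return the expected output file path."""
    result = subprocess.run(["sbatch", job_file], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True, check=True)
    return default_output_file(result.stdout, os.getcwd())
```

For example, calling submit("gpu.job") from your working directory returns something like <working directory>/slurm-<jobid>.out, which you can inspect once the job has started.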