Running Jupyter Notebooks on the GPU Cluster
Overview
As with other types of computation, policy does not allow users to run Jupyter notebooks on the login node (raad2-gfx); they must run on one of the GPU nodes (gfx1 through gfx4). A Jupyter notebook is essentially a web server application that the user interacts with through a standard web browser. Because the application runs on a GPU node that sits on a private network internal to the HPC cluster, a networking technique called "port forwarding" is needed so that a user outside the HPC system can reach the application on the inside. We have tried to make this process friendlier than usual by partially automating it.
Before we can launch the Jupyter notebook, though, we first need to install it in our home directory.
Make the Conda Tool Accessible
We can use the conda tool to create a custom environment within our home directory, in which we will install our own instance of Jupyter Notebook. Before the conda tool can be run, however, we must source the appropriate file to make it accessible to our bash shell:
source /ddn/sw/xc40/cle7/anaconda/2023.03-1/anaconda3/etc/profile.d/conda.sh
Alternatively, the line above may be appended to your ~/.bashrc file so you do not have to run this command manually every time you need to use the conda tool. This can be accomplished with:
echo "source /ddn/sw/xc40/cle7/anaconda/2023.03-1/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
After adding this line to your .bashrc file (this needs to be done only once), log out of your account and log back in so that conda.sh is sourced in your new bash shell.
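Running the echo command above more than once leaves duplicate lines in ~/.bashrc. The following is a minimal sketch of a guarded variant; the function name add_line_once is hypothetical and not part of any cluster tooling, and the target file is passed as an argument so the helper can be tried on a scratch file first.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact line is absent.
add_line_once() {
    local line=$1 file=$2
    touch "$file"    # create the file if it does not exist yet
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Path copied from the section above.
conda_sh=/ddn/sw/xc40/cle7/anaconda/2023.03-1/anaconda3/etc/profile.d/conda.sh

# Example call, commented out so that pasting the sketch has no side effects:
# add_line_once "source $conda_sh" ~/.bashrc
```

After the next login, a version string printed by `conda --version` indicates that conda.sh was sourced.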
Install the Jupyter Package
conda create -n myJupyter python=3.11 jupyter
When creating the environment with the command above, we assign it a name with the -n option ("myJupyter") and specify the version of Python to use as its base with the python=3.11 option. We also request that the jupyter conda package be installed in this environment.
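Once the command finishes, it can be useful to confirm that the environment exists before moving on. The following is a minimal sketch, assuming that `conda env list` prints one environment per line with the name in the first column; the helper name env_listed is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `conda env list` output on stdin and succeed
# if the first column of any line equals the given environment name.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example calls, commented out because they need conda to be available:
# conda env list | env_listed myJupyter && echo "myJupyter exists"
# conda activate myJupyter && jupyter --version && conda deactivate
```

The second example call activates the environment, prints the versions of the installed Jupyter components, and deactivates the environment again.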
Launch the Jupyter Notebook on a Raad2 Compute Node
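The following script may be downloaded to your PC and saved as jupyter-gfxlauncher.sh. Its usage message and the sample session further down refer to it as jupyter-launcher.sh, so substitute whichever name you saved it under.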
#!/bin/bash
# Jupyter Launcher version 1.0
# Author: faisal.chaudhry@qatar.tamu.edu
# Group: Research Computing @ TAMUQ
if [ $# -eq 0 ]
then
printf "\nPlease provide your raad2-gfx username.\n Usage: jupyter-launcher.sh <username> \n Example: jupyter-launcher.sh fachaud74\n\n"
read -p 'Username: ' uservar
usr=$uservar
else
usr=$1
fi
printf "Connecting to raad2-gfx to get port number for Jupyter Lab \n"
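# Single quotes make the remote shell evaluate the arithmetic, so RANDOM and UID come from the cluster account
port=$(ssh -t -Y "$usr@raad2-gfx" 'echo $((50000 + RANDOM % $UID))')
# ssh -t allocates a terminal, which leaves a trailing carriage return in the output; strip it
port=$(echo "$port" | tr -d '\r')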
printf "Port number fetched: $port\n\n"
printf "In order to launch the Jupyter notebook, you must manually:\n\n"
printf "1) Activate your specific conda environment (in which you installed your instance of jupyter notebook)\n"
printf " with a command of the form \"conda activate <your_env_name>\". For example:\n"
printf " conda activate myJupyter\n\n"
printf "2) Start the jupyter notebook server with this specific command:\n"
printf " jupyter-notebook --no-browser --port=$port --ip='0.0.0.0' \n\n"
printf "3) Start a web browser on your local PC and point it to the URL suggested by the output of the previous command.\n"
printf " It will look something like this:\n"
printf " http://127.0.0.1:$port/?token=1e493a3016c55337ac1278a7ab320005749e86ea66d1a19d \n\n"
printf "4) Notebook shutdown instructions:\n"
printf " a) click \"Quit\" on the notebook web page to stop the notebook server\n"
printf " b) type \"conda deactivate\" & hit enter at the command prompt within the compute node terminal\n"
printf " c) type \"exit\" & hit enter to end the interactive job on the compute node\n"
printf "\n"
printf "We will now launch an interactive job (4 hrs time limit) on one of the Raad2 compute nodes...\n"
printf "(follow steps 1 to 3 outlined above, once a command prompt becomes available)\n\n"
ssh -t -Y -L $port:localhost:$port $usr@raad2-gfx "srun --pty --tunnel=$port:$port --time=04:00:00 --job-name=Jupyter --ntasks=18 --gres=gpu:v100:1 /bin/bash"
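The launcher derives its port from the expression 50000 + RANDOM % $UID. Because bash's RANDOM ranges from 0 to 32767, a UID above 15535 can push the result past 65535, the highest valid TCP port. The following is a minimal sketch of a bounded variant with a validity check; the function names and the 50000-59999 range are assumptions chosen for illustration, not a site policy or part of the launcher.

```shell
#!/bin/bash
# Hypothetical helper: pick a pseudo-random port inside a fixed range.
# The default range 50000-59999 is an assumption for illustration only.
pick_port() {
    local base=${1:-50000} span=${2:-10000}
    echo $(( base + RANDOM % span ))
}

# Hypothetical helper: succeed only if the argument is a valid unprivileged TCP port.
is_valid_port() {
    # 10# forces base 10 so that leading zeros are not read as octal
    [[ $1 =~ ^[0-9]+$ ]] && (( 10#$1 >= 1024 && 10#$1 <= 65535 ))
}

port=$(pick_port)
if is_valid_port "$port"; then
    printf 'Port number chosen: %s\n' "$port"
else
    printf 'Invalid port: %s\n' "$port" >&2
fi
```

A random port can still collide with one already in use on the node; if the notebook server reports that the port is taken, rerunning the launcher picks a new number.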
To run this script from a local terminal in MobaXterm, pass your username as the argument. A sample session looks like this:
17/10/2022 13:32.59 /home/mobaxterm ./jupyter-launcher.sh fachaud74
Connecting to raad2-login1 to get port number for Jupyter Lab
Connection to raad2-login1 closed.
Port number fetched: 55751
In order to launch the Jupyter notebook, you must manually:
1) Activate your specific conda environment (in which you installed your instance of jupyter notebook)
with a command of the form "conda activate <your_env_name>". For example:
conda activate myJupyter
2) Start the jupyter notebook server with this specific command:
jupyter-notebook --no-browser --port=55751 --ip='0.0.0.0'
3) Start a web browser on your local PC and point it to the URL suggested by the output of the previous command.
It will look something like this:
http://127.0.0.1:55751/?token=1e493a3016c55337ac1278a7ab320005749e86ea66d1a19d
4) Notebook shutdown instructions:
a) click Quit on the notebook web page to stop the notebook server
b) type "conda deactivate" & hit enter at the command prompt within the compute node terminal
c) type "exit" & hit enter to end the interactive job on the compute node
We will now launch an interactive job (4 hrs time limit) on one of the Raad2 GPU nodes...
(follow steps 1 to 3 outlined above, once a command prompt becomes available)
[fachaud74@nid00053 ~]$
If the output above also includes the following warning:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:Vhit7kLwb9tE1sexEJ/mW030U1FqDP9Nj/RG8fZrp98.
Please contact your system administrator.
Add correct host key in /ddn/home/fachaud74/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /ddn/home/fachaud74/.ssh/known_hosts:1
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Port forwarding is disabled to avoid man-in-the-middle attacks.
...then edit the known_hosts file named in the warning and remove the offending line from it (line 1 in this example, as indicated by "known_hosts:1").
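Removing the offending line by hand works in any text editor, and the same step can be scripted. The following is a minimal sketch; the function name drop_known_hosts_line is hypothetical, and it keeps a backup copy with a .bak suffix before changing the file.

```shell
#!/bin/bash
# Hypothetical helper: delete one numbered line from a known_hosts file,
# keeping a backup copy next to it with a .bak suffix.
drop_known_hosts_line() {
    local file=$1 lineno=$2
    [[ $lineno =~ ^[1-9][0-9]*$ ]] || { echo "line number must be a positive integer" >&2; return 1; }
    [[ -f $file ]] || { echo "no such file: $file" >&2; return 1; }
    cp -p "$file" "$file.bak"
    # Rewrite the file from the backup, skipping the offending line.
    sed "${lineno}d" "$file.bak" > "$file"
}

# Example call matching the warning above ("known_hosts:1"), commented out:
# drop_known_hosts_line ~/.ssh/known_hosts 1
```

OpenSSH also provides `ssh-keygen -R <hostname>`, which removes all keys stored for the given host name; it needs the name of the host whose key changed, which the warning above does not show.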