Raad2: CERN Open Data Client

From TAMUQ Research Computing User Documentation Wiki


Introduction to CERN Open Data Client

  • cernopendata-client is a command-line tool that makes it easier to download files from the CERN Open Data portal.
  • It lets you query the datasets hosted on the CERN Open Data portal, and download and verify the individual dataset files.

Installation

The steps described in this section need to be performed only once. After the tool is installed, you only need to follow the steps in the Usage section.

Installation via pip

  • Load a recent version of Python:
albelfe17@raad2a:~> module load python/3113
  • Create and enter your working directory, i.e., the folder where you want to store the dataset locally (in this example, it is created inside ~/swinstall):
albelfe17@raad2a:~/swinstall> mkdir folderToDownloadCernData && cd folderToDownloadCernData
albelfe17@raad2a:~/swinstall/folderToDownloadCernData>
  • To install the cernopendata-client utility, we will use a Python virtual environment.
  • Create a Python virtual environment:
albelfe17@raad2a:~/swinstall/folderToDownloadCernData> virtualenv .
  • Activate your Python virtual environment:
albelfe17@raad2a:~/swinstall/folderToDownloadCernData> source ./bin/activate
(folderToDownloadCernData) albelfe17@raad2a:~/swinstall/folderToDownloadCernData>
  • Your virtual environment is now active (notice the (folderToDownloadCernData) prefix in front of your prompt).
  • You can therefore use pip to install any Python package you need, in our case cernopendata-client:
(folderToDownloadCernData) albelfe17@raad2a:~/swinstall/folderToDownloadCernData> pip install cernopendata-client
Collecting cernopendata-client
  Using cached cernopendata_client-0.3.0-py2.py3-none-any.whl
[...]
  Downloading certifi-2023.11.17-py3-none-any.whl.metadata (2.2 kB)
Downloading click-8.1.7-py3-none-any.whl (97 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 398.9 kB/s eta 0:00:00
[...]
Installing collected packages: urllib3, idna, click, charset-normalizer, certifi, requests, cernopendata-client
Successfully installed cernopendata-client-0.3.0 certifi-2023.11.17 charset-normalizer-3.3.2 click-8.1.7 idna-3.6 requests-2.31.0 urllib3-2.1.0
  • The tool is now successfully installed in your virtual environment.
  • You can leave the virtual environment by typing deactivate:
(folderToDownloadCernData) albelfe17@raad2a:~/swinstall/folderToDownloadCernData> deactivate
albelfe17@raad2a:~/swinstall/folderToDownloadCernData>
  • The next time you want to use the tool, simply follow the steps described in the Usage section below.
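If you ever need to repeat the one-time installation (for example in a second download folder), the steps above can be collected into a small shell function. This is only a sketch: the function name install_cod_client is hypothetical, and the module name python/3113 and the default path ~/swinstall/folderToDownloadCernData are simply the values used on this page, so adjust them to your own setup.

```shell
# Hypothetical helper that repeats the one-time installation steps of this page.
# Assumes Raad2's "module" command and the python/3113 module shown above;
# the default directory is the example path used on this page.
install_cod_client() {
    local dir="${1:-$HOME/swinstall/folderToDownloadCernData}"
    local status

    module load python/3113 || return 1

    # Create and enter the working directory (this changes your shell's
    # current directory, exactly like the manual steps)
    mkdir -p "$dir" && cd "$dir" || return 1

    # Create the virtual environment in the working directory itself
    # ("python3 -m venv ." is the standard-library alternative to virtualenv)
    virtualenv . || return 1

    # Install the client inside the environment, then leave it
    . ./bin/activate || return 1
    pip install cernopendata-client
    status=$?
    deactivate
    return "$status"
}
```

Running install_cod_client with no argument uses the example path; passing a directory, e.g. install_cod_client ~/swinstall/anotherFolder (a hypothetical name), installs the client there instead.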

Usage

Launch the tool

Once the tool is installed, you can launch it by following these steps:

  • Log in to your Raad2 account via MobaXterm.
  • Load a recent version of Python:
albelfe17@raad2a:~> module load python/3113
  • Go to your working directory:
albelfe17@raad2a:~> cd swinstall/folderToDownloadCernData/
albelfe17@raad2a:~/swinstall/folderToDownloadCernData>
  • Activate your virtual environment:
albelfe17@raad2a:~/swinstall/folderToDownloadCernData> source ./bin/activate
(folderToDownloadCernData) albelfe17@raad2a:~/swinstall/folderToDownloadCernData>

==> You can now start using the tool.
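Because these launch steps have to be repeated in every new session, you may find it convenient to define them once as a shell function, for instance in your ~/.bashrc. A minimal sketch, assuming the same module and working directory as above; cod_env is a hypothetical name and is not part of cernopendata-client.

```shell
# Hypothetical convenience function that repeats the "Launch the tool" steps.
# Assumes the python/3113 module and the example working directory of this page.
cod_env() {
    module load python/3113 || return 1
    cd "$HOME/swinstall/folderToDownloadCernData" || return 1
    # Activate the virtual environment created during installation
    . ./bin/activate
}
```

After adding the function to ~/.bashrc and logging in again, typing cod_env should bring you to the (folderToDownloadCernData) prompt, ready to run cernopendata-client.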

Use the tool

  • The CLI provides several useful commands:
(folderToDownloadCernData) albelfe17@raad2a:~/swinstall/folderToDownloadCernData> cernopendata-client --help
[...]
Commands:
  download-files      Download data files belonging to a record.
  get-file-locations  Get a list of data file locations of a record.
  get-metadata        Get metadata content of a record.
  list-directory      List contents of a EOSPUBLIC Open Data directory.
  verify-files        Verify downloaded data file integrity.
  version             Return cernopendata-client version.
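Before starting a large download, the get-metadata and get-file-locations commands listed above can be used to check that a record is the one you want and how many files it contains. The sketch below chains them; preview_record is a hypothetical name, and the file count assumes that get-file-locations prints one file location per line.

```shell
# Hypothetical helper: look at a record before starting a large download.
# Uses only the get-metadata and get-file-locations commands listed by --help;
# run it with the virtual environment active.
preview_record() {
    local recid="$1" count

    # Accept only a non-empty string of digits as record ID
    case "$recid" in
        ''|*[!0-9]*)
            echo "usage: preview_record RECID (digits only)" >&2
            return 2 ;;
    esac

    # Show the record's metadata
    cernopendata-client get-metadata --recid "$recid" || return 1

    # Count the data files, assuming one file location per output line
    count=$(cernopendata-client get-file-locations --recid "$recid" | wc -l | tr -d ' ')
    echo "Record $recid lists $count file location(s)"
}
```

For example, preview_record 24643 prints the metadata of record 24643 followed by the number of file locations it lists.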
  • For instance, to download all files of record 24643, type:
(folderToDownloadCernData) albelfe17@raad2a:~/swinstall/folderToDownloadCernData> cernopendata-client download-files --recid 24643
==> Downloading file 1 of 1316
  -> File: ./24643/F00776F3-3075-E211-AD4E-0025901D5E10.root
  -> Progress: 982/982 KiB (100%)
[...]
==> Downloading file 5 of 1316
[...]
==> Downloading file 6 of 1316
[...]
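With more than a thousand files, a download like the one above is worth checking afterwards with the verify-files command from the --help output. The sketch below runs both steps in sequence; fetch_record is a hypothetical name, and it assumes that verify-files accepts the same --recid option as download-files (check cernopendata-client verify-files --help on your installation).

```shell
# Hypothetical helper: download all files of a record, then check them.
# Files land in ./<recid>/ (see the download output above), so run this from
# your working directory with the virtual environment active.
fetch_record() {
    local recid="$1"
    if [ -z "$recid" ]; then
        echo "usage: fetch_record RECID" >&2
        return 2
    fi

    cernopendata-client download-files --recid "$recid" || return 1

    # Assumption: verify-files accepts the same --recid option as download-files
    cernopendata-client verify-files --recid "$recid"
}
```

Running fetch_record 24643 from the working directory reproduces the download shown above and then checks the integrity of the downloaded files.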
  • If you don't know the record ID (recid) of your dataset, go to https://opendata.cern.ch/.
  • In the search bar, type the keywords associated with your dataset.
  • You can then filter the results by type, experiment, and so on.
  • Once you have found the right dataset, look at the digits after the last slash of its web address: this number is the record ID (e.g., https://opendata.cern.ch/record/24643 has the record ID 24643).
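Since the record ID is simply the digits after the last slash of the dataset's web address, it can also be extracted from a copied URL with a little shell string handling. A minimal sketch; recid_from_url is a hypothetical helper, and it also strips a trailing slash, a "?query" or a "#fragment" in case your browser adds one.

```shell
# Hypothetical helper: extract the record ID from a CERN Open Data record URL,
# i.e. the part after the last slash (query string and fragment removed first).
recid_from_url() {
    local url="$1"
    url="${url%%\#*}"    # drop any "#fragment"
    url="${url%%\?*}"    # drop any "?query"
    url="${url%/}"       # drop a trailing slash
    printf '%s\n' "${url##*/}"
}
```

It can be combined directly with the download command, e.g. cernopendata-client download-files --recid "$(recid_from_url https://opendata.cern.ch/record/24643)".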

External Resources

Full manual describing the tool in detail: https://cernopendata-client.readthedocs.io/en/latest/index.html