Import scripts
Publications and members can be automatically imported from the LIRIS servers. These imports rely on endpoints of the LIRIS website that expose the data.
Import scripts #
Example import scripts are provided on the LIRIS GitLab. These scripts are primarily meant for Hugo, but can probably be used with other static site generators like Jekyll. They produce markdown files with the data in the front matter. See the members or publications documentation for more details on the format.
The repository contains two scripts. import_members.py creates a members directory with the fetched members of a team. import_publications.py creates a publications directory with the fetched publications of a team or of a single LIRIS member. The pyproject.toml file tracks the python dependencies, so they can easily be installed in a continuous integration pipeline, or locally in a virtual environment to avoid messing with your local python installation. To easily update the scripts as they evolve, you can add the repository as a submodule at the root of your website directory:
git submodule add https://gitlab.liris.cnrs.fr/cell-si/hugo-import-scripts integration
This creates an integration directory containing the scripts from the repository. We describe later how to test them locally, but script execution can also be automated with GitLab runners on each push to your website repository.
If for some reason you want to locate these scripts elsewhere in your repository while keeping the default configuration provided with this example site, the integration script paths should be updated in config/_default/module.toml.
Configuration #
By default, if the scripts are located in a directory at the root of your repository, they will look for a configuration file at config/integration.yml to configure the data sources:
members:
  team: liris-team
publications:
  author: liris-team-or-login
  type: equipe # For a team
  #type: membre # For a single member
For members, replace liris-team with the name of the team you are importing content for. For publications, replace liris-team-or-login with the team name, or with the LIRIS login of the user whose publications you want to fetch. The type field specifies whether publications are fetched for a team or for a single member.
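For instance, a configuration fetching the publications of a single member could look like this (the login shown is hypothetical):

```yaml
publications:
  author: jdupont # hypothetical LIRIS login
  type: membre    # fetch for a single member
```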
If you prefer using another path for your configuration file, you can either run
the scripts with a --config option providing the configuration file path, or
edit the scripts and change the default configuration path.
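For example, assuming the configuration file was moved to a hypothetical conf/ directory, the scripts could be invoked as:

```shell
poetry run python import_members.py --config conf/integration.yml
poetry run python import_publications.py --config conf/integration.yml
```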
The default configuration also assumes that your import scripts are located in an integration directory at the root of your repository, that the imported publications are placed in the content/publications directory, and that the imported members are placed in the content/members directory. If this is not the case, you should edit the corresponding mounts in config/_default/module.toml.
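Hugo module mounts map a source directory to a target in the site. As a sketch of what the relevant entries in config/_default/module.toml might look like (the exact entries in the example site may differ):

```toml
# Hypothetical sketch: map the directories produced by the scripts
# to Hugo content directories.
[[mounts]]
source = "integration/members"
target = "content/members"

[[mounts]]
source = "integration/publications"
target = "content/publications"
```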
Continuous integration task #
The import can be triggered with an additional task in your continuous integration pipeline. This is done by editing the .gitlab-ci.yml at the root of your website repository. From the original configuration, first add a stage before building the site:
stages:
- fetch
- build
- deploy
Next, add a task for this stage, meant to run the scripts and produce the member and publication artifacts.
fetch:
  stage: fetch
  image: python:3-alpine3.22
  tags:
    - docker
  variables:
    GIT_SUBMODULE_STRATEGY: recursive
  before_script:
    - pip install poetry
  script:
    - cd integration
    - poetry install
    - poetry run python import_members.py
    - poetry run python import_publications.py
  artifacts:
    paths:
      - integration/publications/
      - integration/members/
In the above task, if you decided to modify the path to the configuration file or to the import scripts, you should adapt the script section to navigate to the correct directory, and the artifacts paths to match the members and publications directories produced by the scripts.
Local testing #
To locally test the script execution, first ensure that you have a valid python installation and that the necessary dependencies are available. These are declared in the pyproject.toml file. A simple solution to install all the dependencies in a virtual environment, without interfering with your local python installation, is to use poetry, as done in the continuous integration task. From the script directory, you can run
poetry install
Please refer to the poetry documentation for details on using poetry. Once the dependencies are available, and assuming you have a configuration file, you can run the scripts with
poetry run python import_members.py
poetry run python import_publications.py
If you skipped poetry and use your local python environment instead, just remove the poetry run prefix from the commands above.
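If you would rather use a plain virtual environment than poetry, something along these lines should work, assuming the pyproject.toml declares a build backend that pip understands:

```shell
python -m venv .venv
. .venv/bin/activate
pip install .  # installs the dependencies declared in pyproject.toml
python import_members.py
python import_publications.py
```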
If the scripts run without problems, each will create a directory filled with markdown files, one per member or publication.
Data source details #
Members #
For member imports, the script uses the url
https://liris.cnrs.fr/en/webmaster/export/membres-equipe/{team}
where {team} is to be replaced by the name of the desired LIRIS team. This url
provides the data as a csv table in English (for member status in particular).
The data can also be obtained in French with the url
https://liris.cnrs.fr/webmaster/export/membres-equipe/{team}
In the resulting csv, the first column contains the id of each user in the LIRIS system. Using this id, the script then scrapes the web page at
https://liris.cnrs.fr/user/{id}
From this page's html, the script extracts the member's picture and the url of their personal web page. If no picture is found, an avatar is automatically generated using an identicon, to avoid gender or ethnicity bias. The generated files are named after the user's LIRIS id.
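The id extraction step can be sketched as follows. This is not the actual script, only a minimal illustration: the sample data, the column layout, and the semicolon delimiter are assumptions (the real script fetches the csv over HTTP and the export format may differ).

```python
import csv
import io

# Hypothetical sample of the exported csv: the real export has more
# columns, but only the first one (the LIRIS id) matters here.
SAMPLE_CSV = """id;name;status
42;Ada Lovelace;Researcher
57;Alan Turing;Professor
"""

def member_profile_urls(csv_text, delimiter=";"):
    """Extract LIRIS ids from the first column and build profile page URLs."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader)  # skip the header row
    return [f"https://liris.cnrs.fr/user/{row[0]}" for row in reader if row]

print(member_profile_urls(SAMPLE_CSV))
# → ['https://liris.cnrs.fr/user/42', 'https://liris.cnrs.fr/user/57']
```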
Publications #
For publications, the script uses the url
https://liris.cnrs.fr/publications/?type={type}&nom={author}&output=json
where the {type} field can be either equipe for the publications of a team or
membre for the publications of a single member. The {author}, depending on
the selected type, can be either the name of the team or a LIRIS member login.
The result is a json file listing the publications. Most of the fields are extracted directly; for additional publication information, the script uses the ids field, which corresponds to the publication identifier on HAL. With this identifier, the HAL API is queried at
https://api.archives-ouvertes.fr/search/
We currently query for the existence of a document associated with the publication, to propose a download link, and for the thumbnail id, in case you want to customize the publication layout on the site and use these thumbnails. The code for this is currently commented out in layouts/_partials/publication.html.
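As an illustration of the URLs involved, the following sketch builds both query URLs with the standard library. The HAL query parameters shown (a q filter on halId_s, wt=json) are assumptions about a typical HAL API request, not necessarily what the script actually sends:

```python
from urllib.parse import urlencode

LIRIS_PUBLICATIONS = "https://liris.cnrs.fr/publications/"
HAL_SEARCH = "https://api.archives-ouvertes.fr/search/"

def publications_url(author, type_="equipe"):
    """Build the LIRIS publications export URL for a team or a member."""
    query = urlencode({"type": type_, "nom": author, "output": "json"})
    return f"{LIRIS_PUBLICATIONS}?{query}"

def hal_search_url(hal_id):
    """Build a HAL API query for one publication id (parameters assumed)."""
    query = urlencode({"q": f"halId_s:{hal_id}", "wt": "json"})
    return f"{HAL_SEARCH}?{query}"

print(publications_url("my-team"))
# → https://liris.cnrs.fr/publications/?type=equipe&nom=my-team&output=json
```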