Import scripts
Publications and members can be automatically imported from the LIRIS servers. These imports rely on endpoints of the LIRIS website that expose the data.
Import scripts #
Example import scripts are provided on the LIRIS GitLab. These scripts are primarily meant for Hugo, but can probably be used with other static site generators like Jekyll. They produce markdown files with the data in the front matter. See the members or publications documentation for more details on the format.
The repository contains two scripts. import_members.py creates a members directory with the fetched members of a team. import_publications.py creates a publications directory with the fetched publications of a team or of a single LIRIS member. The pyproject.toml file tracks the python dependencies, so they can easily be installed in a continuous integration pipeline, or locally in a virtual environment to avoid messing with your local python installation. To easily update the scripts as they evolve, you can add the repository as a submodule at the root of your website directory:
git submodule add https://gitlab.liris.cnrs.fr/cell-si/hugo-import-scripts integration
This creates an integration directory containing the scripts from the repository. We describe later how to test them locally, but script execution can also be automated with GitLab runners on each push to your website repository.
If for some reason you want to locate these scripts elsewhere in your repository while keeping the default configuration provided with this example site, the integration script paths should be updated in config/_default/module.toml.
Configuration #
By default, if the scripts are located in a directory at the root of your repository, they will look for a configuration file at config/integration.yml to configure the data sources:
members:
  team: liris-team
publications:
  author: liris-team-or-login
  type: equipe # For a team
  #type: membre # For a single member
For members, replace liris-team with the name of the team you are importing content for. For publications, replace liris-team-or-login with the team name, or with the LIRIS login of the user whose publications you want to fetch. The type field specifies whether publications are fetched for a team or for a single member.
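For instance, a configuration fetching the publications of a single member could look like this (the login shown is hypothetical):

```yaml
publications:
  author: jdupont # hypothetical LIRIS login
  type: membre    # fetch for a single member
```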
If you prefer using another path for your configuration file, you can either run
the scripts with a --config option providing the configuration file path, or
edit the scripts and change the default configuration path.
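For example, assuming the configuration file was moved to a hypothetical conf/ directory, the scripts could be invoked as:

```shell
poetry run python import_members.py --config conf/integration.yml
poetry run python import_publications.py --config conf/integration.yml
```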
The default configuration also assumes that your import scripts are located in an integration directory at the root of your repository, that the imported publications are placed in the content/publications directory, and that the imported members are placed in the content/members directory. If this is not the case, you should edit the corresponding mounts in config/_default/module.toml.
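Hugo module mounts map a source directory to a target in the site. As a sketch of what the relevant entries in config/_default/module.toml might look like (the exact entries in the example site may differ):

```toml
# Hypothetical sketch: map the directories produced by the scripts
# to Hugo content directories.
[[mounts]]
source = "integration/members"
target = "content/members"

[[mounts]]
source = "integration/publications"
target = "content/publications"
```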
Continuous integration task #
The import can be triggered with an additional task in your continuous integration pipeline. This is done by editing the .gitlab-ci.yml at the root of your website repository. From the original configuration, first add a stage before building the site:
stages:
- fetch
- build
- deploy
Next, add a task for this stage, meant to run the scripts and produce the member and publication artifacts.
fetch:
  stage: fetch
  image: python:3-alpine3.22
  tags:
    - docker
  variables:
    GIT_SUBMODULE_STRATEGY: recursive
  before_script:
    - pip install poetry
  script:
    - cd integration
    - poetry install
    - poetry run python import_members.py
    - poetry run python import_publications.py
  artifacts:
    paths:
      - integration/publications/
      - integration/members/
In the above task, if you decided to modify the path to the configuration file or to the import scripts, you should adapt the script section to navigate to the correct directory, and the artifacts paths to match the members and publications directories produced by the scripts.
Local testing #
To locally test the script execution, first ensure that you have a valid python installation and that the necessary dependencies are available. These are declared in the pyproject.toml file. A simple solution to install all the dependencies in a virtual environment, without interfering with your local python installation, is to use poetry, as done in the continuous integration task. From the script directory, you can run
poetry install
Please refer to the poetry documentation for details on using poetry. Once the dependencies are available, and assuming you have a configuration file, you can run the scripts with
poetry run python import_members.py
poetry run python import_publications.py
If you skipped poetry and use your local python environment instead, just remove the poetry run prefix from the commands above.
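If you would rather use a plain virtual environment than poetry, something along these lines should work, assuming the pyproject.toml declares a build backend that pip understands:

```shell
python -m venv .venv
. .venv/bin/activate
pip install .  # installs the dependencies declared in pyproject.toml
python import_members.py
python import_publications.py
```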
If the scripts run without problems, each will create a directory filled with markdown files, one per member or publication.
Data source details #
Members #
For member imports, the script uses the url
https://liris.cnrs.fr/en/webmaster/export/membres-equipe/{team}
where {team} is to be replaced by the name of the desired LIRIS team. This url
provides the data as a csv table in English (for member status in particular).
The data can also be obtained in French with the url
https://liris.cnrs.fr/webmaster/export/membres-equipe/{team}
In the resulting csv, the first column contains the id of each user in the LIRIS system. Using this id, the script then scrapes the web page at
https://liris.cnrs.fr/user/{id}
From this page's html, the script extracts the member's picture and the url of their personal web page. If no picture is found, an avatar is automatically generated using an identicon, to avoid gender or ethnicity bias. The generated files are named after the user's LIRIS id.
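The id extraction step can be sketched as follows. This is not the actual script, only a minimal illustration: the sample data, the column layout, and the semicolon delimiter are assumptions (the real script fetches the csv over HTTP and the export format may differ).

```python
import csv
import io

# Hypothetical sample of the exported csv: the real export has more
# columns, but only the first one (the LIRIS id) matters here.
SAMPLE_CSV = """id;name;status
42;Ada Lovelace;Researcher
57;Alan Turing;Professor
"""

def member_profile_urls(csv_text, delimiter=";"):
    """Extract LIRIS ids from the first column and build profile page URLs."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader)  # skip the header row
    return [f"https://liris.cnrs.fr/user/{row[0]}" for row in reader if row]

print(member_profile_urls(SAMPLE_CSV))
# → ['https://liris.cnrs.fr/user/42', 'https://liris.cnrs.fr/user/57']
```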
Publications #
For publications, the script uses the url
https://liris.cnrs.fr/publications/?type={type}&nom={author}&output=json
where the {type} field can be either equipe for the publications of a team or
membre for the publications of a single member. The {author}, depending on
the selected type, can be either the name of the team or a LIRIS member login.
The result is a json file listing the publications. Most of the fields are extracted directly; for additional publication information, the script uses the ids field, which corresponds to the publication identifier on HAL. With this identifier, the HAL API is queried at
https://api.archives-ouvertes.fr/search/
We currently query for the existence of a document associated with the publication, to propose a download link, and for the thumbnail id, in case you want to customize the publication layout on the site and use these thumbnails. The code for this is currently commented out in layouts/_partials/publication.html.
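As an illustration of the URLs involved, the following sketch builds both query URLs with the standard library. The HAL query parameters shown (a q filter on halId_s, wt=json) are assumptions about a typical HAL API request, not necessarily what the script actually sends:

```python
from urllib.parse import urlencode

LIRIS_PUBLICATIONS = "https://liris.cnrs.fr/publications/"
HAL_SEARCH = "https://api.archives-ouvertes.fr/search/"

def publications_url(author, type_="equipe"):
    """Build the LIRIS publications export URL for a team or a member."""
    query = urlencode({"type": type_, "nom": author, "output": "json"})
    return f"{LIRIS_PUBLICATIONS}?{query}"

def hal_search_url(hal_id):
    """Build a HAL API query for one publication id (parameters assumed)."""
    query = urlencode({"q": f"halId_s:{hal_id}", "wt": "json"})
    return f"{HAL_SEARCH}?{query}"

print(publications_url("my-team"))
# → https://liris.cnrs.fr/publications/?type=equipe&nom=my-team&output=json
```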