Research project ANR-23-CE23-0021-01
04/01/24 - 09/30/28
In deep learning, many recent Self-Supervised Learning (SSL) models rely on instance-based discriminative tasks. The underlying idea is to define positive (resp. negative) pairs of data whose representations must be similar (resp. dissimilar). To build the positive pairs, these models apply a set of augmentations (e.g. resizing, color jitter, blur) to the same input. They thereby learn to be invariant (or equivariant, when the effect of the augmentation on the representation is taken into account) to the properties modified by these augmentations, but not to the semantic content of the input.
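As an illustration of the positive/negative pair idea, here is a minimal sketch of an InfoNCE-style contrastive loss, one common choice for such discriminative tasks and not necessarily the objective used in this project. The function names (`cosine`, `info_nce`), the temperature value and the toy 2-D vectors are assumptions made for the example; the vectors stand in for the representations of augmented views.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor representation.

    The loss is small when the anchor is more similar to its positive
    (another augmented view of the same input) than to the negatives
    (views of other inputs). The temperature is an illustrative value.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - logits[0]


# Toy representations: two views of the same input, plus two other inputs.
view_a = [1.0, 0.0]
view_b = [0.9, 0.1]
others = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce(view_a, view_b, others)
misaligned_loss = info_nce(view_a, others[0], [view_b, others[1]])
```

In this toy setting `aligned_loss` is much smaller than `misaligned_loss`: minimizing the loss pulls the two views of the same input together, which is what makes the representation insensitive to the augmentation that separates them.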
However, there is “a fundamental misalignment between human and typical AI representations: while the former are grounded in rich sensorimotor experience, the latter are typically passive and limited to a few modalities such as vision and text” [1]. In this project, we propose to improve SSL by taking inspiration from the way babies learn to explore their environment through actions that shape their multimodal experience. More specifically, we will build upon sensorimotor contingencies theory [2], which combines coherent pieces of evidence from neuroscience and psychology into a unified framework. The key claims are about:
Our claim in this project is that, to go beyond invariant and equivariant pretext tasks, the next research axis to explore in SSL is to consider action as the core of the representation learning and perception process, guided by the concepts of sensorimotor contingencies theory. This conceptual shift, which uses action as a unifying key point in learning, will lead towards more general architectures and representations. It will also allow models to interact with the environment and thereby access all of its dynamics and properties. Moreover, more human-like perceptual and learning mechanisms should help models generalize to various environments, as humans do.
___
[1] N. Hay, M. Stark, et al. Behavior is everything: Towards representing concepts with sensorimotor contingencies. AAAI, 2018.
[2] J.K. O'Regan and A. Noë. A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 2001.
[3] E. Myin and J.K. O'Regan. Perceptual consciousness, access to modality and skill theories. A way to naturalize phenomenology? Journal of Consciousness Studies, 2002.
The project is structured in 4 WPs: