Abstract: The Flowers project-team studies models of open-ended development and learning. These models are used as tools to help us understand better how children learn, as well as to build machines that learn like children, i.e. developmental artificial intelligence, with applications in educational technologies, assisted scientific discovery, video games, robotics and human-computer interaction.
Context: Great advances have been made recently in artificial intelligence concerning the topic of how autonomous agents can learn to act in uncertain and complex environments, thanks to the development of advanced Deep Reinforcement Learning techniques. These advances have for example led to impressive results with AlphaGo 190 or algorithms that learn to play video games from scratch 156, 135. However, these techniques are still far away from solving the ambitious goal of lifelong autonomous machine learning of repertoires of skills in real-world, large and open environments. They are also very far from the capabilities of human learning and cognition. Indeed, developmental processes allow humans, and especially infants, to continuously acquire novel skills and adapt to their environment over their entire lifetime. They do so autonomously, i.e. through a combination of self-exploration and linguistic/social interaction with their social peers, sampling their own goals while benefiting from the natural language guidance of their peers, and without the need for an “engineer” to open and retune the brain and the environment specifically for each new task (e.g. for providing a task-specific external reward channel). Furthermore, humans are extremely efficient at learning fast (few interactions with their environment) skills that are very high-dimensional both in perception and action, while being embedded in open changing environments with limited resources of time, energy and computation.
Thus, a major scientific challenge in artificial intelligence and cognitive sciences is to understand how humans and machines can efficiently acquire world models, as well as open and cumulative repertoires of skills over an extended time span. Processes of sensorimotor, cognitive and social development are organized along ordered phases of increasing complexity, and result from the complex interaction between the brain/body with its physical and social environment. Making progress towards these fundamental scientific challenges is also crucial for many downstream applications. Indeed, autonomous lifelong learning capabilities similar to those shown by humans are key requirements for developing virtual or physical agents that need to continuously explore and adapt skills for interacting with new or changing tasks, environments, or people. This is crucial for applications like assistive technologies with non-engineer users, such as robots or virtual agents that need to explore and adapt autonomously to new environments, adapt robustly to potential damages of their body, or help humans to learn or discover new knowledge in education settings, and need to communicate through natural language with human users, grounding the meaning of sentences into their sensorimotor representations.
The Developmental AI approach:
Human and biological sciences have identified various families of developmental mechanisms that are key to explain how infants can acquire so robustly a wide diversity of skills 137, 155, in spite of the complexity and high-dimensionality of the body 95 and the open-endedness of its potential interactions with the physical and social environment. To advance the fundamental understanding of these mechanisms of development as well as their transposition in machines, the FLOWERS team has been developing an approach called Developmental artificial intelligence, leveraging and integrating ideas and techniques from developmental robotics (207, 147, 102, 167), Deep (Reinforcement) Learning and developmental psychology. This approach consists in developing computational models that leverage advanced machine learning techniques such as intrinsically motivated Deep Reinforcement Learning, in strong collaboration with developmental psychology and neuroscience. In particular, the team focuses on models of intrinsically motivated learning and exploration (also called curiosity-driven learning), with mechanisms enabling agents to learn to represent and generate their own goals, self-organizing a learning curriculum for efficient learning of world models and skill repertoire under limited resources of time, energy and compute. The team also studies how autonomous learning mechanisms can enable humans and machines to acquire and develop grounded and culturally shared language skills, using neuro-symbolic architectures for learning structured representations and handling systematic compositionality and generalization.
Our fundamental research is organized along three strands:
Research in artificial intelligence, machine learning and pattern recognition has produced a tremendous amount of results and concepts in the last decades. A growing number of learning paradigms - supervised, unsupervised, reinforcement, active, associative, symbolic, connectionist, situated, hybrid, distributed learning... - has nourished the elaboration of highly sophisticated algorithms for tasks such as visual object recognition, speech recognition, robot walking, grasping or navigation, the prediction of stock prices, the evaluation of risk for insurance, adaptive data routing on the internet, etc. Yet, we are still very far from being able to build machines capable of adapting to the physical and social environment with the flexibility, robustness, and versatility of a one-year-old human child.
Indeed, one striking characteristic of human children is the nearly open-ended diversity of the skills they learn. Not only can they improve existing skills, but they also continuously learn new ones. While evolution certainly provided them with specific pre-wiring for certain activities such as feeding or visual object tracking, evidence shows that there are also numerous skills that they learn smoothly but that could not be “anticipated” by biological evolution, for example learning to drive a tricycle, to use an electronic piano toy or to use a video game joystick. On the contrary, existing learning machines, and robots in particular, are typically only able to learn a single pre-specified task or a single kind of skill. Once this task is learnt, for example walking with two legs, learning is over. If one wants the robot to learn a second task, for example grasping objects in its visual field, then an engineer needs to manually re-program its learning structures: traditional approaches to task-specific machine/robot learning typically include engineer choices of the relevant sensorimotor channels, specific design of the reward function, choices about when learning begins and ends, and which learning algorithms and associated parameters shall be optimized.
As can be seen, this requires a lot of important choices from the engineer, and one could hardly use the term “autonomous” learning. On the contrary, human children do not learn through anything resembling that process, at least during their very first years. Babies develop and explore the world by themselves, focusing their interest on various activities driven both by internal motives and by social guidance from adults who only have a folk understanding of their brains. Adults provide learning opportunities and scaffolding, but eventually young babies always decide for themselves what activity to practice or not. Specific tasks are rarely imposed on them. Yet, they steadily discover and learn how to use their body as well as its relationships with the physical and social environment. Also, the spectrum of skills that they learn continuously expands in an organized manner: they undergo a developmental trajectory in which simple skills are learnt first, and skills of progressively increasing complexity are subsequently learnt.
A link can be made to educational systems, where research in several domains has tried to study how to provide a good learning or training experience to learners. This includes which experiences allow better learning, and in which sequence they must be experienced. This problem is complementary to that of the learner who tries to progress efficiently: the teacher here has to use the learner's limited time and motivational resources as efficiently as possible. Several results from psychology 94 and neuroscience 124 have argued that the human brain feels intrinsic pleasure in practicing activities of optimal difficulty or challenge. A teacher must exploit such activities to create positive psychological states of flow 116 that foster the individual's engagement in learning activities. Such a view is also relevant for re-education, where inter-individual variability, and thus the personalization of interventions, are challenges of the same magnitude as those for the education of children.
A grand challenge is thus to be able to build machines that possess this capability to discover, adapt and develop continuously new know-how and new knowledge in unknown and changing environments, like human children. In 1950, Turing wrote that the child's brain would show us the way to intelligence: “Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's” 202. Maybe, in opposition to work in the field of Artificial Intelligence which has focused on mechanisms trying to match the capabilities of “intelligent” human adults such as chess playing or natural language dialogue 131, it is time to take the advice of Turing seriously. This is what a new field, called developmental (or epigenetic) robotics, is trying to achieve 147, 207. The approach of developmental robotics consists in importing and implementing concepts and mechanisms from developmental psychology 155, cognitive linguistics 115, and developmental cognitive neuroscience 136, where there has been a considerable amount of research and theories to understand and explain how children learn and develop. A number of general principles underlie this research agenda: embodiment 98, 171, grounding 129, situatedness 193, self-organization 197, 166, enaction 203, and incremental learning 107.
Among the many issues and challenges of developmental robotics, two are of paramount importance: exploration mechanisms and mechanisms for abstracting and making sense of initially unknown sensorimotor channels. Indeed, the typical space of sensorimotor skills that can be encountered and learnt by a developmental robot, like those encountered by human infants, is immensely vast and inhomogeneous. With a sufficiently rich environment and multimodal set of sensors and effectors, the space of possible sensorimotor activities is simply too large to be explored exhaustively in any robot's lifetime: it is impossible to learn all possible skills and represent all conceivable sensory percepts. Moreover, some skills are very basic to learn, others very complicated, and many of them require the mastery of others in order to be learnt. For example, learning to manipulate a piano toy requires first knowing how to move one's hand to reach the piano and how to touch specific parts of the toy with the fingers. And knowing how to move the hand might require knowing how to track it visually.
Exploring such a space of skills randomly is bound to fail or, at best, to result in very inefficient learning 168. Thus, exploration needs to be organized and guided. The approach of epigenetic robotics is to take inspiration from the mechanisms that allow human infants to be progressively guided, i.e. to develop. There are two broad classes of guiding mechanisms which control exploration:
In infant development, one observes a progressive increase in the complexity of activities together with a progressive increase in capabilities 155; children do not learn everything at once: for example, they first learn to roll over, then to crawl and sit, and only when these skills are operational do they begin to learn how to stand. The perceptual system also develops gradually, increasing children's perceptual capabilities over time while they engage in activities like throwing or manipulating objects. This makes it possible to learn to identify objects in more and more complex situations and to learn more and more of their physical characteristics.
Development is therefore progressive and incremental, and this might be a crucial feature explaining the efficiency with which children explore and learn so fast. Taking inspiration from these observations, some roboticists and researchers in machine learning have argued that learning a given task could be made much easier for a robot if it followed a developmental sequence and “started simple” 90, 121. However, in these experiments, the developmental sequence was crafted by hand: roboticists manually built simpler versions of a complex task and put the robot successively in versions of the task of increasing complexity. And when they wanted the robot to learn a new task, they had to design a novel reward function.
Thus, there is a need for mechanisms that allow the autonomous control and generation of the developmental trajectory. Psychologists have proposed that intrinsic motivations play a crucial role. Intrinsic motivations are mechanisms that push humans to explore activities or situations that have intermediate/optimal levels of novelty, cognitive dissonance, or challenge 94, 116, 118. Further, the study of intrinsic motivation as a lever of cognitive development for all and at all ages has now expanded to several fields of research, from those closest to its original study (special education, cognitive aging) to more distant ones such as clinical neuropsychology. The role and structure of intrinsic motivation in humans have been made more precise thanks to recent discoveries in neuroscience showing the implication of dopaminergic circuits in exploration behaviours and curiosity 117, 132, 187. Based on this, a number of researchers have begun in the past few years to build computational implementations of intrinsic motivation 168, 169, 182, 92, 133, 149, 183. While initial models were developed for simple simulated worlds, a current challenge is to build intrinsic motivation systems that can efficiently drive exploratory behaviour in high-dimensional, unprepared, real-world robotic sensorimotor spaces 169, 168, 170, 181. Specific and complex problems are posed by real sensorimotor spaces, in particular because they are both high-dimensional and (usually) deeply inhomogeneous. As an example of the latter issue, some regions of real sensorimotor spaces are often unlearnable due to inherent stochasticity or difficulty, in which case heuristics based on the incentive to explore zones of maximal unpredictability or uncertainty, often used in the field of active learning 110, 130, typically lead to catastrophic results. The issue of high dimensionality does not only concern motor spaces, but also sensory spaces, leading to the problem of correctly identifying, among typically thousands of quantities, those latent variables that are linked to behavioral choices. In FLOWERS, we aim at developing intrinsically motivated exploration mechanisms that scale to those spaces, by studying suitable abstraction processes in conjunction with exploration strategies.
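As a minimal illustration of the learning-progress heuristic discussed above (in contrast with heuristics that maximize raw uncertainty), the following Python sketch selects among pre-discretized regions of a sensorimotor space according to the recent decrease of prediction errors; the discretization into regions, the window size and the epsilon floor are illustrative assumptions rather than the team's actual implementation.

import random
from collections import defaultdict, deque

class LearningProgressExplorer:
    """Toy sketch: choose the region of a pre-discretized sensorimotor space whose
    recent prediction error decreases fastest (learning progress), rather than the
    region with maximal error or uncertainty, so that unlearnable regions are
    eventually ignored."""

    def __init__(self, n_regions, window=20, epsilon=0.2):
        self.n_regions, self.window, self.epsilon = n_regions, window, epsilon
        self.errors = defaultdict(lambda: deque(maxlen=2 * window))

    def learning_progress(self, region):
        errs = list(self.errors[region])
        if len(errs) < 2 * self.window:
            return float("inf")  # regions with too little data stay maximally interesting
        old = sum(errs[: self.window]) / self.window
        new = sum(errs[self.window:]) / self.window
        return max(old - new, 0.0)  # progress = recent drop of the prediction error

    def choose_region(self):
        if random.random() < self.epsilon:  # keep a residual amount of random exploration
            return random.randrange(self.n_regions)
        return max(range(self.n_regions), key=self.learning_progress)

    def record(self, region, prediction_error):
        self.errors[region].append(prediction_error)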
Social guidance is as important as intrinsic motivation in the cognitive development of human babies 155. There is a vast literature on learning by demonstration in robots where the actions of humans in the environment are recognized and transferred to robots 89. Most such approaches are completely passive: the human executes actions and the robot learns from the acquired data. Recently, the notion of interactive learning has been introduced in 198, 97, motivated by the various mechanisms that allow humans to socially guide a robot 179. In an interactive context the steps of self-exploration and social guidance are not separated and a robot learns by self exploration and by receiving extra feedback from the social context 198, 140, 150.
Social guidance is also particularly important for learning to segment and categorize the perceptual space. Indeed, parents interact a lot with infants, for example teaching them to recognize and name objects or characteristics of these objects. Their role is particularly important in directing the infant's attention towards objects of interest, which makes it possible to simplify at first the perceptual space by pointing out a segment of the environment that can be isolated, named and acted upon. These interactions are then complemented by the children's own experiments on the objects, chosen according to intrinsic motivation, in order to improve the knowledge of the object, its physical properties and the actions that could be performed with it.
In FLOWERS, we aim at including an intrinsic motivation system in the self-exploration part, thus combining efficient self-learning with social guidance 160, 161. We also work on developing perceptual capabilities by gradually segmenting the perceptual space and identifying objects and their characteristics through interaction with the user 148 and robot experiments 134. Another challenge is to allow for more flexible interaction protocols with the user in terms of what type of feedback is provided and how it is provided 146.
Exploration mechanisms are combined with research in the following directions:
FLOWERS develops machine learning algorithms that can allow embodied machines to acquire cumulatively sensorimotor skills. In particular, we develop optimization and reinforcement learning systems which allow robots to discover and learn dictionaries of motor primitives, and then combine them to form higher-level sensorimotor skills.
In order to harness the complexity of perceptual and motor spaces, as well as to pave the way to higher-level cognitive skills, developmental learning requires abstraction mechanisms that can infer structural information out of sets of sensorimotor channels whose semantics is unknown, discovering for example the topology of the body or the sensorimotor contingencies (proprioceptive, visual and acoustic). This process is meant to be open-ended, progressing in continuous operation from initially simple representations towards abstract concepts and categories similar to those used by humans. Our work focuses on the study of various techniques for:
FLOWERS studies how adequate morphologies and materials (i.e. morphological computation), associated with relevant dynamical motor primitives, can greatly simplify the acquisition of apparently very complex skills such as full-body dynamic walking in bipeds. FLOWERS also studies maturational constraints, which are mechanisms that allow for the progressive and controlled release of new degrees of freedom in the sensorimotor space of robots.
FLOWERS studies mechanisms that allow a robot to infer structural information out of sets of sensorimotor channels whose semantics is unknown, for example the topology of the body and the sensorimotor contingencies (proprioceptive, visual and acoustic). This process is meant to be open-ended, progressing in continuous operation from initially simple representations to abstract concepts and categories similar to those used by humans.
FLOWERS studies how populations of interacting learning agents can collectively acquire cooperative or competitive strategies in challenging simulated environments. This differs from "Social learning and guidance" presented above: instead of studying how a learning agent can benefit from the interaction with a skilled agent, we rather consider here how social behavior can spontaneously emerge from a population of interacting learning agents. We focus on studying and modeling the emergence of cooperation, communication and cultural innovation based on theories in behavioral ecology and language evolution, using recent advances in multi-agent reinforcement learning.
Over the past decade, progress in the field of curiosity-driven learning has generated a lot of hope, especially with regard to a major challenge, namely the inter-individual variability of developmental learning trajectories, which is particularly critical during childhood and aging or in conditions of cognitive disorders. With the societal purpose of tackling social inequalities, FLOWERS aims to move this new research avenue forward by exploring the changes of states of curiosity across the lifespan and across neurodevelopmental conditions (neurotypical vs. learning disabilities) while designing new educational or rehabilitative technologies for curiosity-driven learning. Information gaps and learning progress, and the learner's awareness of them, are the core mechanisms of this part of the research program: they act as the fuel by which the individual's internal intrinsic state of motivation is maintained, leading him/her to pursue the cognitive efforts needed for acquisition or rehabilitation. Accordingly, a main challenge is to understand these mechanisms in order to design supports for curiosity-driven learning, and then to embed them into (re)educational technologies. To this end, two lines of investigation are carried out in real-life settings (school, home, workplace, etc.): 1) the design of curiosity-driven interactive systems for learning and the study of their effectiveness; and 2) the automated personalization of learning programs through new algorithms maximizing learning progress in ITS.
Neuroscience, Developmental Psychology and Cognitive Sciences The computational modelling of life-long learning and development mechanisms achieved in the team centrally aims to contribute to our understanding of the processes of sensorimotor, cognitive and social development in humans. In particular, it provides a methodological basis to analyze the dynamics of the interaction between learning and inference processes, embodiment and the social environment, allowing to formalize precise hypotheses and later on test them in experimental paradigms with animals and humans. A paradigmatic example of this activity is the Neurocuriosity project achieved in collaboration with the cognitive neuroscience lab of Jacqueline Gottlieb, where theoretical models of the mechanisms of information seeking, active learning and spontaneous exploration have been developed in coordination with experimental evidence and investigation. Another example is the study of the role of curiosity in learning in the elderly, with a view to assessing its positive value as a protective ingredient against cognitive aging (e.g., an industrial project with Onepoint and the CuriousTECH associate team with M. Fernandes from the Cognitive Neuroscience Lab of the University of Waterloo).
Personal and lifelong learning assistive agents Many indicators show that the arrival of personal assistive agents in everyday life, ranging from digital assistants to robots, will be a major fact of the 21st century. These agents will range from purely entertainment or educative applications to social companions that many argue will be of crucial help in our society. Yet, to realize this vision, important obstacles need to be overcome: these agents will have to evolve in unpredictable environments and learn new skills in a lifelong manner while interacting with non-engineer humans, which is out of reach of current technology. In this context, the refoundation of intelligent systems that developmental AI is exploring opens potentially novel horizons to solve these problems. In particular, this application domain requires advances in artificial intelligence that go beyond the current state-of-the-art in fields like deep learning. Currently these techniques require tremendous amounts of data in order to function properly, and they are severely limited in terms of incremental and transfer learning. One of our goals is to drastically reduce the amount of data required in order for this very potent field to work when humans are in the loop. We try to achieve this by making neural networks aware of their own knowledge, i.e. we introduce the concept of uncertainty and use it as part of intrinsically motivated multitask learning architectures, combined with techniques of learning by imitation.
Educational technologies that foster curiosity-driven and personalized learning. Optimal teaching and efficient teaching/learning environments can be applied to aid teaching in schools, aiming both to increase achievement levels and to reduce the time needed. From a practical perspective, improved models could save millions of hours of students' time (and effort) in learning. These models should also predict the achievement levels of students in order to influence teaching practices. The challenges of the school of the 21st century, and in particular producing conditions for active learning that are personalized to the student's motivations, are challenges shared with other applied fields. Special education for children with special needs, such as learning disabilities, has long recognized the difficulty of personalizing contents and pedagogies due to the great variability between and within medical conditions. More remotely, but not so much, cognitive rehabilitation carers are facing the same challenges: today they propose standardized cognitive training or rehabilitation programs whose benefits are modest (some individuals respond to the programs, others respond little or not at all), as they are highly subject to inter- and intra-individual variability. Curiosity-driven technologies for learning and ITS could be a promising avenue to address these issues that are common to (mainstream and specialized) education and cognitive rehabilitation.
Automated discovery in science. Machine learning algorithms integrating intrinsically-motivated goal exploration processes (IMGEPs) with flexible modular representation learning are very promising directions to help human scientists discover novel structures in complex dynamical systems, in fields ranging from biology to physics. The automated discovery project led by the FLOWERS team aims to boost the efficiency of these algorithms for enabling scientists to better understand the space of dynamics of bio-physical systems, which could include systems related to the design of new materials or new drugs, with applications ranging from regenerative medicine to unraveling the chemical origins of life. As an example, Grizou et al. 125 recently showed how IMGEPs can be used to automate chemistry experiments addressing fundamental questions related to the origins of life (how oil droplets may self-organize into protocellular structures), leading to new insights about oil droplet chemistry. Such methods can be applied to a large range of complex systems in order to map the possible self-organized structures. The automated discovery project is intended to be interdisciplinary and to involve potentially non-expert end-users from a variety of domains. In this regard, we are currently collaborating with Poietis (a bio-printing company) and Bert Chan (an independent researcher in artificial life) to deploy our algorithms. To encourage the adoption of our algorithms by a wider community, we are also working on an interactive software which aims to provide tools to easily use the automated exploration algorithms (e.g. curiosity-driven) in various systems.
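For illustration, a schematic IMGEP loop could look like the following sketch, under simplifying assumptions (flat lists of parameters, behavior descriptors normalized to [-1, 1], Gaussian mutations); run_experiment, behavior_descriptor and sample_params are placeholders for the bio-physical system under study, not actual project code.

import random

def imgep_loop(run_experiment, behavior_descriptor, sample_params, n_iters=1000):
    """Skeleton of an intrinsically motivated goal exploration process (IMGEP):
    sample a goal in the space of observed behaviors, reuse the parameters of the
    closest known behavior, perturb them, run the experiment, and archive the new
    outcome. All three callables are placeholders for the system under study."""
    archive = []  # list of (parameters, behavior descriptor) pairs
    for _ in range(n_iters):
        if not archive:
            params = sample_params()  # bootstrap with random parameters
        else:
            dim = len(archive[0][1])
            goal = [random.uniform(-1, 1) for _ in range(dim)]  # goal in normalized behavior space
            closest = min(archive, key=lambda entry: sum((a - b) ** 2
                                                         for a, b in zip(entry[1], goal)))
            params = [p + random.gauss(0, 0.1) for p in closest[0]]  # small mutation
        outcome = run_experiment(params)
        archive.append((params, behavior_descriptor(outcome)))
    return archive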
Human-Robot Collaboration. Robots play a vital role in industry and ensure the efficient and competitive production of a wide range of goods. They replace humans in many tasks which otherwise would be too difficult, too dangerous, or too expensive to perform. However, the new needs and desires of society call for manufacturing systems centered on personalized products and small-series production. Human-robot collaboration could widen the use of robots in these new situations if robots become cheaper, easier to program and safe to interact with. The most relevant systems for such applications would follow an expert worker and work with (some) autonomy, while always remaining under the supervision of the human and acting based on its task models.
Environment perception in intelligent vehicles. When working in simulated traffic environments, elements of FLOWERS research can be applied to the autonomous acquisition of increasingly abstract representations of both traffic objects and traffic scenes. In particular, the object classes of vehicles and pedestrians are of interest when considering detection tasks in safety systems, as well as scene categories (“scene context”) that have a strong impact on the occurrence of these object classes. As already indicated by several investigations in the field, results from present-day simulation technology can be transferred to the real world with little impact on performance. Therefore, applications of FLOWERS research that are suitably verified by real-world benchmarks have direct applicability in safety-system products for intelligent vehicles.
AI is a field of research that currently requires a lot of computational resources, which is a challenge as these resources have an environmental cost. In the team we try to address this challenge in two ways:
Our research activities are organized along two fundamental research axes (models of human learning and algorithms for developmental machine learning) and one application research axis (involving multiple domains of application, see the Application Domains section). This entails different dimensions of potential societal impact:
Meta-cognition in Curiosity-driven educational technologies We developed several projects leveraging fundamental cognitive science studies of curiosity and meta-cognition to design educational interventions that either directly train these skills, or stimulate them to train other related cognitive skills ranging from maths to languages or other transverse skills like attention, and did this for diverse populations ranging from neurotypical to neurodiverse school children, to healthy young adults and aging populations.
At a fundamental level, we studied the beneficial role of curiosity on route memory in children, within a new virtual reality experimental paradigm 79, and in the context of our collaboration with Myra Fernandes at Univ. Waterloo (associated team CuriousTech). To refine the understanding of metacognitive awareness of one's own learning progress and its role as a curiosity booster 31, we designed an educational software (4MC project) that aims to train curiosity through the practice of meta-cognitive skills in school children, and pilot studies led to very encouraging results 41. Also, as a follow-up of our systematic review on the interactions between curiosity and cognitive load in immersive learning technologies, we started a field study with 180 university students to test hypotheses about the links between this interaction and learning performance (in collaboration with Pr. A. Tricot from the University of Montpellier and the CATIE company).
Leveraging the Learning Progress theory of human curiosity we developed in the past 172, 18, which led us to develop the ZPDES algorithm for personalizing sequences of exercises that foster learning efficiency and motivation 109, we continued studying how ZPDES can be used to personalize the training of attention skills in both young adults and aging populations (paper in preparation, collab. with D. Bavelier at Univ. of Geneva). Related to this project, we wrote a systematic review of the use of AI in cognitive training technologies 2. We also finalized the analysis of a large-scale experimental study using ZPDES in the context of training maths skills in primary schools, with a focus on the dual impact on learning efficiency and motivation on one hand, and on adding choice possibilities on the other hand, showing positive results of the approach in comparison with hand-made pedagogical sequences. Through a collaboration with the EvidenceB company and support from the French Ministry of Education, the ZPDES personalization system was also deployed in the large-scale AdaptivMaths system now available in all French primary schools (> 68k classrooms). EvidenceB further used ZPDES in the MIA Seconde system aimed at training high-school students in maths and French.
Finally, in the Togather project, we also experimented with a system aimed at stimulating communication among stakeholders around neurodiverse children in schools (college level), in particular trying to foster mutual curiosity among them while taking into account possible cross-cultural differences between French and Belgian schools 55. This was associated with a systematic review on methods to collaborate and co-educate students with special educational needs 51.
Rémy Portelas obtained the Best PhD award from the University of Bordeaux, category "Special prize of the jury", for his thesis entitled "Automatic Curriculum Learning for Developmental Machine Learners" 176.
Erwan Plantec, Gautier Hamon, Mayalen Etcheverry, Bert Chan, Pierre-Yves Oudeyer and Clément Moulin-Frier obtained the Best paper award at Alife 2023 in Tokyo for the paper "Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization" 54.
Source code for the paper https://arxiv.org/abs/2107.00956.
A suite of environments for testing socio-cognitive abilities of artificial agents. Environments can be used in the multimodal setting (suitable for RL agents) and in the pure text setting (suitable for Large Language Model-based agents). Also contains RL and LLM baselines.
This is a web platform for children between 9 and 11 years old, designed to help children practice 4 metacognitive skills that are thought to be involved in curiosity-driven learning:
- the ability to identify uncertainties
- the ability to generate informed hypotheses
- the ability to ask questions
- the ability to evaluate the value of a preconceived inference.
Children work on a reading-comprehension task and, for each of these skills, the platform offers help through a "conversation" with conversational agents that give instructions to perform the task with respect to each skill, and can give suggestions if the child asks for them.
This is the code accompanying our paper "Plasticity and evolvability under environmental variability: the joint role of fitness-based selection and niche-limited competition", which is to be presented at the GECCO 2022 conference.
In this work we have studied the evolution of a population of agents in a world where the fitness landscape changes with generations based on a climate function and a latitudinal model that divides the world into different niches. We have implemented different selection mechanisms (fitness-based selection and niche-limited competition).
The world is divided into niches that correspond to different latitudes and whose state evolves based on a common climate function.
We model the plasticity of an individual using tolerance curves originally developed in ecology. Plasticity curves have the form of a Gaussian that captures the benefits and costs of plasticity when comparing a specialist (left) with a generalist (right) agent.
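For illustration, a minimal sketch of such a Gaussian tolerance curve, assuming the area under the curve is held constant so that a wider niche (generalist) implies a lower peak; the parameter names and the constant-area convention are illustrative assumptions.

import math

def fitness(env_state, mu, sigma, area=1.0):
    """Gaussian tolerance curve centered on preferred environmental state mu with
    niche width sigma. Holding the area constant captures the specialist/generalist
    trade-off: a larger sigma (generalist) flattens the peak, a smaller sigma
    (specialist) sharpens it."""
    peak = area / (sigma * math.sqrt(2 * math.pi))
    return peak * math.exp(-((env_state - mu) ** 2) / (2 * sigma ** 2))

# Specialist vs generalist evaluated at the preferred state and slightly off-peak:
specialist = fitness(2.0, mu=2.0, sigma=0.2), fitness(2.5, mu=2.0, sigma=0.2)
generalist = fitness(2.0, mu=2.0, sigma=1.0), fitness(2.5, mu=2.0, sigma=1.0)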
The repo contains the following main elements:
- folder source contains the main functionality for running a simulation
- scripts/run/reproduce_gecco.py can be used to rerun all simulations in the paper
- scripts/evaluate contains scripts for reproducing figures; reproduce_figures.py will produce all figures (provided you have already run scripts/run/reproduce_gecco.py to generate the data)
- folder projects contains data generated from running a simulation

How to run

To install all package dependencies you can create a conda environment as:
conda env create -f environment.yml
All script executions need to be run from folder source. Once there, you can use simulate.py, the main interface of the codebase, to run a simulation. For example:
python simulate.py --project test_stable --env_type stable --num_gens 300 --capacity 1000 --num_niches 10 --trials 10 --selection_type NF --climate_mean_init 2
will run a simulation with an environment whose climate function is constantly at state 2, consisting of 10 niches, for 300 generations and 10 independent trials. The maximum population size will be 1000*2, and selection will be fitness-based (higher fitness means higher chances of reproduction) and niche-limited (individuals reproduce independently in each niche and compete only within a niche).
You can also take a look at scripts/run/reproduce_gecco.py to see which flags were used for the simulations presented in the paper.
Running all simulations takes several days. You can instead download the data produced by running scripts/run/reproduce_gecco.py from this google folder and unzip it under the projects directory.
SAPIENS is a reinforcement learning algorithm where multiple off-policy agents solve the same task in parallel and exchange experiences on the go. The group is characterized by its topology, a graph that determines who communicates with whom.
All agents are DQNs, and the exchanged experiences take the form of transitions from their replay buffers.
Using SAPIENS, we can define groups of agents connected with others based on: a) a fully-connected topology, b) a small-world topology, c) a ring topology, or d) a dynamic topology.
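A schematic sketch of one experience-sharing round over such a topology is shown below; the buffer representation, the number of shared transitions and the example graphs are illustrative and do not reproduce the actual SAPIENS code.

import random

def sapiens_sharing_round(agents, topology, share_k=10):
    """Schematic SAPIENS step: each DQN agent sends a few transitions from its
    replay buffer to its neighbours in the social graph. agents maps an id to an
    object with a replay_buffer list; topology maps an id to its neighbour ids."""
    for sender, neighbours in topology.items():
        buffer = agents[sender].replay_buffer
        if not buffer:
            continue
        shared = random.sample(buffer, min(share_k, len(buffer)))
        for receiver in neighbours:
            agents[receiver].replay_buffer.extend(shared)

# Example static topologies over 4 agents (ids 0..3):
fully_connected = {i: [j for j in range(4) if j != i] for i in range(4)}
ring = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}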
Install required packages

You can install all required python packages by creating a new conda environment containing the packages in environment.yml:
conda env create -f environment.yml
And then activating the environment:
conda activate sapiens
Example usages

Under notebooks there is a Jupyter notebook that will guide you through setting up simulations with a fully-connected and a dynamic social network structure for solving Wordcraft tasks. It also explains how you can access visualizations of the metrics produced during these simulations.
Reproducing the paper results

Scripts under the scripts directory are useful for reproducing results and figures appearing in the paper.
With scripts/reproduce_runs.py you can run all simulations presented in the paper from scratch.
This file is useful for looking at how the experiments were configured but better avoid running it: simulations will run locally and sequentially and will take months to complete.
Instead, you can access the data files output by simulations on this online repo.
Download this zip file and uncompress it under the projects directory. This should create a projects/paper_done sub-directory.
You can now reproduce all visualization presented in the paper. Run:
python scripts/reproduce_visuals.py
This will save some general plots under visuals, while project-specific plots are saved under the corresponding project in projects/paper_done.
Codebase for the paper Learning to guide and to be guided in the Architect-Builder Problem
ABIG stands for Architect-Builder Iterated Guiding and is an algorithmic solution to the Architect-Builder Problem. The algorithm leverages a learned model of the builder to guide it while the builder uses self-imitation learning to reinforce its guided behavior.
This repo contains the code base of the paper Intrinsically Motivated Goal-Conditioned Reinforcement Learning in Multi-Agent Environments.
In this work, we have studied the importance of the alignment of goals in the training of intrinsically motivated agents in the multi-agent goal-conditioned RL case. We also proposed a simple algorithm, called the goal coordination game, which allows such agents to learn, in a completely decentralized/selfish way, to communicate in order to align their goals.
The repository contains the code to reproduce the results of the paper, which includes a custom RL environment (using the SimplePlayground "game engine"), the models used (architecture + hyperparameters) and custom training code (mostly based on RLlib) to train both the model and the communication. We also provide the scripts for training every condition we test and a notebook to study the results.
This repo contains the code to run the Flow Lenia system, which is a continuous parametrized cellular automaton with mass conservation. This work extends the classic Lenia system with mass conservation and allows implementing new features like local parameters, environment components, etc.
Several variants of the system (one or several channels, etc.) are available.
Please refer to the associated paper for the details of the system.
Implemented in JAX.
Source code for the paper https://arxiv.org/abs/2307.07870
Code enabling systematic evaluation of Large Language Models with various psychology questionnaires in different contexts, e.g. following conversations on different topics.
Code for the paper "Emergence of collective open-ended exploration from Decentralized Meta-Reinforcement learning" https://arxiv.org/pdf/2311.00651.pdf
We train two decentralized agents together on an open-ended task space to study the emergence of collective exploration behaviors. Our agents are able to generalize to novel objects and tasks, as well as to an essentially open-ended setting.
The team's research program, within the domain of developmental artificial intelligence, aims to study mechanisms of open-ended learning, and in particular the role of curiosity-driven autotelic learning and the role of language as a cognitive tool. We study these topics both in humans and AI systems, both at the level of individuals and at the level of cultural groups, and both at the fundamental and application levels.
Here, we present our recent results along the following research dimensions:
The team continued to lay the foundations of autotelic AI 113, 111, i.e. the science studying mechanisms enabling artificial agents to learn to represent and sample their own goals and achieve open-ended learning.
In this project, we take a step towards truly open-ended language-based autotelic agents by leveraging GPT3 99, a Large Language Model (LLM) demonstrating impressive language understanding capabilities. For an autotelic agent to be truly open-ended, it needs to be able to:
In this project, we also place ourselves in a textual environment. CookingWorld has been proposed as part of the First TextWorld Problems competition on text agents, and features a house with ingredients that can be cooked to achieve a recipe. We leverage the textual nature of the trajectories of agents in this environment by using GPT (ChatGPT-3.5 in our experiments), with different prompts, as our goal generator, reward function, and relabeller. More precisely (see Figure 2):
Once a goal has been selected by the Goal Generator, we need to be able to pursue this goal in the environment. In preliminary experiments we noticed that Deep RL based agents were too sample inefficient to be able to learn an open-ended repertoire of skills in a reasonable time. The solution we implemented is closer to an evolutionary algorithm: the policy maintains a dictionary mapping already experienced goal strings to the shortest experienced sequence of actions achieving this goal (according to the relabeller or the reward function). When presented with a new goal, the agent embeds this goal and selects the sequence of actions whose goal key in the dictionary is the closest in embedding space to the goal being pursued (this match may be exact). The agent then executes this action sequence.
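A minimal sketch of this non-parametric goal-conditioned policy is given below; the embed encoder and the cosine-similarity retrieval are assumptions standing in for the sentence embedding actually used.

import numpy as np

class GoalActionMemory:
    """Sketch of the non-parametric policy described above: map each achieved goal
    string to the shortest action sequence known to achieve it, and answer a new
    goal with the actions of the nearest stored goal in embedding space.
    embed is a placeholder sentence encoder returning a 1-D numpy array."""

    def __init__(self, embed):
        self.embed = embed
        self.memory = {}  # goal string -> (embedding, shortest action sequence)

    def store(self, goal, actions):
        # Keep only the shortest known action sequence for each goal string.
        if goal not in self.memory or len(actions) < len(self.memory[goal][1]):
            self.memory[goal] = (self.embed(goal), list(actions))

    def act(self, goal):
        if not self.memory:
            return []
        query = self.embed(goal)
        def cosine(entry):
            emb, _ = entry
            return float(np.dot(query, emb) /
                         (np.linalg.norm(query) * np.linalg.norm(emb) + 1e-8))
        _, actions = max(self.memory.values(), key=cosine)  # exact matches score 1
        return actions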
The LM-based Goal Generator starts off with a warmup period of 4000 episodes, where goals are sampled based on the ones that have already been experienced. After this period the goal imagination kicks in, and the language model is shown in its prompt a sample of 60 goals that have been achieved in the past. The goal generator is asked to provide a high-level goal that is composed of the chaining of lower-level subgoals among the list of goals in the prompt. The goal-conditioned policy then executes the subgoals one after the other, and there is a probability that the last subgoal is cut off and random exploration is performed. During random exploration, text actions are given priority inversely proportional to the number of times they have been seen (to encourage the agent to take rare actions when they become available). The overview of the algorithm, as well as an illustration of the information flow, are given in Figures 2 and 3. Figure 4 shows a confusion matrix quantifying how much the reward function agrees with human judgement.
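The rare-action prioritization used during this random-exploration phase could be sketched as follows; the counting scheme and sampling rule are illustrative assumptions.

import random
from collections import Counter

action_counts = Counter()  # how many times each text action has been seen so far

def sample_rare_action(available_actions):
    """Pick an action with probability inversely proportional to how often it has
    already been seen, so that rarely available commands are tried first."""
    weights = [1.0 / (1 + action_counts[a]) for a in available_actions]
    action = random.choices(available_actions, weights=weights, k=1)[0]
    action_counts[action] += 1
    return action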
We perform several ablations: in one experiment we ablate the human advice and suggestions given to the model (no human tips); in a further ablation we also remove the LM goal generator (so the warmup phase lasts indefinitely: no LM goals or human tips); and in a final ablation we remove the human tips and the chain-of-thought prompting in the few-shot examples we give the LM (no human tips or CoT).
Our results show that LMA3, across variants, is able to discover a very large number of diverse goals, on the order of 1 goal per episode. The goals are distinct and also exhibit high diversity in the predicates and objects they cover. We see that removing human tips reduces the diversity of goals generated, because the LM has less diverse examples on which to base itself. Removing the LM goal generation has minimal influence beyond that, and removing chain-of-thought has the most drastic effect. The baseline oracle agent is limited to the 69 goals for which we have an exact reward function and relabeller. We additionally evaluate the agent on the oracle goals for which we have an exact reward function. This is represented in Figure 4. We see that most LMA3 variants achieve a high eval score on these held-out goals, despite never having been told what they would be evaluated on.
This work was presented at the Conference on Lifelong Learning Agents (August 2023) in Montreal.
In this section we will present our recent work on code-generating autotelic agents. The idea of exploring this came from the realization that the open-endedness of our agents is upper-bounded by the complexity of the environment. The LMA3 agent was very powerful but could never learn complex skills because CookingWorld is extremely limited. On the other hand, programming is open-ended and an agent writing code has very few limits on the complexity of what it can accomplish.
We will present two works in a programming domain: Codeplay and ACES.
In this work we propose an approach that is both a way to implement autotelic agents discovering a truly open-ended set of goals, and a way to allow language models to master novel coding skills in interaction with an interpreter. We ground LM-agents in a coding environment that provides straightforward binary rewards through code execution. We set ourselves in the Python Programming Puzzles domain 188, and we define an autotelic agent composed of 2 sub-agents: a Setter that generates puzzles, and a Solver whose objective is to solve the generated puzzles. Both agents are LM policies. They play a collaborative game where the Setter has to create puzzles that push the Solver to learn, and the Solver sees and tries to solve puzzles in its zone of proximal development (ZPD): hard but still solvable. First experiments presented here (with a fixed Solver) highlight the possibility to trade off the difficulty of the generated puzzles and their novelty by tuning the reward experienced by the Setter, trained with deep RL (PPO: 186).
Closely related to this work are approaches to autotelic learning or automatic curriculum learning involving goal setter agents 123, 196, 101, 165, 158, as well as the PowerPlay framework of 184. Very closely related to this work as well are approaches for generating novel code puzzles for augmenting the capabilities of code-puzzle-solving language models 127, as well as recent attempts to cast program synthesis as a reinforcement learning problem 144, 208.
We use as our testbed the Python Programming Puzzles (P3) domain of 188. Each puzzle is defined by a test program f designed to verify the validity of solution programs g, such that a solution is valid if f(g()) evaluates to True when run in an interpreter.
P3 puzzles span problems of various difficulties, from trivial to open questions.
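For illustration, a toy puzzle written in the P3 format (our own example, not taken from the benchmark) looks as follows:

def f(s: str) -> bool:
    # Verifier: the solution must be a 5-character palindrome starting with "a".
    return len(s) == 5 and s == s[::-1] and s.startswith("a")

def g() -> str:
    # One valid solution to the puzzle defined by f.
    return "abcba"

assert f(g())  # a puzzle is solved when f(g()) evaluates to True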
We instantiate our Setter as a pretrained language model, finetuned on the P3 domain. We cast the puzzle-generating problem as an MDP where the observation space is the list of all possible sequences of tokens, the action space is the list of all tokens and the reward is the intrinsic motivation measure we compute based on the difficulty of a puzzle. Transitioning from one (fully-observable) state to another is simply appending the emitted token to the observation: the environment is purely deterministic. This training setup is reminiscent of the one in reinforcement learning from human feedback (RLHF: 173), except in our case we do not use a reward function trained from human preferences but an intrinsic motivation metric based on the Solver's abilities. Our Setter agent's stochastic policy is given by the pretrained language model's logits which are used to sample the tokens (temperature of
We want to reward the Setter for producing puzzles that are hard, while not being empirically unsolvable. To do so, we compute a reward based on the number of solutions generated by our fixed Solver within a maximum number of attempts. Easy problems have a reward of 0, unsolvable or syntactically invalid puzzles have a reward of -1, and hard but solvable puzzles have a reward of 1.
Because the Solver is fixed, we do not optimize the novelty reward alone: the total reward for the Setter is a weighted sum of the difficulty and novelty rewards.
In the following experimental results we investigate the impact of different values of the weights. We only train the Setter in these experiments.
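A hedged sketch of how the Setter's reward could be computed under these definitions is given below; the solver interface, the number of attempts, the boundary between easy and hard puzzles, and the weight names are illustrative assumptions rather than the exact implementation.

def difficulty_reward(puzzle, solver, n_attempts=10):
    """Reward shape described above: -1 for unsolvable or invalid puzzles, 0 for
    easy ones, +1 for hard-but-solvable ones. solver(puzzle) is a placeholder
    returning a candidate solution program as a string; the boundary between easy
    and hard (half of the attempts) is illustrative."""
    n_solved = 0
    for _ in range(n_attempts):
        try:
            candidate = solver(puzzle)
            namespace = {}
            exec(puzzle + "\n" + candidate, namespace)  # defines f and g
            if namespace["f"](namespace["g"]()):
                n_solved += 1
        except Exception:
            pass  # invalid or failing candidates simply do not count
    if n_solved == 0:
        return -1.0
    return 1.0 if n_solved <= n_attempts // 2 else 0.0

def setter_reward(puzzle, solver, novelty, w_diff=1.0, w_nov=1.0):
    # Weighted sum of the two components; the weights are what the experiments vary.
    return w_diff * difficulty_reward(puzzle, solver) + w_nov * novelty(puzzle)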
We use a fixed few-shot prompt for both the solver and the setter. The few-shot prompt simply includes a series of puzzles and solutions from the tutorial set of the P3 training set (which are pretty short, so we can fit all of them). The puzzles and solutions are separated by assert(f(g())) statements. The prompt for the Setter finishes after an assertion, the prompt for the Solver (evaluated
We present here a series of results studying Setter training. We report the learning curves for both reward components.
The gray curve provides our baseline: no optimization whatsoever is taking place. On the left-hand side, we see that the puzzles generated in this case are close to trivial.
This is still work in progress; it was presented at the IMOL conference in Paris in September 2023 and at the IMOL workshop at NeurIPS in New Orleans in December 2023.
In this project, we examine more specifically how one can generate an interesting diversity of programming puzzles (same domain as Codeplay). We recall that this is an important case study for linguistic autotelic agents because it is a first step towards generalist agents inventing their own problems. Inspired by the Evolution Through Large Models (ELM) method, where the authors evolve robot morphologies expressed as Sodarace programs using a Large Language Model as a mutation operator, we aim to develop an evolutionary method to create a diverse population of problems using pretrained Language Models. We remark that diversity-producing methods (such as MAP-Elites) need a Behavioral Characterization (BC) space in which to measure the diversity of their evolved populations; this is feasible with virtual creatures but seems much harder with programming puzzles. We thus introduce the notion of a Semantic BC space, composed of abstract categories, where labelling inside this space is done through LLM responses. In our case, we introduce 10 programming descriptors:
We then define an archive of generated programming puzzles and their solutions, and the position of a puzzle in the archive is given by the combination of descriptors that the puzzle-solution pair belongs to (the semantic representation of a puzzle thus being a 10-dimensional vector). The semantic archive is used to store puzzles.
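A minimal sketch of how a puzzle-solution pair could be mapped to its cell of such a semantic archive is given below; the descriptor names and the ask_llm helper are placeholders, not the actual descriptors used in ACES.

DESCRIPTORS = [  # placeholder names for the 10 abstract programming descriptors
    "recursion", "string manipulation", "arithmetic", "sorting", "search",
    "dynamic programming", "graphs", "bit manipulation", "geometry", "combinatorics",
]

def semantic_key(puzzle, solution, ask_llm):
    """Ask an LLM whether each abstract descriptor applies to the puzzle/solution
    pair; the resulting 10-dimensional binary vector identifies the archive cell."""
    key = []
    for descriptor in DESCRIPTORS:
        answer = ask_llm(
            f"Does the following puzzle involve {descriptor}? Answer yes or no.\n"
            f"{puzzle}\n{solution}"
        )
        key.append(1 if answer.strip().lower().startswith("yes") else 0)
    return tuple(key)

archive = {}  # semantic key (10-dim binary tuple) -> list of (puzzle, solution) pairs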
We then perform experiments with the following algorithms:
For all experiments we seed the archive with the P3 train set.
We report the results of our runs in Figure 10. Overall, the methods based on semantic archives, ACES and ELM-Semantic, achieve the highest diversity in the semantic space. We report diversity measures inside the embedding spaces of various smaller language models in Figure 11. In these figures we see that overall ACES outperforms the other methods on this measure of diversity. We additionally perform tests of the suitability of the generated puzzles as finetuning data for smaller LMs. For all methods, we finetune a smaller model (OpenLlama-3b) on the generated set and we test the pass@k metric for different values of k on the P3 test set; we report the scores in Figure 12. From that figure we see that there is a tradeoff between how diverse the data is and how useful it is to get a high score on the P3 test set. Further work is needed to get data that is both diverse and useful.
The recent rise of Transformer-based Large Language Models (LLMs) trained on massive text datasets has led to models exhibiting impressive capabilities (e.g. natural language generation, question answering, reasoning, translation...). Recently, LLMs were shown to also capture aspects of the physical rules in our world, e.g. about space, colors or even affordances between bodies and objects.
However, LLMs are known to suffer from a lack of grounding (i.e. connecting their inner representation of language to a world), which prevents them from properly dealing with the meaning of concepts and their direct use to solve tasks in interactive environments. Indeed, alignment between statistical structures in such LLMs and environments can be very limited, or even sometimes entirely wrong. This is partly due to 1) a training process (predicting next words) that is not directly incentivized to solve problems in an environment, 2) a lack of ability to intervene in the environment to identify causal structures, and 3) a lack of ability to learn from data collected as a result of interacting with the environment.
Focusing on such functional competence, we propose to use an LLM as the policy interacting with a textual environment (i.e. a textual description of the scene is provided by the environment and possible actions are text commands) for decision-making problems. Using Reinforcement Learning (RL) to finetune the LLM so that it solves various tasks in this environment, our method, named GLAM, "functionally grounds" the LLM in the environment: it grounds the LLM in the dynamics and physical rules of the environment in order to solve problems and obtain, in the end, an operational LLM able to use natural language to solve tasks in this interactive environment.
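As an illustration of the LLM-as-policy idea (not the exact GLAM implementation), the sketch below scores each candidate text command by the log-likelihood the LLM assigns to it given the textual observation and normalizes these scores into an action distribution; the checkpoint is a small stand-in for Flan-T5 780M.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")  # small stand-in checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

def action_probabilities(observation, actions):
    """Score each candidate text command by the log-likelihood the LLM assigns to
    it given the textual observation, then normalize into a policy distribution.
    RL finetuning (as in GLAM) would backpropagate through these scores instead of
    using torch.no_grad()."""
    scores = []
    for action in actions:
        inputs = tokenizer(observation, return_tensors="pt")
        labels = tokenizer(action, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(**inputs, labels=labels)
        # out.loss is the mean negative log-likelihood per label token
        scores.append(-out.loss * labels.shape[1])
    return torch.softmax(torch.stack(scores), dim=0)

# Example: distribution over the environment's possible text commands
probs = action_probabilities("You see a red key on the table. Goal: open the door.",
                             ["pick up the key", "go to the door", "drop the key"])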
Applying GLAM to Flan-T5 780M shows how such an LLM can be functionally grounded and become able to solve the tasks in an interactive textual environment. The resulting LLM also exhibits strong generalization when exposed to variations in the objects it must interact with. Finally, we show that incremental intervention using RL is key by comparing it to passive Imitation Learning. Our results highlight that our GLAM method leads to both better results and better generalization, even when Imitation Learning is performed using an optimal policy.
We showed in recent works how Automatic Curriculum Learning (ACL) can help Deep Reinforcement Learning methods by tailoring a curriculum adapted to the learner's capabilities 30, 27. Using ACL can lead to better sample efficiency and asymptotic performance, and can help in solving hard tasks.
In parallel, recent works in Language Modeling using Transformers (e.g. GPT-2) have started to take more interest in better understanding the convergence and learning dynamics of these models. Trained in a supervised setup, these models are fed with hundreds of millions of natural language sequences crawled from the web. The current standard way of training these models (i.e. constructing batches of randomly selected sequences) makes the assumption that all sequences are equally useful to the model. However, recent works showed that this does not seem to be the case and that datasets can contain outliers harming training. Additionally, some works also showed that hand-designing a curriculum over sequences (e.g. ordered by their length) could speed up and stabilize training.
Building on this, we propose to investigate how ACL could help tailor such a curriculum in an automated way, relying on Learning Progress. Our study has several contributions:
We chose to train GPT-2 on the standard OSCAR dataset and use teacher algorithms to select samples that are shown to the model (see fig. 14).
Using ACL, we perform an in-depth analysis of prior methods that change the length of token sequences observed during training following a hand-designed curriculum. Our experiments showed that a Random baseline outperforms these methods. We also provide, thanks to ACL methods based on Learning-Progress Multi-Armed Bandits, hints that while short sequences should not be used as training advances (as Large Language Models quickly learn them), there is no clear evidence that short sequences should be prioritized (and thus long sequences avoided) at the beginning of training.
Additionally, we performed several experiments using more advanced ACL methods on different task spaces and show that these lead to overfitting and underperform compared to the no-curriculum strategy usually applied in Language Modeling. We hypothesize that, given how large the models used in Language Modeling are, it is better to provide a huge amount of very diverse samples (even though outliers or harmful samples exist) without any curriculum than to use a curriculum that restrains the diversity of samples and introduces duplicates (leading to overfitting).
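As an illustration of the Learning-Progress bandit teachers mentioned above, the following sketch treats buckets of sequences (e.g. length bins of OSCAR) as arms, uses the absolute change of loss on a bucket as its learning-progress reward, and turns these rewards into sampling probabilities with an EXP3-style update. The interfaces for the actual GPT-2 training and evaluation code are assumed and not shown.

```python
# Sketch of a Learning-Progress multi-armed bandit teacher over data buckets
# (e.g. token-sequence-length bins of OSCAR). The surrounding GPT-2 training
# loop (train_step, sample_from_bucket) is assumed, not reproduced.
import numpy as np

class LPBanditTeacher:
    def __init__(self, n_arms, lr=0.1, eps=0.1):
        self.weights = np.zeros(n_arms)            # preference per bucket
        self.last_loss = np.full(n_arms, np.nan)
        self.lr, self.eps = lr, eps

    def probabilities(self):
        p = np.exp(self.weights - self.weights.max())
        p /= p.sum()
        return (1 - self.eps) * p + self.eps / len(p)   # keep exploring all buckets

    def sample_arm(self, rng):
        return rng.choice(len(self.weights), p=self.probabilities())

    def update(self, arm, loss):
        # learning progress = absolute change in loss measured on this bucket
        lp = 0.0 if np.isnan(self.last_loss[arm]) else abs(self.last_loss[arm] - loss)
        self.last_loss[arm] = loss
        self.weights[arm] += self.lr * lp / self.probabilities()[arm]   # EXP3-style update

# Assumed usage in the training loop:
#   arm = teacher.sample_arm(rng); batch = sample_from_bucket(arm)
#   loss = train_step(model, batch); teacher.update(arm, loss)
```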
As generative AI systems become powerful cultural transmission technologies that influence human cultural evolution in important ways, and can also develop their own cultural processes through large-scale machine-machine interaction, studying the dynamics of cultural processes in populations of AI systems and humans becomes crucial.
Innovations are a central component of open-ended skill acquisition: they denote the emergence of new solutions by the recombination of existing ones, and their presence is necessary to ensure a continuous complexification of an agent's cultural repertoire. While we often tend to attribute discoveries to certain innovative individuals, if we take a broad perspective on the history of our species we see that human innovation is primarily a collective process. Fields such as psychology and anthropology have been studying the ability of human groups to innovate for some time, with studies indicating that the social network structure has a significant impact: fully-connected structures are better suited for quick convergence in easy problems with clear global optima, while partially-connected structures perform best in difficult tasks where local optima may lure agents away from the globally optimal solution 119. At the same time, a parallel story is unfolding in reinforcement learning (RL): distributed RL is a sub-field where multiple agents solve a task collectively 159. Compared to the single-agent paradigm, distributed RL algorithms converge quicker and often achieve superior performance. However, these algorithms have only considered full connectivity. In this inter-disciplinary project, we presented a novel learning framework that augments distributed RL with the notion of a social network structure and employed it to study the hypothesis, coming from human studies, that partial connectivity performs best in innovation tasks.
We implemented such innovation tasks using Wordcraft, a recently introduced RL playground inspired by the Little Alchemy 2 game (see left of figure 15 for an illustration of how this task works). We considered a wide diversity of social network structures: static structures that remain constant throughout learning (fully-connected, ring, small-world) and a dynamic structure where the group oscillates between phases of low and high connectivity (we illustrate this dynamic structure on the right of figure 15). Each agent in our implementation employs the DQN learning algorithm and exchanges experiences, in the form of sequences of state-action combinations, with its neighbors.
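A minimal sketch of the sharing mechanism, under an assumed `DQNAgent` interface, is given below: each agent keeps its own replay buffer and periodically receives copies of recent transitions from its neighbors, where the neighborhood is defined by the chosen social network structure (ring, fully-connected, or a dynamic schedule oscillating between the two).

```python
# Sketch of neighbor-based experience sharing among independent DQN learners.
# DQNAgent (with .buffer, .recent_transitions(), returning at least k items) is an
# assumed interface, not the actual implementation used in the Wordcraft experiments.
import random

def ring_topology(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def fully_connected(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def dynamic_topology(n, step, period=1000):
    # oscillate between low (ring) and high (fully-connected) connectivity
    return ring_topology(n) if (step // period) % 2 == 0 else fully_connected(n)

def share_experiences(agents, topology, k=32):
    """Each agent receives k recent transitions from each of its neighbors."""
    for i, agent in enumerate(agents):
        for j in topology[i]:
            for transition in random.sample(agents[j].recent_transitions(), k):
                agent.buffer.append(transition)
```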
A central conclusion of our empirical analysis was that the dynamic social network structure performs best. In addition to the performance achieved by groups, we measured behavioral and mnemonic metrics such as behavioral conformity and mnemonic diversity. Such metrics were inspired by human studies and helped us further analyze the behavior of groups. For example, one empirical observation was that sharing experiences did not help the group learn quicker in a very simple innovation task; instead, the fully-connected group was the slowest. By looking at the diversity in the memories of the agents, we observed that the fully-connected structure had the highest individual diversity (left of figure 16) and the lowest group diversity (right of figure 16): sharing experiences with others diversifies an individual's experiences but also homogenizes the group, which harms its performance.
We see the contribution of this project as two-fold. From the perspective of fields studying human intelligence, we have shown that using RL algorithms as a computational tool is a promising direction towards increasing the verisimilitude of simulations and analyzing both behavior and memory. From the perspective of RL, we have shown that distributed RL algorithms should move beyond the fully-connected architecture and explore groups with dynamic topologies. This work is currently a preprint 162 and is about to be submitted to PNAS. We open-sourced the code at this link.
Theoretical and empirical work in cultural evolution often assumes populations in which individuals all agree on the quality of a given cultural trait (i.e. they have homogeneous preferences), and where those preferences are stable in time. Yet, these assumptions are not always met: for example, an uneven distribution of information in a population could lead to heterogeneous preferences; moreover, in some cultural domains (e.g. aesthetic culture), diverse preferences may be the norm rather than the exception. In this project, in collaboration with Martí Sànchez-Fibla from Universitat Pompeu Fabra, we designed an agent-based model in which we can control the heterogeneity of preferences, as well as the effect of cultural traits on the evolution of preferences. We find that assuming homogeneous or heterogeneous preferences leads to different predictions on several outcomes. First, populations with greater heterogeneity of preferences converge toward greater cultural diversity. Second, while we replicate the classical result that increasing opportunities to learn socially leads to less diversity in homogeneous populations, we find that this relationship is reversed in heterogeneous populations. We show that this happens because increasing social learning opportunities leads the distribution of cultural traits to converge toward the distribution of preferences. We also look at the consequences of allowing cultural traits to modify the preferences of the individuals that possess them. This can for example capture self-reinforcing beliefs, or traits whose acquisition costs make individuals less likely to switch to another trait after possessing them for some time. We find that such “attractive” cultural traits naturally emerge in our model, and that they tend to decrease cultural diversity when preferences are not homogeneous. Overall, by showing that the effect of different parameters on cultural diversity is dependent on the assumed distribution of preferences, we highlight the importance of taking into account the possible heterogeneity of preferences when making predictions about cultural dynamics. An abstract for a poster was submitted to the conference of the European Human Behaviour and Evolution Association (EHBEA 2024).
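For illustration, the following minimal sketch captures the flavor of such an agent-based model; the acceptance rule and parameters are ours, not the exact published model. Each agent holds a preference vector over cultural traits, adopts traits either by innovation or by copying a peer, accepts them with a probability given by its own preference, and cultural diversity is measured with a Shannon index.

```python
# Sketch of cultural dynamics with controllable heterogeneity of preferences.
# The acceptance rule and parameter values are illustrative assumptions.
import numpy as np

def simulate(n_agents=200, n_traits=20, heterogeneity=1.0,
             p_social=0.5, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    # heterogeneity interpolates between a shared preference vector and individual ones
    shared = rng.random(n_traits)
    prefs = (1 - heterogeneity) * shared + heterogeneity * rng.random((n_agents, n_traits))
    traits = rng.integers(n_traits, size=n_agents)      # current trait of each agent

    for _ in range(steps):
        agent = rng.integers(n_agents)
        if rng.random() < p_social:                      # social learning: copy a random peer
            candidate = traits[rng.integers(n_agents)]
        else:                                            # individual learning: random innovation
            candidate = rng.integers(n_traits)
        if rng.random() < prefs[agent, candidate]:       # accept according to own preference
            traits[agent] = candidate

    counts = np.bincount(traits, minlength=n_traits) / n_agents
    diversity = -(counts[counts > 0] * np.log(counts[counts > 0])).sum()  # Shannon diversity
    return diversity

print(simulate(heterogeneity=0.0), simulate(heterogeneity=1.0))
```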
Cumulative culture describes the gradual accumulation and spread of innovations in a population, which results in the formation of a cultural repertoire that no individual could have independently invented on its own. As cumulative culture has been argued to underlie the ecological success of humans, understanding the mechanisms that underpin it has raised a lot of interest. However, computational models of cultural evolution have mostly been restricted to agent-based models, which makes it difficult to study the consequences of certain cognitive capacities on cultural dynamics. In particular, these models often assume that cultural variation is generated in a random manner, which overlooks the sophisticated exploration strategies employed by humans and other animals, such as curiosity-driven exploration. In this project, we aim to fill this gap by modeling agents as reinforcement learners endowed with an intrinsic motivation to generate and pursue their own goals. This will allow us to study how the cultural trajectories taken by a group of curious agents differ from those of non-curious agents, as well as to look at more sophisticated forms of cultural transmission such as goal transmission. This project is done in collaboration with Maxime Derex (IAST, Toulouse).
As introduced earlier, Vygotskian artificial agents internalize cultural conventions in order to transform linguistic production into cognitive tools that help them acquire new skills. A fundamental question is therefore to investigate how such cultural conventions can emerge between agents situated in social contexts.
In this experiment, we are interested in interactive agents that learn to coordinate, namely, a builder – which performs actions but ignores the goal of the task, i.e. has no access to rewards – and an architect which guides the builder towards the goal of the task. We define and explore a formal setting where artificial agents are equipped with mechanisms that allow them to simultaneously learn a task while at the same time evolving a shared communication protocol. Ideally, such learning should only rely on high-level communication priors and be able to handle a large variety of tasks and meanings while deriving communication protocols that can be reused across tasks. We present the Architect-Builder Problem (ABP): an asymmetrical setting in which an architect must learn to guide a builder towards constructing a specific structure. The architect knows the target structure but cannot act in the environment and can only send arbitrary messages to the builder. The builder, on the other hand, can act in the environment, but receives no rewards nor has any knowledge about the task, and must learn to solve it relying only on the messages sent by the architect. Crucially, the meaning of messages is initially not defined nor shared between the agents but must be negotiated throughout learning. The Architect-Builder Problem was initially introduced by Vollmer et al. 204 in an experiment named the CoCo game, studying the formation of communication protocols between humans in such a context. Diagrams of interactions in the CoCo game and in our numerical adaptation are given in figure 17.
Under these constraints, we propose Architect-Builder Iterated Guiding (ABIG), a solution to ABP where the architect leverages a learned model of the builder to guide it, while the builder uses self-imitation learning to reinforce its guided behavior. We analyze the key learning mechanisms of ABIG and test it in 2D tasks involving grasping cubes, placing them at a given location, or building various shapes. ABIG results in a low-level, high-frequency guiding communication protocol that not only enables an architect-builder pair to solve the task at hand, but also generalizes to unseen tasks, as illustrated in figure 18. These results were published at the International Conference on Learning Representations (ICLR 2022) 91.
The intrinsically-motivated goal-conditioned learning paradigm is well-established in single-agent settings: by setting its own goals and pursuing them in an environment without external supervision, such an agent is able to acquire a wide diversity of skills 112. Such agents are called autotelic, from the Greek words auto (self) and telos (end). What happens when the autotelic paradigm is transferred to multi-agent environments, where some skills may require the cooperation of multiple agents (lifting a heavy box, for example)? This is the question we aimed to address in this project. We believe that multi-agent applications will benefit from agents that can autonomously discover and learn cooperative skills, but this entails additional challenges compared to single-agent settings: agents that independently set their own goals will have a very low probability of simultaneously sampling the same cooperative goal, which makes solving these goals difficult.
To explore this question, we implemented what we call cooperative navigation tasks in the Simple Playgrounds environments. This 2-D environment, illustrated on the left of figure 19, consists of a room with 6 landmarks on its walls and two agents that receive continuous-valued observations about the distance and angle to all landmarks and other agents, and perform discrete-valued actions that control their angular velocity and longitudinal force. A navigation task is a description of the landmarks that need to be reached: some tasks are individual (for example "at least one agent reaches the red landmark") and some are cooperative (for example "at least one agent reaches the red landmark and at least one agent reaches the blue landmark"). Each autotelic agent learns using the RL algorithm PPO and, at each training episode, chooses which goal to pursue by sampling from the goal distribution. In addition to a policy conditioned on its goals, the agent also needs a reward function that indicates whether a goal is achieved (see the schematic on the right of figure 19 for an illustration of the main algorithmic components of autotelic agents). In this project, we assume that the two agents already know this reward function and focus on examining the process of decentralized goal selection.
Our empirical analysis of this set-up showed that goal alignment is important for solving the cooperative tasks we considered: agents that independently sampled their goals failed to solve all tasks (see orange curve in figure 20), while agents whose goals were determined by a centralized process (blue curve), guaranteeing that the two agents always pursue the same goal, performed optimally. We then wondered: can we achieve the same performance without requiring centralization? To achieve this, we designed a communication-based algorithm that enables a group to align its goals while remaining decentralized: at the beginning of an episode, and before determining their goals, the two agents exchange messages and then use these messages to condition their goal selection (see dashed arrows in the schematic on the right of figure 19). This communication is asymmetric: one randomly chosen agent, the leader, uses its goal generator to choose which goal to pursue and then decides what message to transmit to the follower, which conditions its goal selection on the received message. We observed that the agents learn a communication protocol that leads to the alignment of cooperative goals, even though they were not directly incentivised to do so. They were both independently learning a protocol that maximised their individual rewards but, as we show in our experiments corresponding to figure 20, goal alignment was able to emerge from such decentralized learning. We called this algorithm the Goal-coordination game, as it was inspired by another emergent communication algorithm called the Naming game 194.
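The communication step can be summarized with a toy sketch in which tabular softmax learners stand in for the PPO agents: the leader samples its goal and a message, the follower selects a goal conditioned on the message, and each receives an individual reward only when the cooperative goal is solved, which is abstracted here as requiring the two goals to coincide. This is only a simplified illustration of the mechanism, not the actual experimental code.

```python
# Toy sketch of the Goal-coordination step with tabular softmax learners standing in
# for the PPO agents; the reward is an abstraction of solving a cooperative goal.
import numpy as np

rng = np.random.default_rng(0)
N_GOALS, N_MSGS = 6, 6
leader_msg_logits = np.zeros((N_GOALS, N_MSGS))     # message policy given the leader's goal
follower_goal_logits = np.zeros((N_MSGS, N_GOALS))  # goal policy given the received message

def softmax_sample(logits):
    p = np.exp(logits - logits.max()); p /= p.sum()
    return rng.choice(len(p), p=p)

for episode in range(20000):
    leader_goal = rng.integers(N_GOALS)                      # leader samples its own goal
    msg = softmax_sample(leader_msg_logits[leader_goal])     # ...and a message
    follower_goal = softmax_sample(follower_goal_logits[msg])
    # cooperative goals are only solved when both agents pursue the same goal
    reward = 1.0 if follower_goal == leader_goal else 0.0
    leader_msg_logits[leader_goal, msg] += 0.1 * reward      # simple reinforcement updates
    follower_goal_logits[msg, follower_goal] += 0.1 * reward

alignment = np.mean([
    softmax_sample(follower_goal_logits[softmax_sample(leader_msg_logits[g])]) == g
    for g in range(N_GOALS)])
print("estimated goal alignment:", alignment)
```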
To get a better understanding of how alignment helps solve these tasks, we measured specialization, which is the tendency of agents to always go to the same landmark when there are two options. For example, if for the goal "at least one agent reaches the red landmark and at least one agent reaches the blue landmark" the first agent always goes to red and the second goes to blue, then specialization is maximal. We empirically observed that specialization correlates with alignment and is an optimal strategy in our tasks (see right plot of figure 20).
In 2023, as advised by reviewers, additional baseline comparison experiments were conducted. We also conducted further experiments to better understand the causes of this inability to learn in the independent (0% align) case.
This work was accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2023 and presented at the poster session. A preprint version is available on HAL 52. The source code for reproducing the experiments is available at this link.
Developmental psychologists have long established socio-cognitive abilities as a core aspect of human intelligence and development 205, 100. Those abilities enable us to enter, participate in and benefit from human culture. Humans are then able to push this culture forward by improving on already existing cultural artifacts. This is referred to as cumulative cultural evolution, and it has been argued that the most impressive human achievements are the product of it 201. It seems clear that to construct artificial agents capable of interacting with and learning from humans, one must equip them with socio-cognitive abilities enabling them to enter human culture. This would enable artificial agents to benefit from our culture, and also to push it forward, i.e. to participate in cumulative cultural evolution.
Current AI research mostly studies asocial settings or, in the case of Multi-Agent Reinforcement Learning, the emergence of culture (how culture emerges in the first place, rather than how one enters an already existing culture). Furthermore, this research is often done without a strong grounding in developmental psychology.
In this project, we follow the work of Michael Tomasello and Jerome Bruner, who outlined a set of core socio-cognitive skills, e.g. social cognition (joint attention, theory of mind), referential and conventionalized communication, imitation, role-reversal, scaffolding, and many more 200, 100. Importantly, they also showed the importance of the (cultural) environment for cognitive development.
To introduce some of those concepts to the AI community, we created a tool for the procedural generation of environments: the SocialAI school. With the SocialAI school, experiments studying those socio-cognitive abilities can be easily conducted and cognitive-science experiments reconstructed. Furthermore, the SocialAI school enables us to generate both multimodal environments (suited for RL agents) and a pure text version of those environments (suited for LLM-based agents). An example of a SocialAI school environment is shown in figure 21. In it, the peer points towards the red box; the agent (the red triangle) has to infer that this means the apple is hidden inside the red box.
We conducted many experiments; here we outline a few of the more important ones. We experimented with multimodal RL agents. We tested the generalization of inferring the meaning of referential communication (e.g. the pointing gesture) to new scenarios/objects. We found that such generalization is very hard for standard reinforcement learning agents. We show how a scaffolded environment helps with learning complex interaction sequences (formats). To show how cognitive science experiments can be recreated, we reconstruct a study of role reversal from 122. Furthermore, we conducted experiments regarding other aspects of social cognition: joint attention, imitation, perspective taking, etc. We also experimented with LLM-based interactive agents. We show that a simple LLM-based agent can achieve some performance but still fails to generalize. This motivates future work on creating more complex LLM-based agents.
Most of this project was done in 2022. This year, we extended the project by adding additional experiments with LLM-based agents and updating the implementation.
There has been a growing body of research using Large Language Models (LLMs) to simulate individuals or human populations. Those studies usually focus on how a model can express some behavior or values and often overlook the underlying problem that LLMs are highly context-dependent. This problem is even more exacerbated by the use of psychological questionnaires, which were created with the assumption of human-like context-independence. In this project 74, we study the robustness of LLMs to seemingly unrelated perturbations in the context. Instead of evaluating models on some behavior (and testing robustness along the way), we study how a model's behavior changes across contexts as a primary question. In other words, we study and compare the value stability of LLMs across different contexts.
We leverage the PVQ questionnaire 106 associated with the Schwartz Theory of Basic Values 189, which defines ten values: Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, and Universalism. We study LLMs' ability to simulate an individual (i.e. without defining a specific persona to simulate) and populations (by instructing the model to simulate various well-known fictional and real-world individuals). Different contexts are induced by simulating conversations on different topics: correcting grammar, writing a poem, playing chess, answering a history question and inventing a joke.
Following psychological methodology, we evaluate three types of value stability: mean-level, rank-order, and ipsative. We observe that LLMs exhibit low value stability: trivial context changes induce changes in value expression that are similar to or bigger than those observed in humans as a consequence of much more aggressive circumstances (e.g. 10 years of development or priming in humans compared to a topic change in LLMs). These results push further the point that psychological questionnaires administered to LLMs cannot be used and interpreted in the same way as with humans, i.e. much more attention must be given to studying context-dependence. Most importantly, they imply that rather than evaluating many questions from a single context (as is currently common), one should also evaluate the same questions from many different contexts. We propose metrics based on these types of value stability, and systematically compare models in terms of their value stability (see figure 22). To our knowledge, this is the first systematic comparison of many different models on their value stability.
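As an illustration of one of these metrics, rank-order stability can be computed as the average Spearman correlation between the value profiles expressed in different contexts. In the sketch below the score matrix is synthetic; only the metric computation follows the general methodology.

```python
# Sketch: rank-order stability of value scores across contexts, measured as the mean
# Spearman correlation over context pairs. The scores below are synthetic placeholders.
import itertools
import numpy as np
from scipy.stats import spearmanr

VALUES = ["Self-Direction", "Stimulation", "Hedonism", "Achievement", "Power",
          "Security", "Conformity", "Tradition", "Benevolence", "Universalism"]
CONTEXTS = ["grammar", "poem", "chess", "history", "joke"]

rng = np.random.default_rng(0)
scores = rng.uniform(1, 6, size=(len(CONTEXTS), len(VALUES)))  # one PVQ-like profile per context

def rank_order_stability(scores):
    correlations = [spearmanr(scores[i], scores[j]).correlation
                    for i, j in itertools.combinations(range(len(scores)), 2)]
    return float(np.mean(correlations))

print("rank-order stability:", rank_order_stability(scores))
```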
An intriguing feature of the human species is our ability to continuously invent new problems and to proactively acquire new skills in order to solve them: what is called Open-Ended Skill Acquisition (OESA). Understanding the mechanisms underlying OESA is an important scientific challenge in both cognitive science (e.g. by studying infant cognitive development) and in artificial intelligence (aiming at computational architectures capable of open-ended learning). Both fields, however, mostly focus on cognitive and social mechanisms at the scale of an individual’s life. It is rarely acknowledged that OESA, an ability that is fundamentally related to the characteristics of human intelligence, has been necessarily shaped by ecological, evolutionary and cultural mechanisms interacting at multiple spatiotemporal scales.
We have recently initiated a new research direction aiming at understanding, modeling and simulating the dynamics of OESA in artificial systems, grounded in theories studying its eco-evolutionary bases in the human species. For this aim, we have proposed a conceptual framework, called ORIGINS (illustrated in Fig. 23 and developed in 157), expressing the complex interactions between environmental, adaptive, multi-agent and cultural dynamics. This framework raises three main research questions:
The contributions described below address some aspects of these research questions. Note that there might be a thematic overlap between the last two research questions outlined above and the previous section on Models of Cultural Evolution 8.2, where we also present related results.
The diversity and quality of natural systems have been a puzzle and an inspiration for communities studying artificial life. It is now widely accepted that the adaptation mechanisms enabling these properties are largely influenced by the environments organisms inhabit. Organisms facing environmental variability have two alternative adaptation mechanisms operating at different timescales: plasticity, the ability of a phenotype to survive in diverse environments, and evolvability, the ability to adapt through mutations. Although vital under environmental variability, both mechanisms are associated with fitness costs hypothesized to render them unnecessary in stable environments. In this work, we aimed at studying the interplay between environmental dynamics and adaptation in a minimal model of the evolution of plasticity and evolvability.
To achieve this, we designed a simulation environment that attempts to capture the spatial and temporal heterogeneity of real-world environments while keeping the computational complexity low: the user can choose the number of niches, which are arranged in a simple longitudinal model, and a climate function that captures the temporal variation of environmental conditions (see left of figure 24 for an illustration of the environment). We defined the evolvability of an agent as its mutation rate and capture plasticity using tolerance curves, a tool developed in ecology 126. Tolerance curves (which we visualize on the right of Figure 24) have the form of a Gaussian whose mean indicates the preferred environmental state of an individual and whose variance indicates its plasticity, i.e., its ability to survive under different environmental conditions. This figure also illustrates the cost and benefit of plasticity. If both individuals are at their preferred niche, which coincides with the environmental state, then the plastic individual has lower fitness than the non-plastic one (cost of plasticity). If the actual environmental state differs significantly from the preferred one, the plastic individual has higher fitness (benefit of plasticity).
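The tolerance-curve formulation can be summarized in a short function: fitness is a Gaussian of the distance between the niche's environmental state and the individual's preferred state, and the normalization makes wider curves (more plastic individuals) pay a cost at their optimum. The constants below are illustrative.

```python
# Sketch of a Gaussian tolerance curve: mean = preferred environmental state,
# standard deviation = plasticity. The normalization captures the cost of plasticity:
# a wider curve has a lower peak. Constants are illustrative.
import numpy as np

def fitness(env_state, preferred_state, plasticity):
    sigma = max(plasticity, 1e-6)
    return (1.0 / (sigma * np.sqrt(2 * np.pi))) * np.exp(
        -((env_state - preferred_state) ** 2) / (2 * sigma ** 2))

# cost of plasticity: at the preferred state, the plastic individual is less fit...
print(fitness(0.0, 0.0, plasticity=0.2) > fitness(0.0, 0.0, plasticity=1.0))   # True
# ...benefit of plasticity: away from the preferred state, it is fitter
print(fitness(1.5, 0.0, plasticity=0.2) < fitness(1.5, 0.0, plasticity=1.0))   # True
```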
We conducted an extensive empirical study in this environment that aimed at disentangling the effects of different mechanisms: we studied three types of climate functions (stable, sinusoid, noisy), two types of evolutionary selection pressures (survival of the fittest and niche-limited competition), and environments where the number of niches varies from 1 to 100. Through these simulations we showed that environmental dynamics affect plasticity and evolvability differently and that the selection mechanism matters: a) in stable environments with a large number of niches, when both fitness-based selection and niche-limited competition are activated (we call this method NF-selection), plasticity remains high despite its cost (see left plot in Figure 25); b) in a noisy environment, introducing niche-limited competition (N-selection and NF-selection) makes populations capable of resisting larger amounts of noise (see right plot in Figure 25). We presented our work at GECCO 2022 164 and open-sourced the software for reproducing our simulations in this repository. A follow-up of this work has been published at the ALIFE 2023 conference 53, where we introduced mechanisms of niche construction to this model.
As a first step towards studying the evolution of open-ended skill acquisition in artificial agents, we studied the environmental conditions favoring the systematic exploration of combinatorial recipes involving various objects. By combinatorial recipe, we mean the ability of agents to combine objects in the environment in order to create new ones (in the spirit of the Minecraft video game), some of these crafted objects being associated with a reward. In this work, the training of an agent uses meta reinforcement learning, where an outer loop, equivalent to an evolutionary mechanism, meta-learns the parameters of an inner loop which can be seen as a developmental mechanism (where the agent acquires skills during its lifetime by interacting with the environment). In the current setup, we use reinforcement learning with transformer-based agents (see below).
Our experiments with recipe crafting are inspired by the Little Alchemy game. The difference with previous works in similar environments (e.g. 8.2.1) is that at every episode the structure of the recipes is randomly chosen. The agent therefore cannot predict which recipes will be rewarding and has to explore different combinations of objects in order to find the rewarding ones. The agent should also memorize the successful and unsuccessful combinations in order to explore and exploit efficiently.
Our preliminary results are obtained both in a vectorized version of the game (where the agent's actions consist only of choosing the 2 objects to combine) and in an embodied gridworld version (where the agent has to move, grab objects and put them on top of others in order to craft new ones). In both cases, the training efficiently meta-learns an exploration/exploitation strategy: trying new recipes (most of the time it does not try non-working recipes more than once) until it finds the rewarding ones, and then simply exploiting them by making them over and over.
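The behavior that the trained agents converge to in the vectorized version can be summarized by a simple hand-written, memory-based policy; the sketch below is only a summary of that emergent strategy, not the trained model: sample untried pairs of objects, never retry a failed combination, and exploit a recipe once it has been found rewarding.

```python
# Hand-written summary of the explore/exploit behavior that trained agents converge to
# in the vectorized recipe game: avoid retrying failed combinations, exploit rewarding ones.
import itertools
import random

class RecipeExplorer:
    def __init__(self, objects):
        self.untried = list(itertools.combinations(objects, 2))
        random.shuffle(self.untried)
        self.rewarding = []

    def act(self):
        if self.rewarding:                 # exploit a known rewarding recipe
            return random.choice(self.rewarding)
        return self.untried.pop()          # otherwise try a new combination exactly once

    def observe(self, combination, reward):
        if reward > 0:
            self.rewarding.append(combination)
```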
Further work will study how we can change the environment/training in order to evolve open-ended exploration strategies where an agent continuously explores new recipes even if it has already found rewarding ones, as a way to be better prepared for future changes in the recipe structure. We hypothesize that such an intrinsic motivation to explore for the sake of acquiring knowledge of the environment, even in the absence of external rewards, could evolve by introducing drastic changes of recipes which the agent has to anticipate in order to survive. During the project, we switched from evolutionary algorithms and recurrent neural networks to reinforcement learning and transformers. This allowed for more complex environments with more possibilities. We also obtained preliminary results with agents exploring the environment to gain information for the future.
This work uses the JAX Python library for both the model/machine learning part and the environment simulation. JAX allows easy parallelization and fast GPU computation, so learning it through this project will be useful for later projects.
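The pattern this enables is sketched below on a toy environment: the per-environment step function is written once, vectorized over a batch of environment states with jax.vmap and compiled with jax.jit. The dynamics and reward here are placeholders, not the actual recipe environment.

```python
# Sketch of the JAX pattern used here: write a single-environment step function,
# then vectorize it over a batch of states with vmap and compile with jit.
# The toy dynamics below are illustrative, not the actual recipe environment.
import jax
import jax.numpy as jnp

def step(state, action):
    new_state = state + action            # toy dynamics
    reward = -jnp.abs(new_state).sum()    # toy reward
    return new_state, reward

batched_step = jax.jit(jax.vmap(step))    # runs thousands of environments in parallel

states = jnp.zeros((4096, 8))
actions = jnp.ones((4096, 8))
states, rewards = batched_step(states, actions)
print(rewards.shape)                      # (4096,)
```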
We plan to submit a paper on these experiments in 2024.
This contribution was realized in the context of the Master internship of Corentin Léger in 2023, as a collaboration between C. Moulin-Frier from the Flowers team and Xavier Hinaut from the Mnemosyne team. It led to a paper which has been accepted at the Evostar conference 75.
Animals demonstrate remarkable adaptability to their environments, a trait honed through the evolution of their morphological and neural structures 199, 174. Animals are born equipped with both hard-wired behavior routines (e.g. breathing, motor babbling) and learning capabilities to adapt from experience. The costs and benefits of evolving hard-wired behaviors vs. learning capabilities depend on different factors, a central one being the level of unpredictability of environmental conditions across generations 195, 138. Phenotypic traits addressing environmental challenges that are shared across many generations are more likely to evolve hard-wired (e.g. breathing), while traits whose utility can hardly be predicted from their utility in previous generations are likely to be learned through individual development (e.g. learning a specific language).
This prompts an intriguing question: How can neural structures, optimized at an evolutionary scale, enhance the capabilities of agents to learn complex tasks at a developmental scale? To address this question, we propose to model the interplay between evolution and development as two nested adaptive loops: evolution optimizes the generation of neural structures through natural selection over generations, shaping developmental learning during an agent’s lifetime (Fig. 26).
More precisely, at the evolutionary scale (the outer loop), we use an evolutionary algorithm to optimize a genome specifying hyperparameters of reservoirs 185. At the developmental scale (the inner loop), an RL agent equipped with a generated reservoir learns an action policy to maximize cumulative reward in a simulated environment. Thus, the objective of the outer evolutionary loop is to optimize macro properties of reservoirs in order to facilitate the learning of an action policy in the inner developmental loop. See Fig. 27 for an overview of the method.
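A minimal numpy sketch of this nested structure is given below: the outer loop performs a simple evolutionary search over two reservoir hyperparameters (spectral radius and leak rate), and each genome is scored by a `develop_and_learn` function, which is an assumed placeholder standing in for the reinforcement learning of a policy on top of the generated reservoir.

```python
# Sketch of the nested adaptive loops: an outer evolutionary search over reservoir
# hyperparameters and an inner developmental loop (here a placeholder) in which an RL
# agent would learn with the generated reservoir. develop_and_learn is an assumed stand-in.
import numpy as np

rng = np.random.default_rng(0)
N_UNITS = 100

def make_reservoir(spectral_radius, leak_rate):
    w = rng.normal(size=(N_UNITS, N_UNITS))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))  # rescale to target spectral radius
    return {"w": w, "leak_rate": leak_rate}

def develop_and_learn(reservoir):
    # Placeholder for the inner loop: train an RL policy reading the reservoir states
    # and return its cumulative reward. Here: a dummy score favoring mid-range leak rates.
    return -((reservoir["leak_rate"] - 0.3) ** 2)

population = [{"sr": rng.uniform(0.1, 2.0), "lr": rng.uniform(0.0, 1.0)} for _ in range(16)]
for generation in range(20):
    scores = [develop_and_learn(make_reservoir(g["sr"], g["lr"])) for g in population]
    parents = [population[i] for i in np.argsort(scores)[-4:]]        # keep the best genomes
    population = [{"sr": abs(p["sr"] + rng.normal(0, 0.1)),           # mutate to form offspring
                   "lr": float(np.clip(p["lr"] + rng.normal(0, 0.05), 0, 1))}
                  for p in parents for _ in range(4)]
```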
Using this computational model, we ran experiments in diverse simulated environments, e.g. 2D environments where the agent learns how to balance a pendulum and 3D environments where the agent learns how to control complex morphologies. These experiments provide support for three main hypotheses on how evolved reservoirs can affect intra-life learning. First, they can facilitate solving partially-observable tasks, where the agent lacks access to all the information necessary to solve the task; in this case, we test the hypothesis that the recurrent nature of the reservoir enables learning to infer the unobservable information. Second, they can generate oscillatory dynamics useful for solving locomotion tasks; in this case, the reservoir acts as a meta-learned central pattern generator (CPG). Third, they can facilitate the generalization of learned behaviors to new tasks unknown during the evolution phase.
This work was accepted at EvoApplications, the 27th European Conference on the Applications of Evolutionary and Bio-inspired Computation (EvoApps 2024).
In this work, we further investigate the emergence of cooperative exploration strategies of decentralized agents by training them on an open-ended distribution of tasks. To this end, we introduce a novel environment 28 which is conceptually simple yet allows for a complex, open-ended, procedurally generated task space by dynamically combining multiple subtasks sampled from five task types to form a task tree which needs to be solved sequentially, akin to the notion of recipes in 93. We train two agents parameterized by independent recurrent neural networks and optimized using standard proximal policy optimization. As no information is given to the agents about which subtasks have been sampled or how and in which order they should be solved, the agents have to develop general strategies for exploring the environment, effectively learning how to learn from the information obtained by interacting with the environment throughout the episode, in order to solve novel tasks. We show that training independent decentralized agents only on multi-agent episodes leads to sub-optimal behavior, primarily due to the problem of credit assignment when rewards are shared between agents. We propose to include single-agent episodes during training to force the agents to learn to solve tasks on their own without relying on any help from other agents. We find that training on a mixture of single-agent and multi-agent episodes increases the agents' individual performance while simultaneously decreasing the individual performance differences between agents, leading to a strong improvement in performance on multi-agent tasks.
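The episode mixture can be expressed compactly; the sketch below assumes an environment that can be reset with a variable number of agents and a `run_episode` helper, both of which are assumed interfaces rather than the actual training code.

```python
# Sketch of mixing single-agent and multi-agent episodes during training.
# env.reset(n_agents=...) and run_episode(...) are assumed interfaces.
import random

def sample_episode(env, agents, run_episode, single_agent_ratio=0.5):
    if random.random() < single_agent_ratio:
        agent = random.choice(agents)                    # one agent must solve the task alone
        return run_episode(env.reset(n_agents=1), [agent])
    return run_episode(env.reset(n_agents=2), agents)    # full cooperative episode
```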
Using this approach, we find that decentralized agents trained in our procedurally generated environment learn a powerful collective exploration strategy, allowing them to solve over 70 percent of the task trees encountered during training. Moreover, these powerful exploration capabilities lead to strong generalization performance when confronted with objects unseen during training, as well as on novel tasks which require complex coordination to be solved successfully at test time. Additionally, we show that the learned collective exploration strategies extend to the open-ended task setting, enabling the agents to effectively generalize to task trees with a depth of six, featuring an increased complexity of subtasks, despite being initially trained on task trees comprising only three subtasks.
This work was presented as a poster at the Agent Learning in Open-Endedness (ALOE) workshop at NeurIPS 2023. Videos of the agents' behaviors can be found on our companion website.
This contribution is the result of a collaboration between Ricard Solé and Martí Sànchez-Fibla from the Universitat Pompeu Fabra (Barcelona, Spain) and Clément Moulin-Frier (Flowers, Inria). A preprint is available 78 and a paper has been submitted to PNAS.
Humans have been able to tackle biosphere complexities by acting as ecosystem engineers, profoundly changing the flows of matter, energy and information. This includes major innovations that allowed humans to reduce and control the impact of extreme events. Modelling the evolution of such adaptive dynamics can be challenging given the potentially large number of individual and environmental variables involved. This paper shows how to address this problem by using fire as the source of external, bursting and wide fluctuations. Fire propagates on a spatial landscape where a group of agents harvest and exploit trees while avoiding the damaging effects of fire spreading. The agents need to solve a conflict to reach a group-level optimal state: while tree harvesting reduces the propagation of fires, it also reduces the availability of resources provided by trees. It is shown that the system displays two major evolutionary innovations that end up in an ecological engineering strategy favouring high biomass along with the suppression of large fires. The implications for potential AI management of complex ecosystems are discussed.
The computational model is illustrated in Fig. 29 and the main results in Fig. 30.
This work focuses on eco-evolutionary dynamics where "organisms are not solely products but, by modifying their niche and therefore its associated fitness landscape, are also causes of evolution" 142. The main objective of this paper is to propose a method for studying large-scale eco-evolutionary dynamics in agent-based simulations with a reasonable level of biological and ecological plausibility. For this aim, we implement a system with the following properties (see Fig. 31 for illustration):
In addition to experiments conducted in the large environment presented above, we also conduct experiments in a "lab environment" (as opposed to the "natural environment") to isolate the study of certain behaviors (which are often intertwined with many other dynamics in the natural environment).
One interesting result of these simulations is the emergence of sustainable foragers which, as shown in the lab environment in Fig. 32, tend not to overconsume when there is enough resource in their neighbourhood. This keeps a certain amount of resource available to spread, which is beneficial for their future survival as well as the survival of their offspring (as there is no reset of the environment).
This work was presented as a poster to the Genetic and Evolutionary Computation Conference (GECCO) 2023.
Since 2019 (Idex cooperation fund between the University of Bordeaux and the University of Waterloo, Canada) and the recent creation of the CuriousTECH associate team in 2022 (led by the Flowers team and involving F. Lotte from the Potioc team and M. Fernendes and E. Law from the University of Waterloo), we continue our work on the development of new curiosity-driven interaction systems. Substantial progress has been made in this area of application of FLOWERS works (see the CuriousTECH team website: https://flowers.inria.fr/curioustech-associate-team/).
For a better understanding of the basic mechanisms of curiosity-based learning, three studies have been completed. The first study concerns a new interactive educational application to foster curiosity-driven question-asking in children. This study was performed during the Master 2 internship of Mehdi Alaimi, co-supervised by H. Sauzéon, E. Law and PY Oudeyer. It addresses a key challenge for 21st-century schools, i.e., teaching diverse students with varied abilities and motivations for learning, such as curiosity within educational settings. Among the variables eliciting a curiosity state, one is known as the « knowledge gap », which is a motor for curiosity-driven exploration and learning. It leads to question-asking, which is an important factor in the curiosity process and the construction of academic knowledge. However, children's questions in the classroom are not very frequent and rarely require deep reasoning. Determined to improve children's curiosity, we developed a digital application aiming to foster curiosity-related question-asking from texts and their perception of curiosity. To assess its efficiency, we conducted a study with 95 fifth-grade students of Bordeaux elementary schools. Two types of interventions were designed, one focusing children on the construction of low-level (i.e. convergent) questions and one focusing them on high-level (i.e. divergent) questions with the help of prompts or question-starter models. We observed that both interventions increased the number of divergent questions and question fluency performance, while they did not significantly improve the perception of curiosity despite the high intrinsic motivation scores they elicited in children. The curiosity-trait score positively impacted the divergent question score under the divergent condition, but not under the convergent condition. The overall results support the efficiency and usefulness of digital applications for fostering children's curiosity, which we need to explore further. These results are published in CHI'20 87. In parallel to these first experimental works, we wrote this year a review of the existing works on the subject 103.
The second study investigates the neurophysiological underpinnings of curiosity and the opportunities for their use in brain-computer interaction 88. Understanding the neurophysiological mechanisms underlying curiosity, and therefore being able to identify the curiosity level of a person, would provide useful information for researchers and designers in numerous fields such as neuroscience, psychology, and computer science. A first step to uncovering the neural correlates of curiosity is to collect neurophysiological signals during states of curiosity, in order to develop signal processing and machine learning (ML) tools to recognize curious states from non-curious ones. Thus, we ran an experiment in which we used electroencephalography (EEG) to measure the brain activity of participants as they were induced into states of curiosity, using trivia question and answer chains. We used two ML algorithms, i.e. Filter Bank Common Spatial Pattern (FBCSP) coupled with Linear Discriminant Analysis (LDA), as well as a Filter Bank Tangent Space Classifier (FBTSC), to classify the curious EEG signals from the non-curious ones. Global results indicate that both algorithms obtained better performances in the 3-to-5s time windows, suggesting an optimal time window length of 4 seconds to go towards curiosity state estimation based on EEG signals. These results have been published in 88.
Finally, the third study investigates the role of intrinsic motivation in spatial learning in children 79. In this study, state curiosity is manipulated as a preference for a level of uncertainty during the exploration of new environments. To this end, a series of virtual environments was created and presented to children. During encoding, participants explore routes in environments according to three levels of uncertainty (low, medium, and high), using a virtual reality headset and controllers, and are later asked to retrace their travelled routes. The exploration area and the wayfinding score, i.e. the route overlap between the encoding and retrieval phases (an indicator of spatial memory accuracy), are measured. Neuropsychological tests are also performed. Preliminary results showed better performances under the medium uncertainty condition in terms of exploration area and wayfinding score. These first results support the idea that curiosity states are a learning booster 79.
At the end of 2020, we started an industrial collaboration project with EvidenceB on this topic (CIFRE contract of Rania Abdelghani validated by the ANRT). The overall objective of the thesis is to propose new educational technologies driven by epistemic curiosity, allowing children to express themselves more and learn better. To this end, a central question of the work will be to specify the impact of self-questioning aroused by states of curiosity on student performance. Another objective will be to create and study the pedagogical impact of new educational technologies in real situations (schools), promoting an active education of students based on their curiosity. To this end, a web platform called 'Kids Ask' has been designed, developed and tested in three primary schools. The tool offers an interaction with a conversational agent that trains children's abilities to generate curiosity-driven questions and use these questions to explore a learning environment and acquire new knowledge. The results suggest that the configuration helped enhance children's questioning and exploratory behaviors; they also show that learning progress differences in children can be explained by the differences in their curiosity-driven behaviors 85.
Despite showing pedagogical efficiency, the method used in the first study of this PhD is still very limited since it relies on generating curiosity-prompting cues by hand for each educational resource in order to feed the "discussion" with the agent, which can be a very long and costly process. For this reason, a logical follow-up to scale up and generalize this study was to explore ways to automate the said conversational agents' behaviors in order to facilitate their implementation on a larger scale and for different learning tasks. More particularly, we turned towards the natural language processing (NLP) field and large language models (LLMs), which have shown an impressive ability to generate text that resembles the way people write.
In this context, we study the use of the recent LLM GPT-3 to implement conversational agents that can prompt children's curiosity about a given text-based educational content, by proposing specific cues. We investigate the validity of this automation method by comparing its impact on children's divergent question-asking skills with respect to the hand-crafted condition we had in our previous work. In a second step, we explore using GPT-3 to propose a new curiosity-prompting behavior for our agent that aims to better support children's needs for competence, autonomy and relatedness during the question-asking training.
The study was conducted in two primary schools with 75 children aged between 9 and 11. Our first results suggest the validity of using GPT-3 to facilitate the implementation of curiosity-stimulating learning technologies. Indeed, children's performance was similar between the conditions where they had hand-generated or GPT-3-generated cues. In a second step, we also found that GPT-3 can be efficient in proposing relevant cues that leave children with more autonomy to express their curiosity 33 (publication in process).
Finally, as a follow-up to this line of work, we design new digital interventions that focus on eliciting the metacognitive mechanisms involved in the stimulation and continuity of curiosity, and not just on giving the tools to pursue it as done in the previous studies. For this, we use findings from the theories explaining curiosity in order to break it down into a set of relevant metacognitive skills. We then take an operational approach to propose adequate digital metacognitive exercises for each of the said skills (i.e. exercises to identify uncertainty, generate hypotheses, etc.). We aim to implement this set of metacognitive exercises and investigate its impact on children's abilities to initiate and maintain curious behaviors. We would also be interested in investigating the impact of such training on the learning progress children can achieve. A first study has been conducted with two classrooms to evaluate the accessibility of this new training and its impact on metacognitive efficiency, curiosity-driven question-asking and learning. Our first results being rather positive, we aim to recruit a bigger sample size to validate them 41.
Thanks to the building of primary school networks (submission of the Léa-Ifé project to the ENS-Lyon call), the next step of this work is to study this digital metacognitive intervention in more ecological settings, specifically when administered by teachers, since this reflects the classical classroom setting. The aim is therefore to support teachers in assimilating the intervention to facilitate its transfer into real classrooms. This has already been initiated in collaboration with the Académie de Bordeaux and elementary school teachers of Bordeaux Métropole, as the first steps of the thesis project that started in October (Chloé Desvaux - Université de Bordeaux). The efficacy of the ecological digital intervention is to be compared with previous results. Another objective of this thesis project is to assess the characteristics of children's curious behaviors more closely, with a primary focus on their divergent and creative properties and their implication in learning.
On another subject, we also started investigating the importance of curiosity-related metacognitive skills for students' use of GenAI (Generative AI) tools during learning. Indeed, in 59, we argue for the importance of developing children's sense of critical thinking, epistemic vigilance, etc., in order to allow a more active and informed use of these tools during learning. Such skills can help children have more realistic expectations of such tools and evaluate their outputs before integrating them into their beliefs. A study aiming to understand how children use these tools to solve learning problems is in progress (piloting stage).
Qualitative analysis of textual contents unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, and may also be challenged by the limited generalizability of task-specific models. In this study, we explored the use of large language models (LLMs) in supporting deductive coding, a major category of qualitative analysis where researchers use pre-determined codebooks to label the data into a fixed set of codes. Instead of training task-specific models, a pre-trained LLM can be used directly for various tasks without fine-tuning, through prompt learning. Using a curiosity-driven question coding task as a case study, we found that, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreement with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond. This work was published in 56 and involved a collaboration with Z. Xiao, V. Liao, E. Yuan from Microsoft Research Montreal.
Kidlearn is a research project studying how machine learning can be applied to intelligent tutoring systems (ITS). It aims at developing methodologies and software which adaptively personalize sequences of learning activities to the particularities of each individual student. Our systems aim at proposing to the student the right activity at the right time, maximizing concurrently their learning progress and their motivation. In addition to contributing to the efficiency of learning and motivation, the approach is also designed to reduce the time needed to design ITS systems.
We continued to develop an approach to Intelligent Tutoring Systems which adaptively personalizes sequences of learning activities to maximize skills acquired by students, taking into account the limited time and motivational resources. At a given point in time, the system proposes to the students the activity which makes them progress faster. We introduced two algorithms that rely on the empirical estimation of the learning progress, RiARiT that uses information about the difficulty of each exercise and ZPDES that uses much less knowledge about the problem.
The system is based on the combination of three approaches. First, it leverages recent models of intrinsically motivated learning by transposing them to active teaching, relying on the empirical estimation of the learning progress provided by specific activities to particular students. Second, it uses state-of-the-art Multi-Armed Bandit (MAB) techniques to efficiently manage the exploration/exploitation challenge of this optimization process. Third, it leverages expert knowledge to constrain and bootstrap the initial exploration of the MAB, while requiring only coarse guidance information from the expert and allowing the system to deal with didactic gaps in its knowledge. The system was evaluated in several large-scale experiments relying on a scenario where 7-8 year old schoolchildren learn how to decompose numbers while manipulating money 109. Systematic experiments were also presented with simulated students.
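A minimal sketch of the learning-progress principle underlying this exercise selection is given below; the full ZPDES algorithm additionally manages the Zone of Proximal Development using the expert activity graph, which is not reproduced here. Each active exercise type is scored by the recent trend of student success, and exercises are sampled proportionally to this empirical learning progress.

```python
# Minimal sketch of learning-progress-based exercise selection (the core principle behind
# ZPDES, without the Zone-of-Proximal-Development management on the expert activity graph).
import random
from collections import deque

class LPExerciseSelector:
    def __init__(self, exercises, window=10, eps=0.2):
        self.results = {e: deque(maxlen=window) for e in exercises}
        self.eps = eps

    def learning_progress(self, exercise):
        r = list(self.results[exercise])
        if len(r) < 4:
            return 1.0                              # optimistic: unexplored exercises look promising
        half = len(r) // 2
        return abs(sum(r[half:]) / (len(r) - half) - sum(r[:half]) / half)

    def select(self):
        if random.random() < self.eps:              # keep a floor of exploration
            return random.choice(list(self.results))
        lp = {e: self.learning_progress(e) for e in self.results}
        total = sum(lp.values()) or 1.0
        return random.choices(list(lp), weights=[v / total for v in lp.values()])[0]

    def update(self, exercise, success):
        self.results[exercise].append(1.0 if success else 0.0)
```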
An experiment was held between March 2018 and July 2019 in order to test the Kidlearn framework in classrooms in Bordeaux Métropole; 600 students participated in the experiment. This study had several goals. The first goal was to evaluate the impact of the Kidlearn framework on motivation and learning compared to an expert sequence without machine learning. The second goal was to observe the impact of using learning progress to select exercise types within the ZPDES algorithm compared to a random policy. The third goal was to observe the impact of combining ZPDES with the ability to let children make different kinds of choices during the use of the ITS. The last goal was to use the psychological and contextual data measures to see whether correlations can be observed between the evolution of students' psychological states, their profile, their motivation and their learning. We first show that LP-based personalization improves learning performance (reproducing and solidifying previous results) while producing a positive and motivating learning experience. We then show that the addition of self-choice as a playful feature triggers intrinsic motivation in the learner and reinforces the learning effectiveness of the LP-based personalization. In doing so, it strengthens the links between intrinsic motivation and performance progress during the serious game. Conversely, deleterious effects of the playful feature are observed for hand-designed linear paths. Thus, the intrinsic motivation elicited by a playful feature is beneficial only if the curriculum personalization is effective for the learner. Such a result deserves great attention due to the increased use of playful features in non-adaptive educational technologies available on the market. Details of these new results, as well as the overall results of this project, are presented in Benjamin Clément's PhD thesis 108 and are currently being prepared for publication.
The algorithms developed during the Kidlearn project and Benjamin Clément's thesis 108 are being used in an innovation partnership for the development of a pedagogical assistant based on artificial intelligence intended for teachers and students of cycle 2. The algorithms are being rewritten in TypeScript for the needs of the project. The team's expertise in creating the pedagogical graph and defining the graph parameters used by the algorithms is also a crucial part of its role in the project. One of the main goals for the team here is to transfer technologies developed in the team to a project with industrial scaling in perspective, and to assess the impact and feasibility of such scaling.
Few digital interventions targeting numeracy skills have been evaluated with individuals with autism spectrum disorder (ASD) 154, 153. Yet, some children and adolescents with ASD have learning difficulties and/or a significant academic delay in mathematics. While ITS are successfully developed for typically developing students to personalize the learning curriculum and thereby foster the motivation-learning coupling, they are rarely, if at all, proposed today to students with specific needs. The objective of this pilot study is to test the feasibility of a digital intervention using an ITS with high school students with ASD and/or intellectual disability. This application (KidLearn) provides calculation training through currency exchange activities, with a dynamic exercise sequence selection algorithm (ZPDES). 24 students with ASD and/or intellectual disability enrolled in specialized classrooms were recruited and divided into two groups: 14 students used the KidLearn application, and 10 students received a control application. Pre-post evaluations show that students using KidLearn improved their calculation performance and had a higher level of motivation at the end of the intervention than the control group. These results encourage the use of an ITS with students with specific needs to teach numeracy skills, but need to be replicated on a larger scale. Suggestions for adjusting the interface and teaching method are proposed to improve the impact of the application on students with autism 151.
Because of its cross-cutting nature to all cognitive activities such as learning tasks, attention is a hallmark of good cognitive health throughout life and more particularly in the current context of societal crisis of attention. Recent works have shown the great potential of computerized attention training for an example of attention training, with efficient training transfers to other cognitive activities, and this, over a wide spectrum of individuals (children, elderly, individuals with cognitive pathology such as Attention Deficit and Hyperactivity Disorders). Despite this promising result, a major hurdle is challenging: the high inter-individual variability in responding to such interventions. Some individuals are good responders (significant improvement) to the intervention, others respond variably, and finally some respond poorly, not at all, or occasionally. A central limitation of computerized attention training systems is that the training sequences operate in a linear, non-personalized manner: difficulty increases in the same way and along the same dimensions for all subjects. However, different subjects require in principle a progression at a different, personalized pace according to the different dimensions that characterize attentional training exercises.
To tackle the issue of inter-individual variability, the present project proposes to apply some principles from intelligent tutoring systems (ITS) to the field of attention training. In this context, we have already developed automatic curriculum learning algorithms such as those developed in the KidLearn project, which allow to customize the learner's path according to his/her progress and thus optimize his/her learning trajectory while stimulating his/her motivation by the progress made. ITS are widely identified in intervention research as a successful way to address the challenge of personalization, but no studies to date have actually been conducted for attention training. Thus, whether ITS, and in particular personalization algorithms, can optimize the number of respondents to an attention training program remains an open question.
To investigate this question, we first conducted a systematic review aiming at exploring existing methods in computerized CT and analyzing their outcomes in terms of learning mechanics (intra-training performance) and effectiveness (near, far and everyday life transfer effects of CT) 72. A search up to June 2023 with multiple databases selecting 19 computerized CT studies revealed that only two studies emphasized the favorable influence of individualization on CT effectiveness, while five underscored its capacity to enhance the training experience by boosting motivation, engagement, and offering diverse learning pathways. In sum, despite promising results in this new research avenue, more research is needed to fully understand and empirically support individualized techniques in cognitive training. Complementing the study of adaptive methods applied to cognitive training, we have attempted through a review of the subjective literature to gain a better understanding of the Multiple Object Tracking (MOT) task, which seems to have the best results in terms of attentional training efficiency in young and older adults. The results of this work highlight that: (1) Multiple cognitive mechanisms are identified as active in the task (divided and sustained attention; foveal and peripheric attention ; automatic and controlled inhibition, etc. ); (2) a limited number of studies have actually implemented the MOT task in computer-assisted cognitive training; and (3) tIt's the near (attention tasks) and far (other cognitive tasks) effects that are well documented as positive outcomes of MOT-based training while there is a scarcity of research that has thoroughly analyzed the ecological effects of attentional training, namely the potential transfer effects in everyday life (paper in progress).
In parallel to this, a web platform has been designed for planning and implementing remote behavioural studies. This tool provides means for registering recruited participants remotely and executing complete experimental protocols: from presenting instructions and obtaining informed consents, to administering behavioural tasks and questionnaires, potentially throughout multiple sessions spanning days or weeks. In addition to this platform, a cognitive test battery composed of seven classical behavioural tasks has been developed. This battery aims to evaluate the evolution of the cognitive performance of participants before and after training. Fully open-source, it mainly targets attention and memory. A preliminary study on a large sample of 50 healthy participants showed that the developed tasks reproduced the results of previous studies, that there were large differences between individuals (no ceiling effect) and that the results were significantly reliable between two measurements taken on two days separated by one night 2.
Utilizing these tools, a pilot study campaign was conducted to evaluate the impact of our AI-based personalized cognitive training program. The first pilot experiment involved n=27 participants and aimed to compare the effectiveness of a cognitive training program using a linear difficulty management procedure (staircase procedure) to a program using an ITS for difficulty manipulation. The online training lasted for 10 hours over a period of 2 weeks. The results indicated that the ITS-based intervention produced more diverse learning trajectories than the linear procedure 35, leading to broader improvements in the pre-post cognitive assessment. However, no significant differences were observed in subjective measures of motivation and engagement between the two groups. Subsequent to this initial experiment, two pilot studies (n=11 and n=10, respectively) were conducted with the goal of enhancing motivation and engagement in the game. The first study implemented gamified components such as scores and feedback, while the second study examined hyperparameter updates to the ITS. The analysis of learning trajectories, learning outcomes, and subjective measures yielded promising results in favor of the AI-based personalized procedure.
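For readers unfamiliar with the control condition, the snippet below sketches one common form of staircase difficulty management: difficulty is raised after two consecutive successes and lowered after a failure. The rule, step sizes and bounds are illustrative assumptions, not the exact procedure used in the study.

```python
class Staircase:
    """Illustrative adaptive staircase on a single difficulty dimension:
    level rises after two consecutive successes and drops after one failure.
    Parameters are hypothetical, not those of the reported experiment."""

    def __init__(self, level=1, step=1, min_level=1, max_level=20):
        self.level, self.step = level, step
        self.min_level, self.max_level = min_level, max_level
        self.consecutive_successes = 0

    def update(self, success: bool) -> int:
        if success:
            self.consecutive_successes += 1
            if self.consecutive_successes == 2:            # harder after 2 successes
                self.level = min(self.level + self.step, self.max_level)
                self.consecutive_successes = 0
        else:                                              # easier after 1 failure
            self.level = max(self.level - self.step, self.min_level)
            self.consecutive_successes = 0
        return self.level

# Usage: feed in trial outcomes, read back the next difficulty level.
stair = Staircase()
for outcome in [True, True, True, False]:
    level = stair.update(outcome)
```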
Building on the preliminary findings, we expanded our research scope with a more comprehensive experimental setup involving two distinct studies. The first study encompassed 64 young adults, sourced through the Prolific platform, while the second study consisted of 49 older adults, recruited from the "Université du temps libre". Our experimental methodology mirrored that of our initial pilot studies, with a notable enhancement: the integration of new gamified elements (including mini-story creation and new visual content) aimed at boosting participant motivation and engagement.
The data analysis encompassed three primary dimensions: initially, an exploratory phase to delineate learning trajectories between control and intervention groups; subsequently, a comparative analysis of pre- and post-test performance on the cognitive battery; and lastly, an examination of participants' self-reported experiences during training, providing insights into their subjective perceptions of the experiment.
The pilot studies' preliminary outcomes were corroborated in these larger sample groups. Notably, learning trajectories exhibited greater diversity in the group undergoing the intervention procedure. This group also demonstrated a more pronounced improvement across a wider range of cognitive assessment tasks. Although participants engaging in the personalized cognitive training reported a higher cognitive load via questionnaires, the levels of engagement and frustration did not significantly differ between the two groups.
As it is well known that there are more dropouts among older adults than among young ones, we aimed to better understand the learning experience of trainees through feedback analyses. For this, we designed a new pipeline based on several Large Language Models (LLMs) to extract salient topics and the main dropout motivations from verbatim feedback, relating them to the pragmatic, hedonic and/or aesthetic dimensions of cognitive training. The results obtained with various LLMs are encouraging (paper in progress). To support this new approach, we are exploring different prompts on other data corpora in order to ultimately propose a tutorial accessible to anyone wishing to carry out an LLM-based thematic qualitative analysis.
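As a rough illustration of such an LLM-based thematic coding step, the sketch below shows how a single verbatim could be routed through a prompt. The `call_llm` wrapper, the prompt wording and the output fields are hypothetical and do not reflect the team's actual pipeline.

```python
# Minimal sketch of an LLM-based thematic coding step. `call_llm` is a
# hypothetical wrapper around whatever chat-completion API is used; the prompt
# and the requested fields are illustrative, not the team's actual prompts.
PROMPT_TEMPLATE = """You are coding qualitative feedback from a cognitive-training study.
For the verbatim below, return a JSON object with:
- "topics": up to 3 short topic labels,
- "dropout_motivation": a one-sentence summary, or null if none is expressed,
- "dimension": one of "pragmatic", "hedonic", "aesthetic".

Verbatim: {verbatim}"""

def code_verbatim(verbatim: str, call_llm) -> str:
    """Send one participant's feedback to the LLM and return its raw answer."""
    return call_llm(PROMPT_TEMPLATE.format(verbatim=verbatim))
```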
Sustaining and supporting the school inclusion of children with neurodevelopmental disorders (e.g., autism, attention disorders, intellectual disabilities) has become urgent: the higher the school level, the lower the number of pupils with cognitive disabilities who remain in school.
Technology-based interventions to improve the school inclusion of children with neurodevelopmental disorders have mostly been individual-centered, focusing on their socio-adaptive and cognitive impairments and implying that they have to adapt themselves to fit society's expectations. Although this approach centered on the normalization of the person has some advantages (reduction of clinical symptoms), it carries social stereotypes and misconceptions of cognitive disability that are not respectful of the cognitive diversity and intrinsic motivations of the person, and in particular of the student's wishes in terms of the school curriculum to achieve his or her future life project 51.
The "ToGather" project aims at enlightening the field of educational technologies for special education by proposing an approach centered on the educational needs of the students and providing a concerted, informed answer involving all the stakeholders, including the student and all their support spheres (family, school, medico-social care). To this end, the ToGather project, which emanates from participatory design methods, primarily consists in the development of a pragmatic tool (an interactive website) helping students with cognitive disabilities and their caregivers to formalize and visualize the student's repertoire of academic skills and to make it evolve according to the student's zone of proximal development (in the sense of Vygotsky) on the one hand, and to the student's intrinsic motivations (his or her own educational and life project) on the other 152.
This project is in partnership with the Académie de Bordeaux of the French Ministry of Education, the ARI association, and the Centre for Autism of Aquitaine. It is funded by the FIRAH (foundation) and the Nouvelle-Aquitaine Region (see the dedicated webpage: https://flowers.inria.fr/projet-tous-ensemble/).
First, usability studies were conducted to evaluate the ergonomic qualities of the ToGather website, yielding positive results in French and Belgian contexts. Then, we conducted a large field study to assess the effectiveness of the tool in helping stakeholders to support children with neurodevelopmental disorders (NDD) 180, 83, 55.
The study protocol consisted of a longitudinal non-randomized controlled trial, with baseline, 3-month, and 6-month follow-up assessments. Recruitment was conducted across the entire French territory. Our local partners facilitated the dissemination of the call for participation in Gironde and provided us with contacts to extend it to other regions. Additionally, a recruitment campaign on social media was carried out to communicate about the study and encourage participants to test the ToGather tool.
As the tool was designed to support the co-education process between parents and professionals, a support team had to consist of at least two stakeholders, including at least one of the parents. Initially, 157 participants were recruited in 37 support teams, but 30 individuals did not answer the baseline questionnaire, leading to the exclusion of 11 support teams. After the baseline assessment, 13 support teams were allocated to the experimental condition (ToGather app) and 11 to the control condition (usual follow-up).
Primary outcome measures covered stakeholders' relationships, self-efficacy, and attitudes towards inclusive education, while secondary outcome measures were related to stakeholders' burden and quality of life, as well as children's school well-being and quality of life.
As the study ended in July 2023, data analysis is still ongoing. Preliminary results after 3 months of use are encouraging, showing an improvement in communication between stakeholders and in their respective quality of life (paper in progress).
With the ever-increasing interest in digital technologies in education, many questions about their use and effectiveness are emerging. In this context, this project focuses on the relationships between three key dimensions of technology-mediated learning: the learner's internal learning processes, the instructional design, and the educational technology used.
In partnership with CATIE (industrial partner) and the EPSYLON laboratory of the University of Montpellier (Prof. André Tricot), two main objectives are targeted in this research program, started in April 2022:
To this end, the program includes 3 main phases of study:
A systematic review evaluating the contributions and limitations of Virtual Reality (VR) and Augmented Reality (AR) in learning, with a specific focus on examining their impacts on cognitive load and intrinsic motivations, has been completed and is currently in the submission process 177.
The main results are as follows. From a pool of 3250 records, 36 studies with a robust design investigating the impact of virtual or augmented reality on learning performance and on cognitive load or intrinsic motivation were included. The main results of these studies were reported in a grid that we built to determine whether the observed effects were positive, neutral, negative, or inconsistent with established theoretical frameworks. The results of the review indicate that AR effectively optimized cognitive load, leading to enhanced learning outcomes, while VR, on the other hand, tended to overload learners, decreasing learning performance. Regarding intrinsic motivation, results were inconsistent with motivational models, likely due to variations in measurement methods. Notably, only a few studies simultaneously investigated cognitive load and intrinsic motivation as integral components of learning efficiency, and they reported conflicting causal relationships between these variables.
Based on these results, two experiments involving 140 second-year undergraduate students in medicine were conducted. In the first experiment, we used spatial augmented reality and mixed reality (HoloLens 2) to investigate whether guiding students' drawings during their lectures can reduce their cognitive load and enhance their motivation to learn.
Our hypotheses are as follows:
In the second experiment, we changed the learning paradigm, using virtual reality with different levels of interaction and guidance to examine how exploration and embodied interaction with a 3D model can positively impact learning, cognitive load, and curiosity.
We make the following assumptions:
We hope to extend the results obtained to the industrial context in which CATIE's activities are carried out. CATIE's mission is to accelerate technology transfer between the worlds of research and industry. The Human Centered Systems team, of which this research project is part, supports companies in improving the design of existing or new digital systems by proposing a human-centered approach. The questions raised by this project are intended to help CATIE address these issues, improve its know-how in terms of learning and digital systems, and then transfer this knowledge to EdTech companies.
We further developed our Automated Discovery software and started experimenting with it. First, we released a new version of our standalone Python library. We improved how experiments could be saved and reloaded.
Second, we focused on implementing tools and interfaces allowing users to give feedback or instructions to the automated discovery algorithm that explores the complex system. As identified by 16, empowering experimenters to collaborate with automated discovery methods can be key to obtaining interesting discoveries. Integrating such a collaborative process in our tool came with several engineering challenges, and we are currently experimenting with our solution and working on making it user-friendly for non-expert end users.
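A minimal sketch of how user feedback could be folded into such an exploration loop is given below; the callbacks, the rating-based seeding and the mutation scheme are placeholders standing in for the tool's actual interfaces, which are not reproduced here.

```python
from typing import Callable, List, Tuple
import random

Params = List[float]
Outcome = List[float]

def explore_with_feedback(sample_params: Callable[[], Params],
                          run_system: Callable[[Params], Outcome],
                          ask_user: Callable[[Outcome], float],
                          n_steps: int = 20) -> List[Tuple[Params, Outcome, float]]:
    """Alternate automated exploration with user ratings: higher-rated outcomes
    seed the next candidates. The three callbacks stand in for the tool's UI."""
    history: List[Tuple[Params, Outcome, float]] = []
    for _ in range(n_steps):
        if history and random.random() < 0.5:
            # mutate the parameters of the best-rated discovery so far
            base = max(history, key=lambda h: h[2])[0]
            params = [p + random.gauss(0.0, 0.1) for p in base]
        else:
            params = sample_params()
        outcome = run_system(params)
        history.append((params, outcome, ask_user(outcome)))
    return history
```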
Finally, we released a first open version of our software. We provide documentation and installation tools based on Docker.
As a continuation of the previous projects in Automated Discovery in Self-Organizing Systems, we have been working on expanding the set of possible structures discovered in continuous CAs such as Lenia 105, 104, and in particular we have been interested in searching for emerging agents with sensorimotor capabilities. Understanding what has led to the emergence of life and sensorimotor agency as observed in living organisms is a fundamental question. In our work, we initially only assume environments made of low-level elements of matter (called atoms, molecules or cells) locally interacting via physics-like rules. There is no predefined notion of agent embodiment, and yet we aim to answer the following scientific question: is it possible to find environments in which there exists or emerges a subpart that could be called a sensorimotor agent?
We use the Lenia continuous cellular automaton as our artificial "world" 104. We introduce a novel method based on gradient descent and curriculum learning combined within an intrinsically-motivated goal exploration process (IMGEP) to automatically search for parameters of the CA rule that can self-organize spatially localized 2 and moving patterns 3 within Lenia. The IMGEP defines an outer exploratory loop (generation of training goals/losses) and an inner, goal-conditioned optimization loop. We use a population-based version of IMGEP 17, 114, but introduce two novel elements compared to previous papers in the IMGEP literature. First, whereas previous work in 29 and 16 used a very basic nearest-neighbor goal-achievement strategy, our work relies on gradient descent for the local optimization of the (sensitive) parameters of the complex system, which has proven to be very powerful. To do so, we made a differentiable version of the Lenia framework, which is also a contribution of this work. Secondly, we propose to control subparts of the environmental dynamics with functional constraints (through predefined channels and kernels in Lenia) to build a curriculum of tasks, and to integrate this stochasticity into the inner optimization loop. This has proven central to training the system so that the emerging sensorimotor agents are robust to stochastic perturbations in the environment. In particular, we focus on modeling obstacles in the environment physics and propose to probe the agent's sensorimotor capability through its ability to move forward under a variety of obstacle configurations. We also provide tests and metrics to measure the robustness of the obtained agents.
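The overall structure of this exploration process can be sketched as follows, with a toy differentiable system standing in for the full differentiable Lenia rollout; the goal space, the bootstrap phase, the nearest-outcome seeding and the hyperparameters are illustrative assumptions, not the settings of the actual experiments.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for the differentiable Lenia rollout: maps CA-rule parameters to a
# low-dimensional behavioural descriptor (e.g. final position/extent of a pattern).
# In the actual work this is a full differentiable Lenia simulation.
def simulate(params):
    return jnp.tanh(jnp.array([params @ params, params.sum()]))

def goal_loss(params, goal):
    return jnp.sum((simulate(params) - goal) ** 2)

grad_loss = jax.jit(jax.grad(goal_loss))

def imgep(key, n_outer=50, n_inner=20, lr=0.1, dim=8):
    history = []                                            # (params, outcome) pairs
    for i in range(n_outer):
        key, k_goal, k_init = jax.random.split(key, 3)
        if i < 5 or not history:                            # bootstrap with random parameters
            params = jax.random.normal(k_init, (dim,))
        else:
            # outer loop: sample a goal in the behavioural space
            goal = jax.random.uniform(k_goal, (2,), minval=-1.0, maxval=1.0)
            # start from the parameters whose outcome is closest to the sampled goal
            params = min(history, key=lambda po: float(jnp.sum((po[1] - goal) ** 2)))[0]
            # inner loop: goal-conditioned gradient descent on the differentiable system
            for _ in range(n_inner):
                params = params - lr * grad_loss(params, goal)
        history.append((params, simulate(params)))
    return history

history = imgep(jax.random.PRNGKey(0))
```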
While many complex behaviors have already been observed in Lenia, among which some could qualify as sensorimotor behaviors, they have so far been discovered "by chance", as the result of time-consuming manual search or of simple evolutionary algorithms. Our method provides a more systematic way to automatically learn the CA rules leading to the emergence of basic sensorimotor structures, as shown in Figure 41. Moreover, we investigated and provided ways to measure the (zero-shot) generalization of the discovered sensorimotor agents to several out-of-distribution perturbations that were not encountered during training. Impressively, even though the agents still fail to preserve their integrity in certain configurations, they show very strong robustness to most of the tested variations. The agents are able to navigate in unseen and harder environmental configurations while self-maintaining their individuality (Figure 39). They are able to recover their individuality not only when subjected to external perturbations but also when subjected to internal perturbations: they resist variations of the morphogenetic processes such as less frequent cell updates, quite drastic changes of scale, as well as changes of initialization (Figure 40). Furthermore, when tested with a multi-entity initialization, and despite having been trained alone, not only are the agents able to preserve their individuality, but they also show forms of coordinated interactions (attractiveness and reproduction). Our results suggest that, contrary to the (still predominant) mechanistic view on embodiment, biologically-inspired embodiment could pave the way toward agents with strong coherence and generalization to out-of-distribution changes, mimicking the remarkable robustness of living systems in maintaining specific functions despite environmental and body perturbations 139. Searching for rules at the cell level that give rise to higher-level cognitive processes at the level of the organism and of the group of organisms opens many exciting opportunities for the development of embodied approaches in AI in general.
This work was released in 2022 as a distill-like article, which is currently hosted at this link. The article contains an interactive demo in WebGL and JavaScript, as well as many videos and animations of the results. A Colab notebook with the source code of the work is also publicly available.
In 2023, additional quantitative experiments and ablations were conducted, and this work was submitted to the Proceedings of the National Academy of Sciences (PNAS) journal.
Following our work on searching for sensorimotor capabilities in cellular automata such as Lenia 105, 104, we kept exploring the search for low-level cognition in continuous cellular automata. This led to preliminary work on the emergence of memory in self-organizing agents, as well as work on implementing other environmental constraints in the CA in order to elicit interesting behaviors. To implement these environmental constraints more easily, and to ease the emergence of spatially localized patterns (and thus let the optimization/search focus on cognitive abilities rather than on preventing uncontrolled growth or explosion of the pattern), we worked on adding mass conservation to the Lenia system.
In this work, we propose a mass-conservative extension to Lenia called Flow Lenia 54, i.e. the sum of the CA's activations remains constant over time. We hypothesize that such conservation laws will help in the search for artificial life-forms by constraining emerging patterns to spatially localized ones. It also makes it easier to implement environmental constraints on the self-organizing agents, such as a need for food to grow.
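To make the conservation property concrete, the toy example below moves mass between neighbouring cells of a 1-D grid while keeping the total constant; it only illustrates what mass conservation means here and is not the flow/reintegration scheme actually used in Flow Lenia.

```python
import numpy as np

def conservative_step(mass, flow):
    """Toy 1-D mass-conserving update: each cell sends a fraction of its mass to a
    neighbour according to the sign of a local 'flow' field, so the total mass is
    unchanged. Purely illustrative of the conservation property of Flow Lenia."""
    out = np.zeros_like(mass)
    n = len(mass)
    for i in range(n):
        frac = np.clip(abs(flow[i]), 0.0, 1.0)      # fraction of mass leaving cell i
        target = (i + (1 if flow[i] > 0 else -1)) % n
        out[i] += mass[i] * (1.0 - frac)            # mass that stays in place
        out[target] += mass[i] * frac               # mass that moves to the neighbour
    return out

rng = np.random.default_rng(0)
mass, flow = rng.random(64), rng.normal(size=64)
new_mass = conservative_step(mass, flow)
assert np.isclose(mass.sum(), new_mass.sum())       # total activation is conserved
```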
Furthermore, we show that this new model allows for the integration of the update-rule parameters within the CA dynamics, enabling the emergence of creatures with different parameters, and thus different properties, in the same environment/grid. This leads to multi-species simulations where the grid is filled with agents with different behaviors and properties 42. Such a feature opens up research perspectives towards the achievement of open-ended intrinsic evolution inside continuous CAs, meaning that the evolutionary process would entirely result from the dynamics of the CA (without any external loop/system). We hypothesize that this open-ended intrinsic evolution could, through competition and cooperation, lead to the emergence of interesting low-level cognition in these systems.
A simple evolutionary strategy (with an evolutionary loop outside the system) was also used to optimize for patterns with directional and rotational movement.
Examples of the system and of the obtained patterns, including patterns trained for movement, random parameters, food in Flow Lenia, and multi-species simulations, are available on the companion website, together with a notebook containing the system.
This work led to an oral presentation at WIVACE 2022, the 15th International Workshop on Artificial Life and Evolutionary Computation.
In 2023, we conducted the final quantitative experiments on optimizing the parameters with evolutionary strategies and wrote up the work, together with additional exploratory experiments on large simulations for open-ended evolution.
This work received the best paper award at the ALIFE 2023 conference, where it was presented orally.
In the context of the project Automated Discovery in Self-Organizing Systems, we have an ongoing collaboration with Bert Chan, a previously independent researcher on Artificial Life, author of the Lenia system 105, 104, and now a research engineer at Google Brain. During this collaboration, Bert Chan helped us design versions of IMGEP usable by scientist (non ML-expert) end-users, which is the aim of project 8.6.1. Having himself created the Lenia system, he is highly interested in using our algorithms to automatically explore the space of possible emerging structures, and he provides us with valuable insights into end-user habits and concerns. Bert Chan also co-supervised with Mayalen Etcheverry the master internship of Gautier Hamon, which led to the work described in section 8.6.2. He also co-supervised with Gautier Hamon and Mayalen Etcheverry the master internship of Erwan Plantec, which led to the work described in section 8.6.3.
In the context of project "Automated Discovery in Self-Organizing Systems",
it has been demonstrated that modern tools leveraging computational models of curiosity developed in the Flowers team can be transposed to form efficient AI-driven "discovery assistants." These tools can assist scientists in mapping and navigating the space of possible outcomes in complex systems 29, 16, 128. In 2022, we initiated a collaboration with Dr. Michael Levin, a renowned biologist at Tufts University, through a 5-month academic exchange with Mayalen Etcheverry in his lab in Boston. This collaboration laid the foundation for continued collaboration throughout 2023, resulting in the submission of one paper 73 (currently under review) and another accepted at the NeurIPS 2023 AI for Science workshop 61.
The primary focus of this collaboration was to leverage curiosity-driven exploration algorithms as tools to empower scientific exploration and analysis of basal cognition in biological systems, specifically numerical models of gene regulatory networks (GRNs). Understanding, mapping, predicting, and controlling the complex behavior of these networks is crucial for applications in biomedicine and synthetic bioengineering. However, there are few quantitative tools that facilitate exploration of these networks, especially when their complexity makes unguided exploration infeasible.
To address these challenges in practice, we proposed an experimental framework summarized in Figure 43. In this framework, we formalized and investigated a view of gene regulatory networks as agents navigating a problem space. We developed automated tools to efficiently map the repertoire of robust goal states that GRNs can reach despite perturbations. These tools rely on two main contributions that we made in this work: (1) The use of curiosity-driven exploration algorithms, originating from the AI community, to explore the range of behavioral abilities of a given system, which we adapted and leveraged to automatically discover the range of reachable goal states of GRNs, and (2) The use of a battery of empirical tests inspired by implementation-agnostic behaviorist approaches that we leveraged to assess the navigation competencies of GRNs.
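The sketch below gives a self-contained, simplified picture of such a mapping loop: a toy two-gene ODE stands in for an SBML-derived GRN model, interventions are reduced to initial conditions, and a simple novelty criterion stands in for the curiosity-driven goal sampling. None of this reproduces the actual AutoDiscJax pipeline; it only illustrates how a catalog of reachable steady states can be built.

```python
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

# Toy two-gene regulatory network (mutual inhibition); the real pipeline uses
# SBML-derived models, this ODE is only an illustrative stand-in.
def grn(y, t, k=2.0, n=4.0):
    x1, x2 = y
    dx1 = k / (1.0 + x2 ** n) - x1
    dx2 = k / (1.0 + x1 ** n) - x2
    return jnp.array([dx1, dx2])

def reached_state(y0):
    """Simulate long enough to (approximately) reach a steady state."""
    ts = jnp.linspace(0.0, 50.0, 200)
    return odeint(grn, y0, ts)[-1]

def novelty(state, archive):
    """Distance to the closest steady state already recorded."""
    if not archive:
        return jnp.inf
    return min(float(jnp.linalg.norm(state - a)) for a in archive)

# Exploration of interventions (here: initial conditions), keeping only
# sufficiently novel steady states in a behavioral catalog.
key, archive = jax.random.PRNGKey(0), []
for _ in range(100):
    key, subkey = jax.random.split(key)
    y0 = jax.random.uniform(subkey, (2,), minval=0.0, maxval=3.0)
    state = reached_state(y0)
    if novelty(state, archive) > 0.2:
        archive.append(state)
print(f"discovered {len(archive)} distinct steady states")
```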
Our data revealed that models inferred from real biological data can reach a surprisingly wide spectrum of steady states, showcasing various competencies that living agents often exhibit in physiological network dynamics and that do not require structural changes to network properties or connectivity. Furthermore, we investigated the applicability of the discovered “behavioral catalogs” for comparing the evolved competencies across classes of evolved biological networks, as well as for the design of drug interventions in biomedical contexts or for the design of synthetic gene networks in bioengineering. Altogether, these automated tools and the resulting emphasis on behavior-shaping and exploitation of innate competencies can open the path to better interrogation platforms for exploring the complex behavior of biological networks in an efficient and cost-effective manner.
To encourage broader adoption and development of the tools and algorithms, we have released two software packages: SBMLtoODEJax (https://github.com/flowersteam/sbmltoodejax) 7.1.21 and AutoDiscJax (https://github.com/flowersteam/autodiscjax). SBMLtoODEJax converts Systems Biology Markup Language (SBML) models into Python classes written in JAX, enabling easy simulation and manipulation. AutoDiscJax, built upon JAX and SBMLtoODEJax, facilitates automated discovery and exploration of complex systems, specifically organizing the exploration of computational models of biological GRNs.
Financing of the PhD grant of Laetitia Teodorescu.
Financing of the CIFRE PhD grant of Mayalen Etcheverry by Poietis.
Financing of the CIFRE PhD grant of Maxime Adolphe by Onepoint.
Financing of the CIFRE PhD grant of Rania Abdelghani by EvidenceB.
Financing of a PhD grant of Matisse Poupard with CATIE and EPSYLON Lab (Univ. Montpellier).
Financing of the PhD grant of Clément Romac by Hugging Face.
We developed planning algorithms for an autonomous electric car for Renault SAS, in continuation of the previous ADCC project. We improved our planning algorithm to move toward navigation on open roads, in particular with the ability to reach higher speeds than previously possible, to handle more road-intersection cases (roundabouts), and to deal with multi-lane roads (overtaking, insertion, ...).
We received a 30k€ grant from Google Brain, as well as 30k€ of Google Cloud credits, for developing projects on the automated exploration of continuous cellular automata.
Financing of a one-year postdoctoral position and of the app development by the International Foundation for Applied Research on Disability (FIRAH). The School+ project consists of a set of educational technologies to promote the inclusion of children with Autism Spectrum Disorder (ASD). School+ primarily aims at encouraging the acquisition of socio-adaptive behaviours at school while promoting self-determination (intrinsic motivation), and has been created according to the methods of User-Centered Design (UCD). At the request of the stakeholders of school inclusion (child, parents, teachers, and clinicians), the Flowers team is working on adding an interactive tool for the collaborative and shared monitoring of the school inclusion of each child with ASD. This new app will be assessed in terms of user experience (usability and elicited intrinsic motivation), self-efficacy of each stakeholder, and educational benefit for the child. This project includes the Académie de Bordeaux – Nouvelle Aquitaine, the CRA (health center for ASD in Aquitaine), and the ARI association.
The project "Cohorte LongitudinalE sur la Myopie et le développement oculaire dans l'ENfanCE" (CLEMENCE) is led by C. Delcourt from the Bordeaux Population Health lab (2M€). Hélène Sauzéon and Cécile Mazon participate in the research program with a study of developmental changes in visual attention due to myopia.
Clément Moulin-Frier is continuing an active scientific and teaching collaboration with Ricard Solé and Marti Sanchez-Fibla from the University Pompeu-Fabra (UPF) in Barcelona, Spain. The main highlights from this collaboration for 2023 are:
In April 2023, we created a new three-year Inria associate team, CuriousTECH (see the website: https://flowers.inria.fr/curioustech-associate-team/). It brings together two Inria teams (Flowers and Potioc) and two labs of the University of Waterloo (the HCI Lab of the David R. Cheriton School of Computer Science, and the Cognitive Neuroscience Lab of the Psychology Department). This associate team aims to develop an original, cross-disciplinary approach, joining together two perspectives:
1) The fundamental study of curiosity-driven learning across life-span (in children, young adults and older adults) and
2) The study of how new (re)educational technologies, using both curiosity-related models and artificial intelligence techniques [3, 8, 9], can personalize learning sequences for each individual, maximizing curiosity and learning efficiency in real world contexts.
Our proposed research will produce new understanding of the role of curiosity in education and healthy aging, through the design and field assessment of new interactive educational or health-related technologies. Beyond academic contributions, we expect our findings to inform the broader societal challenges inherent to the School of the 21st Century, ranging from helping children (and their teachers) develop cross-domain learning skills such as curiosity and meta-cognition, to improving inclusivity in schools (learners with disabilities, especially cognitive disabilities) and promoting lifelong learning in older adults (successful aging), building on findings from cognitive research.
Another outcome of our joint program is to use applied research to accelerate the transfer of results to industries and public institutions related to education and healthy aging in both countries. The mixed method approach used in our proposed project (user-centered methods, digital technologies, artificial intelligence, and field assessment) will help demonstrate the effectiveness of our developed technology, and facilitate adoption by industry partners and market stakeholders from various education and health care organizations.
INTERACT project on cordis.europa.eu
- PY Oudeyer continued to work on the research program of this Chaire, funding 2 PhDs and 3 postdocs for five years (until 2025).
- C. Moulin-Frier obtained an ANR JCJC grant. The project is entitled "ECOCURL: Emergent communication through curiosity-driven multi-agent reinforcement learning". The project starts in Feb 2021 for a duration of 48 months. It will fund a PhD student (36 months) and a Research Engineer (18 months) as well as 4 Master internships (one per year).
Pierre-Yves Oudeyer and Clément Moulin-Frier obtained a grant from the call for project AIRSTRIP "L'intelligence Artificielle au service de l'IngénieRie des SysTèmes aéRonautIques et sPatiaux", in collaboration with the IRT Saint Exupery. The project was accepted in 2023 and will fund 18 months of a research engineer position starting in 2024.
- Didier Roy is a collaborator of the Inria Exploratory Action AIDE "Artificial Intelligence Devoted to Education", led by Frédéric Alexandre (Inria Mnemosyne Project-Team), Margarida Romero (LINE Lab) and Thierry Viéville (Inria Mnemosyne Project-Team, LINE Lab). The aim of this Exploratory Action is to explore to what extent approaches or methods from cognitive neuroscience, linked to machine learning and knowledge representation, could help to better formalize human learning as studied in educational sciences. AIDE is a four-year project running from mid-2020 to 2024.
- Hélène Sauzéon is co-PI with P. Dragicevic of the Inria Exploratory Action I'AM "Impact of Augmented Reality on Autobiographical Memory: Examining Involuntary Memories and False Memories" (174.5k€). Started last September, this Exploratory Action aims to explore to what extent augmented-reality devices can produce erroneous autobiographical memories, particularly in vulnerable people (children and older adults, or young adults with low source-monitoring memory abilities).
for the co-direction of the PhD thesis of Jeremy Perez with Clément Moulin-Frier and Pierre-Yves Oudeyer on "Interactions between intrinsically motivated goal-exploration processes and cumulative cultural evolution" (see section 8.2.3).
- Hélène Sauzéon and AS Rigaud will supervise WP5, dedicated to two care-led innovation experiments with assistive technologies (400k€ for Bordeaux).
- Hélène Sauzéon will supervise WP4.3, dedicated to "Explore Digital Therapeutics To Slow Down Cognitive Decline In Covert CSVD" (150k€).
The Adaptiv'Math solution comes from an innovation partnership for the development of a pedagogical assistant based on artificial intelligence. This partnership takes place in the context of a call for projects from the Ministry of Education to develop a pedagogical platform proposing and managing mathematical activities intended for teachers and students of cycle 2. The role of the Flowers team is to work on the AI of the proposed solution to personalize the pedagogical content for each student. This contribution is based on the work done during the Kidlearn project and the thesis of Benjamin Clément 108, in which algorithms were developed to manage and personalize sequences of pedagogical activities. One of the main goals of the team here is to transfer technologies developed in the team to a project with a perspective of industrial scaling.
PY Oudeyer was member of the editorial board of: IEEE Transactions on Cognitive and Developmental Systems and Frontiers in Neurorobotics.
Clément Moulin-Frier gave an invited talk at the seminar Marge, exception, déviation at Université Bordeaux Montaigne on February 24th, 2023. Title of the talk: Écologie de la cognition humaine.
Clément Moulin-Frier gave an invited talk at the "2nd réunion scientifique de la Société Psychédélique Aquitaine" at the Hopital Saint André in Bordeaux on October 17th, 2023. Title of the talk: The Ecology of Open-Ended Skill Acquisition: Eco-evolutionary, developmental and socio-cultural perspectives.
Clément Moulin-Frier gave an invited talk at the symposium Intelligence: natural, artificial and synthetic at the Barcelona Collaboratorium on October 5th, 2023. Title of the talk: Promoting behavioral diversity in artificial agents through eco-evolutionary and socio-cultural dynamics.
Clément Moulin-Frier gave an invited talk at the seminar of the GIPSA-Lab (Grenoble, France) on December 15th, 2023. Title of the talk: Modelling the eco-evolutionary, developmental and socio-cultural origins of open-ended skill acquisition.
Cédric Colas gave an invited talk at the ENS in June 2023 during a Lab meeting of Stefano Palminteri. Title of the talk: Towards Social Autotelic Agents.
Cédric Colas gave an invited talk at ICRA in the Life-Long Learning with Human Help Workshop, in July 2023. Title of the talk: Towards Social Autotelic Agents.
Cédric Colas gave an invited talk at Brown University in the USA, during Lab meeting of George Konidaris, in September 2023. Title of the talk: Towards Social Autotelic Agents.
Thomas Carta gave an invited talk online, in April 2023, during a reading group with Glen Berseth on RL focused on effective generalization and pre-training strategies for control, at MILA and ServiceNow Research. Title of the talk: Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning.
Thomas Carta gave an invited talk online at Naver Labs Europe, in May 2023. Title of the talk: Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning.
Clément Romac gave an invited talk at the COLT team's seminar at Universitat Pompeu Fabra (Barcelona) in May 2023. Title of the talk: Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning.
Hélène Sauzéon gave 4 invited talks:
Cécile Mazon gave an invited talk for the Inria Disability conference cycle (cycle de conférences Handicap), with Cathy Hémon (Autism Resource Center, trainer and specialized teacher), on Autism Spectrum Disorders, co-education, and participative methods to address field issues.
PY Oudeyer gave several invited presentations:
Hélène Sauzéon has been the representative of the Inria centre of the University of Bordeaux for the réseau RT (GDR) CNRS Education since July 2022.
Cédric Colas was a member of the board of the IMOL community https://www.imol-community.org/community/.
- Clément Moulin-Frier is in active discussion with the start-up Pontos in view of a future collaboration in 2024.
- Clément Moulin-Frier was a member of the CRCN/ISFP Jury of Inria Bordeaux on May 9th, 2023.
- Hélène Sauzéon was a member of the selection committee for DR2 09 (Univ. G. Eiffel) in 2023.
- Hélène Sauzéon has been a member of the PhD selection committee of the MSCA COFUND SOUND.AI program (European program at Sorbonne Univ.) since 2023.
- Hélène Sauzéon has been a member of the ANR committee CES 38 (interdisciplinary research section) since November 2023.
- Hélène Sauzéon was a member of the scientific committee of MAVIE-II (Calyxis).
- PY Oudeyer was a member of the jury selecting PhD grants in AI in the context of the SoundAI project at Sorbonne University.
- PY Oudeyer reviewed projects for the European Commission (ERC, Marie Curie grants, EU Pathfinder), for the US/Israel Binational Science Foundation, for ANR, for RIF Cyprus, Leverhulme Trust.
• Hélène Sauzéon and Cécile Mazon have been members of the steering committee of LILLAB (https://www.lillabneurodev.fr/), a living and learning lab funded by the "délégation interministérielle à la stratégie nationale à l'autisme et troubles neurodéveloppementaux" and aiming at the dissemination of knowledge in connection with the 3 centers of excellence for autism and neurodevelopmental syndromes, since 2020.
• Hélène Sauzéon has been a member of the steering committee of IFHR / FEDRAH (https://ifr-handicap.inserm.fr/), a national institute on disability funded by Inserm, aiming at researcher networking and the dissemination of knowledge on multidisciplinary disability research, since 2018.
• Hélène Sauzéon has been the head of the Innovations and Transfer Committee of the BIND Center of Excellence in Bordeaux and a member of the BIND steering committee since 2018.
• Cécile Mazon is co-responsible for the digital tools WP of the PIA AtypieFriendly program (formerly AspieFriendly).
• Pierre-Yves Oudeyer is head of the Flowers project-team, Inria/Univ. Bordeaux/ENSTA ParisTech
Teaching Responsibilities:
Teaching Involvement in Computer/Engineering Science or in Cognitive Science:
• PhD defended: Tristan Karch (defended in 2023), "Language acquisition in curiosity-driven Deep RL", beg. in sept. 2019 (supervisors: PY. Oudeyer and C. Moulin-Frier)
• PhD defended: Mayalen Etcheverry (defended in 2023), "Automated discovery with intrinsically motivated goal exploration processes", beg. in sept. 2020 (supervisors: PY. Oudeyer and C. Moulin-Frier)
• PhD defended: Laetitia Teodorescu (defended in 2023), "Graph Neural Networks in Curiosity-driven Exploring Agents", beg. in sept. 2019 (supervisors: PY. Oudeyer and K. Hoffman)
• PhD in progress: Grgur Kovac, "Developmental training of socio-cognitive abilities in AI systems" (supervisors: PF. Dominey and PY. Oudeyer)
• PhD in progress: Julien Pourcel, "Autotelic LLMs that learn how to code" (supervisors: C. Moulin-Frier and PY. Oudeyer)
• PhD in progress: Thomas Carta, "LLM-based autotelic deep reinforcement learning agents" (supervisors: O. Sigaud, S. Lamprier and PY. Oudeyer)
• PhD in progress: Clément Romac, "Grounding LLMs with online RL" (supervisors: T. Wolf and PY. Oudeyer)
• PhD in progress: Jeremy Perez, supervised by Clément Moulin-Frier, started in October 2023
• PhD in progress: Chloé Desvaux, "Design and experiment new metacognitive trainings for fostering curiosity and creativity among children in a school setting: a lever for intrinsically motivated learning?", beg. in October 2023 (supervised by H. Sauzéon and PY Oudeyer).
• PhD in progress: Léana Petiot, "Study of Augmented Reality on the functioning of implicit autobiographical memory", beg. in October 2023 (supervised by H. Sauzéon and P. Dragicevic from the Potioc team).
• PhD in progress: Maxime Adolphe, "Adaptive personalization in attention training systems", beg. in sept. 2020 (supervisors: H. Sauzéon and PY. Oudeyer)
• PhD in progress: Rania Abdelghani, "Fostering curiosity and meta-cognitive skills in educational technologies", beg. in Dec. 2020 (supervisors: H. Sauzéon and PY. Oudeyer).
• PhD in progress: Isabeau Saint-Supery, "Designing and Assessing a new interactive tool fostering stakeholders' cooperation for school inclusion", supervised by H. Sauzéon and C. Mazon.
• PhD in progress: Matisse Poupard, "Optimize learning in a digital environment according to learners' level of expertise, epistemic curiosity and mode of instruction", beg. in Apr. 2022 (supervised by H. Sauzéon and A. Tricot from Univ. Montpellier).
Gautier Hamon and Clément Moulin-Frier supervised the Master internships of Richard Bornemann and of Corentin Léger (in collaboration with Xavier Hinaut from Mnemosyne) in 2023.
Maxime Adolphe and Hélène Sauzéon supervised the Master internship of Stéphanie Mortemousque in 2023.
Cécile Mazon and Isabeau Saint-Supery supervised the Master internship of Valentin Strahm in 2023.
Rania Abdelghani and Hélène Sauzéon supervised the Master internship of Chloé Desvaux in 2023.
Pierre Dragicevic and Hélène Sauzéon supervised the Master internship of Léana Petiot in 2023.
H. Sauzéon has been a member of the scientific board for the HDR of P. Dragicevic.
Clément Moulin-Frier was a member of the PhD jury of Joachim Winther Pedersen (thesis director: Sebastian Risi, University of Copenhagen).
H. Sauzéon was president of the PhD jury of Axelle Gelineau on "Projet RGS@HOME : Evaluation de l'acceptabilité d'un système de télé-réhabilitation membre supérieur basé sur la réalité virtuelle auprès des patients post-AVC" at the University of Limoges, December 8th, 2023, Limoges.
H. Sauzéon was a reviewer in the PhD jury of Marine Saba on "Efficacité d'un programme d'entraînement des ressources attentionnelles et de la mémoire de travail chez les personnes âgées avec un trouble cognitif léger : Effets sur les fonctions cognitives et une situation écologique évaluée avec la réalité virtuelle" at the University of Paris, December 15th, 2023, Paris.
Clément Moulin-Frier was a member of the jury of the first hackathon devoted to journalistic investigation of algorithms ("Premier hackathon consacré à l'enquête journalistique sur les algorithmes") at IJBA (Bordeaux) on November 30th, 2023.
Clément Moulin-Frier is a member of the "comité de suivi de thèse" of Nathan Trouvain (Mnemosyne, Inria) and Camille Charrier (LPNC, Grenoble).
Hélène Sauzéon is a member of the "comité de suivi de thèse" of Hugo Fournier (Psychology Lab, Bordeaux).
Hélène Sauzéon is academic tutor for 2 PhD students of the Doctoral School SP2.
Cécile Mazon organized and chaired the jury of the defenses of Cognitive Sciences Master students (M1 and M2).
PY Oudeyer was a reviewer for the HdR of Erik Gustaffsson (Univ. Bourgogne Franche-Comté), and an examiner for the PhDs of Enrique Donancio (INSA Rouen Normandie) and Lina Mezghani (Univ. Grenoble).
PY Oudeyer was in the PhD "comité de suivi" of Marc Welter (Univ. Bordeaux), Jean-Baptiste Gaya (Université Paris-Sorbonne), Marie Martin (Univ. Paris Saclay), Elias Najarro (Univ. Copenhagen), and Matthis Poupard (Univ. Bordeaux).
• Hélène Sauzéon was a member of the extended office of the project-team committee of the Inria centre of the University of Bordeaux.
• Hélène Sauzéon has been a student in the Inria MasterClass since Sep. 2022.
• PY Oudeyer contributed several internal notes on AI in society, helping Inria direction answer several requests on this topic from governmental organizations.