Nat Commun. 2021 Oct 6;12(1):5721. doi: 10.1038/s41467-021-25874-z.

Embodied intelligence via learning and evolution


Agrim Gupta et al. Nat Commun. 2021.

Abstract

The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing the relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, because performing large-scale in silico experiments on evolution and learning is challenging. Here, we introduce Deep Evolutionary Reinforcement Learning (DERL): a computational framework that can evolve diverse agent morphologies to learn challenging locomotion and manipulation tasks in complex environments. Leveraging DERL, we demonstrate several relations between environmental complexity, morphological intelligence, and the learnability of control. First, environmental complexity fosters the evolution of morphological intelligence, as quantified by the ability of a morphology to facilitate the learning of novel tasks. Second, we demonstrate a morphological Baldwin effect, i.e., in our simulations evolution rapidly selects morphologies that learn faster, thereby enabling behaviors learned late in the lifetime of early ancestors to be expressed early in the descendants' lifetime. Third, we suggest a mechanistic basis for the above relationships through the evolution of morphologies that are more physically stable and energy efficient, and that can therefore facilitate learning and control.
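The abstract describes DERL as two interacting adaptive processes: an outer evolutionary loop that mutates agent morphologies, and an inner reinforcement learning loop that trains a controller for each morphology. The toy Python sketch below illustrates only that two-loop structure; the list-of-numbers "genome", the mutation operators, and the learn_and_evaluate stand-in are all hypothetical simplifications, not the paper's implementation.

```python
import random

random.seed(0)

def learn_and_evaluate(morphology):
    # Stand-in for the inner reinforcement learning loop: in DERL this
    # trains a neural controller for the morphology and returns the
    # reward at the end of lifetime learning. Here it is a toy function.
    return sum(morphology) - 0.1 * len(morphology)

def mutate(morphology):
    # Stand-in for morphological mutation operators
    # (e.g., adding/deleting limbs, perturbing limb parameters).
    child = list(morphology)
    if random.random() < 0.3 and len(child) > 1:
        child.pop(random.randrange(len(child)))       # delete a "limb"
    elif random.random() < 0.5:
        child.append(random.uniform(0.0, 1.0))        # grow a "limb"
    else:
        i = random.randrange(len(child))
        child[i] += random.gauss(0.0, 0.1)            # perturb a parameter
    return child

# Outer evolutionary loop: tournament-style selection over morphologies.
population = [[random.uniform(0.0, 1.0)] for _ in range(16)]
fitness = [learn_and_evaluate(m) for m in population]
initial_best = max(fitness)

for generation in range(200):
    # Pick the fitter of two random individuals as the parent,
    # mutate it, run its "lifetime learning", and let it replace
    # the current worst individual if it is fitter.
    i, j = random.sample(range(len(population)), 2)
    parent = population[i] if fitness[i] >= fitness[j] else population[j]
    child = mutate(parent)
    child_fitness = learn_and_evaluate(child)
    worst = min(range(len(population)), key=fitness.__getitem__)
    if child_fitness > fitness[worst]:
        population[worst], fitness[worst] = child, child_fitness
```

Because replacement only ever removes the current worst individual, the best fitness in the population is non-decreasing over generations, which is the minimal property any such outer loop must have.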


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. DERL overview.
(a) DERL is a general framework for creating embodied agents via two interacting adaptive processes: an outer loop of evolution optimizes agent morphology via mutation operations, some of which are shown in (b), and an inner reinforcement learning loop optimizes the parameters of a neural controller (c). (d) Example agent morphologies in the UNIMAL design space. (e) Variable terrain consists of three stochastically generated obstacles: hills, steps, and rubble. In manipulation in variable terrain, an agent must start from an initial location (green sphere) and move a box to a goal location (red square).
Fig. 2
Fig. 2. Evolutionary dynamics in multiple environments.
(a) Mean and 95% bootstrapped confidence intervals of the fitness of the entire population across 3 evolutionary runs. (b) Each dot represents a lineage that survived to the end of one of 3 evolutionary runs. Dot size reflects the total number of beneficial mutations (see “Methods”) accrued by the lineage. The founder of a lineage need not have an extremely high initial fitness rank for its lineage to comprise a reasonably high fraction of the final population; it can instead achieve population abundance by accruing many beneficial mutations starting from a lower rank (i.e., large dots that are high and to the left). (c–e) Phylogenetic trees of a single evolutionary run, where each dot represents a single UNIMAL, dot size reflects the number of descendants, and dot opacity reflects fitness, with darker dots indicating higher fitness. These trees demonstrate that multiple lineages with high-fitness descendants can originate from founders with lower fitness (i.e., larger, lighter dots). (f–h) Muller diagrams showing relative population abundance over time (in the same evolutionary run as in (c–e)) for the top 10 lineages with the highest final population abundance. Each color denotes a different lineage and the opacity denotes its fitness. Stars denote successful mutations which changed agent topology (i.e., adding/deleting limbs) and resulted in a sub-lineage with more than 20 descendants. The abundance of the remaining lineages is reflected by white space. (i–k) Time-lapse images of agent policies in each of the three environments, with boundary color corresponding to the lineages above. (b) Shown are the correlation coefficients (r) and P values obtained from a two-tailed Pearson's correlation.
Fig. 3
Fig. 3. Environmental complexity fosters morphological intelligence.
(a) Eight test tasks for evaluating morphological intelligence across 3 domains spanning stability, agility, and manipulation ability. The initial agent location is specified by a green sphere, and the goal location by a red square (see “Methods” for detailed task descriptions). (b–d) We pick the 10 best-performing morphologies across 3 evolutionary runs per environment. Each morphology is then trained from scratch on all 8 test tasks with 5 different random seeds. Bars indicate median reward (n = 50) (b, c) and cost of work (d), with error bars denoting 95% bootstrapped confidence intervals and color denoting the evolutionary environment. (b) Across 7 test tasks, agents evolved in MVT perform better than agents evolved in FT. (c) With reduced learning iterations (5 million in (b) vs 1 million in (c)), MVT/VT agents perform significantly better across all tasks. (d) Agents evolved in MVT are more energy efficient, as measured by lower cost of work, despite no explicit evolutionary selection pressure favoring energy efficiency. Statistical significance was assessed using the two-tailed Mann–Whitney U test; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.
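The caption reports significance from a two-tailed Mann–Whitney U test. As a minimal illustration of the statistic behind that test, the sketch below computes the U statistic in plain Python on made-up reward samples (the arrays are hypothetical, not the paper's data); in practice one would obtain the two-tailed P value with scipy.stats.mannwhitneyu(x, y, alternative='two-sided').

```python
def mann_whitney_u(x, y):
    # Rank-free formulation: U counts, over all (x, y) pairs, how often
    # a value from x exceeds a value from y, with ties counting half.
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical per-seed rewards for agents evolved in two environments.
mvt_rewards = [5.1, 6.0, 7.2, 8.3, 5.9]
ft_rewards = [1.2, 2.8, 3.5, 4.1, 6.1]

u1 = mann_whitney_u(mvt_rewards, ft_rewards)
u2 = mann_whitney_u(ft_rewards, mvt_rewards)
# Sanity check: the two U statistics always sum to n1 * n2.
assert u1 + u2 == len(mvt_rewards) * len(ft_rewards)
```

A large U1 relative to n1 * n2 indicates that the first sample tends to stochastically dominate the second, which is what the asterisks in the figure summarize via the P value.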
Fig. 4
Fig. 4. A morphological Baldwin effect and its relationship to energy efficiency and stability.
(a) Progression of the mean (n = 100) number of iterations needed to achieve the 75th percentile fitness of the initial population, for the lineages of the best 100 agents in the final population across 3 evolutionary runs. (b) Fraction of stable morphologies (see “Methods”) averaged over 3 evolutionary runs per environment. This fraction is higher in VT and MVT than in FT, indicating that these more complex environments yield an added selection pressure for stability. (c) Mean cost of work (see “Methods”) for the same lineages as in (a). (d) Learning curves for different generations of an illustrative agent evolved in FT indicate that later generations not only perform better but also learn faster. Thus, overall, evolution simultaneously discovers morphologies that are more energy efficient (c), more stable (b), and simpler to control, leading to faster learning (a). Error bars (a, c) and the shaded region (b) denote 95% bootstrapped confidence intervals.
Fig. 5
Fig. 5. Relationship between energy efficiency, fitness, and learning speed.
(a) Correlation between fitness (reward at the end of lifetime learning) and cost of work for the top 100 agents across 3 evolutionary runs. (b) Correlation between learning speed (iterations required to achieve the 75th percentile fitness of the initial population, as in Fig. 4a) and cost of work for the same top 100 agents as in (a). Across all generations, morphologies that are more energy efficient perform better (negative correlation) and learn faster (positive correlation). (a, b) Shown are the correlation coefficients (r) and P values obtained from a two-tailed Pearson's correlation.
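The correlations in this figure are two-tailed Pearson correlations. As a small self-contained illustration, the sketch below computes Pearson's r on made-up (cost of work, fitness) pairs; the numbers are hypothetical and chosen only so that higher cost accompanies lower fitness, mirroring the negative correlation reported in (a). In practice the r and P values would come from scipy.stats.pearsonr.

```python
import math

def pearson_r(x, y):
    # Pearson's r: covariance of x and y divided by the product of
    # their standard deviations (constant factors of 1/n cancel).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical (cost of work, fitness) pairs: higher cost, lower fitness.
cost = [1.0, 2.0, 3.0, 4.0, 5.0]
fitness = [9.0, 7.5, 6.8, 5.1, 4.0]

r = pearson_r(cost, fitness)
```

With these monotonically opposed toy values, r lands close to -1, the direction of the fitness-versus-cost relationship described in the caption.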


Cited by

  • Artificial intelligence for geoscience: Progress, challenges, and perspectives.
    Zhao T, Wang S, Ouyang C, et al. Innovation (Camb). 2024;5(5):100691. doi: 10.1016/j.xinn.2024.100691. PMID: 39285902.
  • Exploring Embodied Intelligence in Soft Robotics: A Review.
    Zhao Z, Wu Q, Wang J, Zhang B, Zhong C, Zhilenkov AA. Biomimetics (Basel). 2024;9(4):248. doi: 10.3390/biomimetics9040248. PMID: 38667259.
  • Integration of cognitive tasks into artificial general intelligence test for large models.
    Qu Y, Wei C, Du P, et al. iScience. 2024;27(4):109550. doi: 10.1016/j.isci.2024.109550. PMID: 38595796.
  • Body size as a metric for the affordable world.
    Feng X, Xu S, Li Y, Liu J. Elife. 2024;12:RP90583. doi: 10.7554/eLife.90583. PMID: 38547366.
  • Enhancing robot evolution through Lamarckian principles.
    Luo J, Miras K, Tomczak J, Eiben AE. Sci Rep. 2023;13(1):21109. doi: 10.1038/s41598-023-48338-4. PMID: 38036589.

