Abstract
The basolateral amygdala (BLA) is crucial for the encoding and expression of fear memory, yet it remains unexplored how neural activity in this region is dynamically influenced by distributed circuits across the brain to facilitate expression of fear memory of different ages. Using longitudinal multisite electrophysiological recordings in male mice, we find that the recall of older contextual fear memory is accompanied by weaker, yet more rhythmic, BLA gamma activity which is distally entrained by theta oscillations in both the hippocampal CA1 and the anterior cingulate cortex. Computational modeling with Light Gradient Boosting Machine using extracted oscillatory features from these three regions, as well as with Transformer using raw local field potentials, accurately classified remote from recent memory recall primarily based on BLA gamma and CA1 theta. These results demonstrate in a non-biased manner that multi-regional control of BLA activity serves as reliable neural signatures for memory age-dependent recall mechanisms.
Introduction
The BLA is involved in emotional information processing, particularly fear learning and memory1. It has been widely reported that the BLA is required for both newly acquired (recent) and older, long-lasting (remote) fear memory2,3,4,5. These studies suggest a general participation of the BLA in fear memory expression regardless of memory age; however, it remains unexplored whether the BLA utilizes identical or distinct mechanisms at the level of dynamic neural activity to facilitate recall at these different timepoints.
Importantly, there is accumulating evidence that the consolidation and long-term storage of memory require multiple brain regions, such as the hippocampus and prefrontal cortex6,7,8. Long-range dynamic neural interactions between these regions and the BLA have also been reported, with implicated roles in emotional memory processes9,10. These observations raise the possibility that distant regions differentially entrain the BLA to generate specific neural activity patterns underlying recent and remote fear memory recall, respectively. Further, if these multi-regional patterns of neural activity serve as reliable signatures for distinct forms of recall, they could be computationally analyzed to decode the age of a memory. In particular, recent advances in deep neural network modeling may now allow us to identify the neural activity patterns most characteristic of distinct memory ages in a non-biased manner by using raw neural signals as the input. However, such decoding has yet to be implemented.
Here, we report that recall of recent and remote contextual fear memory exhibits distinct dynamic neural activity patterns within the BLA, as well as differential cross-regional influence over those patterns. Specifically, the intensity of BLA gamma activity and its phase-amplitude coupling to theta oscillations recorded in CA1 and the anterior cingulate cortex (ACC) characterize recent and remote recall, respectively. Further, both feature-based modeling with Light Gradient Boosting Machine (LightGBM11) and feature-independent modeling with a Transformer could classify recent from remote memory with high accuracy, relying primarily on BLA gamma and CA1 theta as revealed by our method of interpreting model structure.
Results
Gamma activity in the BLA is enhanced during recent memory recall
To explore neural activity in the BLA and its entrainment by distal regions associated with different memory ages, we implanted a microdrive containing electrodes targeting the BLA, CA1 and ACC in freely moving mice. After recovery from surgery, mice were placed in a fear conditioning chamber and neural activity was recorded during a baseline session. Shortly after, mice were fear-conditioned in the same chamber with electrical shock, leading to the formation of a long-lasting contextual fear memory. Mice were returned to the same chamber 1 day (recent recall) and 25–27 days (remote recall) after conditioning, and memory was assessed by the level of freezing while neural activity was recorded (Fig. 1a). This longitudinal design was explicitly chosen to ensure equivalent recording positions and quality in these brain regions across time and to allow within-subject comparisons of the neurophysiology data, eliminating possible artefacts arising from individual variability in a cross-subject design12,13. Further, we note that the raw electrophysiological recording data from CA1 and ACC, as well as the behavioral data used in this paper, are identical to those used in our previous publication14, where we reported that the percentage of freezing, as well as locomotion and behavioral patterns during non-freezing periods, was comparable between the recent and remote time points (Fig. 1b and Supplementary Fig. 1).
It has been shown that oscillatory neural activity and synchrony across regions, particularly in the theta and gamma frequency ranges, play a critical role in various aspects of cognitive behavior9,15,16. To probe potential activity differences in the BLA during recent and remote recall, we examined oscillatory power (power spectral density; PSD) of the local field potential (LFP) in the BLA during the two recall sessions (Fig. 1c). The time course of BLA power exhibited intermittent transient peaks in the slow gamma range (30–50 Hz), particularly during non-freezing periods shortly before the onset of freezing (Fig. 1d). Consistent with this, BLA power averaged across animals showed clear peaks in this frequency range during non-freezing periods of all sessions (Fig. 1e). Additionally, similar peaks were also observed during freezing periods, albeit in a slightly higher frequency range (Fig. 1e). This slow gamma power was significantly higher during the recent session compared to the baseline and remote recall sessions during non-freezing periods (Fig. 1f), suggesting that heightened slow gamma activity is associated specifically with post-learning processes and selectively at this early time point. The increase of slow gamma power in the recent session remained significant even after excluding the influence of locomotion (Supplementary Fig. 2a), indicating that it was not caused by a difference in locomotor activity. Likewise, slow gamma power was higher during the recent compared to the remote session during freezing periods (Fig. 1f), implying its involvement in fear expression itself. In contrast, activity at the theta frequency (6–12 Hz) during both non-freezing and freezing periods remained unchanged across sessions (Fig. 1g), demonstrating the specificity of changes in the slow gamma range. In addition, activity in the 2–6 Hz range during freezing periods, which was previously associated with fear memory17 and replicated in our data as sharp peaks in the PSD curves during those periods (Fig. 1e), remained unchanged across sessions (Supplementary Fig. 3a). These results suggest that the enhanced slow gamma activity in the BLA is associated with recall mechanisms specific to the recent timepoint, possibly reflecting activity changes of neuronal populations in the BLA during recent recall18.
Entrainment of BLA gamma by theta activity of multiple regions is augmented during remote recall
The amplitude of gamma activity is known to be modulated by the phase of the theta oscillation in various regions15, and this theta-gamma phase-amplitude coupling (PAC) has also been observed in the BLA in association with fear memory19. To examine if the increase of gamma activity during recent recall is associated with altered theta-gamma coupling, we calculated the PAC of LFP in the BLA during recent and remote memory recall (Fig. 2a). PAC strength, quantified by the modulation index20, was largest between theta phase and slow gamma amplitude during non-freezing periods (Fig. 2b), suggesting that theta to slow gamma coupling is predominant during these epochs. In contrast, during freezing the peak of phase-amplitude coupling was shifted to the 2–6 Hz range (Fig. 2b), suggesting that distinct modes of cross-frequency coupling control BLA slow gamma during non-freezing and freezing periods.
We then examined the changes in theta to slow gamma modulation across sessions. Similar to the power of slow gamma activity, theta to slow gamma PAC increased from baseline to the recent recall session during non-freezing periods (Fig. 2c). Surprisingly, however, theta to slow gamma PAC demonstrated an even larger significant increase during remote recall (Fig. 2c), despite an overall decrease in slow gamma power (Fig. 1f). Again, this increased PAC remained significant even after excluding the influence of locomotion (Supplementary Fig. 2b). These results suggest that the timing in relation to theta, but not the intensity, of gamma activity may best reflect remote recall processes. Notably, theta to slow gamma PAC, as well as 2–6 Hz to slow gamma PAC, did not increase from recent to remote sessions during freezing periods (Fig. 2c and Supplementary Fig. 3b). Overall, these observations demonstrate that theta-entrained phase control of BLA gamma during exposure to recall cues during non-freezing periods, but not during freezing itself, may differentiate remote from recent recall, consistent with the augmented theta synchrony between CA1 and ACC during remote recall specifically observed during non-freezing periods14.
To explore how theta activity in CA1 and ACC is linked to BLA gamma, we examined cross-frequency coupling between CA1 phase and BLA amplitude, as well as between ACC phase and BLA amplitude (Fig. 2d, g). Similar to the within-BLA modulation, PAC of both regional pairs peaked at theta phase and slow gamma amplitude during non-freezing periods and at 2–6 Hz phase and slow gamma amplitude during freezing periods (Fig. 2e, h), implying the same mode of modulation within and across regions. Likewise, theta to slow gamma PAC was higher during the remote compared to the recent recall session for both regional pairs during non-freezing periods (Fig. 2f, i), which was again not affected by locomotion (Supplementary Fig. 2b). In contrast, PAC from either theta or 2–6 Hz to slow gamma was not increased in freezing periods of the remote recall session (Fig. 2f, i and Supplementary Fig. 3b). Taken together, these data suggest that remote recall may be supported by multi-regional phase control of BLA gamma activity, while recent recall is better reflected in the intensity of gamma independent of distal control, implying distinct recall mechanisms mediated by BLA gamma activity.
As enhanced theta synchronization between CA1 and ACC during remote memory recall has been reported, observed as an increase in both amplitude correlation and coherence14,21, it is possible that theta interaction across the three regions increases to co-entrain BLA gamma. To test this hypothesis, we first calculated the correlation of the instantaneous amplitude of the LFP in the theta range between CA1 and ACC22, and then examined how this value correlates with BLA theta amplitude (Supplementary Fig. 4a). This tri-regional correlation index was significantly higher during the remote recall session compared to the baseline and recent recall sessions during non-freezing periods (Supplementary Fig. 4b, c), but not during freezing periods (Supplementary Fig. 4d, e). These results suggest that remote recall could involve an augmented synchronization of theta oscillations across CA1, ACC and BLA, potentially controlling BLA gamma activity in relation to theta phase in a concerted manner.
BLA gamma intensity and its multi-regional theta-phase control differentially predict the onset of freezing for recent and remote recall
To understand how the timing of the changes in BLA gamma activity and its cross-frequency modulation relates to when memory recall occurs, we analyzed these LFP indices specifically around the onset of freezing bouts. Slow gamma power in the BLA exhibited a local peak within a 2 s window prior to freezing onset only during the recent session, which was significantly higher than the average of slow gamma power over all non-freezing epochs (Fig. 3a). On the other hand, while theta to slow gamma PAC in the BLA exhibited local peaks significantly higher than non-freezing averages within a 2 s window prior to freezing onset for both recent and remote sessions (Fig. 3b), PAC of BLA slow gamma by theta in CA1 and ACC demonstrated significant local peaks within the same window only for the remote session (Fig. 3c, d). These results suggest that BLA gamma and its cross-regional theta-phase control predict the onset of freezing at a short timescale during recent and remote recall, respectively, supporting the idea that these LFP phenomena are associated with recall processes themselves at each memory age.
Both feature-based and feature-free computational models accurately predict memory age based on multi-regional LFP
The distinct oscillatory activity in the BLA, CA1 and ACC associated with recent and remote recall suggests that computational models may be used to distinguish the age of the memory being recalled, even across individuals. To explore this possibility we first constructed classifiers using feature-based LightGBM to test classification effectiveness based on oscillatory activity in different frequency bands. In this model, we used the theta and slow gamma power that was associated with recent and remote recall in this study, as well as beta (20–30 Hz) and fast gamma (60–90 Hz) power implicated in various behaviors and information processing in previous studies19,23,24,25, in LFP recorded in the BLA, CA1 and ACC to construct feature vectors (Fig. 4a). We also added PAC of BLA slow gamma amplitude by theta phase of BLA, CA1 and ACC, which was distinct between recent and remote recall, to the feature vectors. We employed between-subject cross-validation to train, validate, and test this feature-based memory age classification (see Methods for details). The model achieved an average F1 score of 0.720 across the 12 mice in the test set when LFP during non-freezing periods was used, significantly higher than the label-shuffled control analysis (Fig. 4b), demonstrating the capability of LightGBM to accurately classify memory age. The performance of the model with locomotion included as an additional feature was comparable to that of the original model (Supplementary Fig. 5a, c), indicating that locomotion does not dissociate the two memory ages. The model was also able to perform classification when LFP during freezing periods was used, with an F1 score of 0.664 (Fig. 4b), which was slightly lower than that obtained with non-freezing periods but still significantly better than the shuffled control.
The classification performance of feature-based models could be biased by the arbitrary selection of input features by researchers, which might cause technical issues such as overfitting and mask the true predictive power of features that were not selected. To resolve this, we next employed a feature-free approach, using a large-scale Transformer to build a model that can classify memory age from raw LFP data from the three regions. In our Transformer-based approach (Fig. 4c), we concentrated on the role of the self-attention mechanism in processing multi-channel LFP data. The data were segmented and encoded with position embeddings, with a classification vector (CLS) introduced to aggregate information across the sequence. The core lies in the self-attention mechanism, in which the CLS token dynamically focused on and integrated relevant segments through calculated attention weights, represented as \([a_{1,2}\cdots a_{1,17}]\) (shown in red in Fig. 4c) and then projected back onto the input raw LFP for the attention-guided feature analysis described later. Self-attention enabled our model to extract essential global features for accurately predicting memory age by emphasizing the segments most pertinent to the classification task. As with the LightGBM approach, we employed a between-subject cross-validation strategy to train, validate, and test the model.
The Transformer model of the largest size (\(d_t=5\)) achieved an average F1 score of 0.780 across the 12 mice in the test set when non-freezing segments were used, which was significantly higher than the label-shuffled control (Fig. 4d, e). In contrast, the average F1 score dropped to 0.645 when freezing segments were used, which was higher than the label-shuffled control, but lower than the score obtained from non-freezing segments (Fig. 4d, e). To assess the impact of Transformer size on classification, we compared the performance of Transformers of three different sizes: normal (\(d_t=3\)), large (\(d_t=4\)), and xlarge (\(d_t=5\)). Performance using non-freezing segments was comparable across the different Transformer sizes (Fig. 4d, f), implying saturated performance even at the normal size. In contrast, performance using freezing segments improved when larger Transformer sizes were used, showing the benefit of progressively extracting higher-dimensional features; however, it was still lower than the performance with non-freezing segments at all Transformer sizes (Fig. 4d, f). These results demonstrate in a non-biased manner that LFP during non-freezing periods contains more memory age-related information than that during freezing periods. Taken together, both LightGBM and Transformer models were capable of predicting memory age with high accuracy, revealing the robustness of distinct multi-regional signatures associated with memory age.
Computational models predict memory age based on multi-regional oscillatory activity
As oscillatory activity in the slow gamma and theta ranges, as well as theta-gamma PAC, were differentially regulated at recent and remote timepoints in our study, we asked whether LightGBM and/or Transformer had built models that extract information related to these LFP activity patterns. To first decipher the decision-making features of LightGBM, we analyzed the split feature importance, which determines the importance of each feature based on the number of times it is used to split the input feature vectors during the construction of the tree-based classifier. One common issue when analyzing split feature importance is the noise baseline (null importance), which is a feature importance value obtained even after class label-shuffling of the training data. To circumvent this issue, we performed null importance analysis26, in which we considered a feature to make a significant contribution only if its importance value exceeded the 99.7% confidence interval of the null importance distribution, obtained by separately training the model with labels shuffled 1000 times. Among the features which made significant contributions for the model using non-freezing epochs, we found that CA1 theta power and BLA slow gamma power together accounted for 70% of the importance (Fig. 5a). In addition, PAC of BLA slow gamma by both CA1 theta and BLA theta significantly contributed to the model (Fig. 5a), largely consistent with the differential roles of these oscillatory activities and couplings for recent and remote recall (Figs. 1f, 2c, f and Supplementary Fig. 5d). Significantly contributing features remained largely unchanged when locomotion was included as an additional feature, with locomotion itself not contributing significantly (Supplementary Fig. 5b), indicating that locomotion would not be informative for dissociating recent from remote recall.
To probe what kind of information the Transformer model was extracting from raw LFP, we adopted a strategy that combines attention-guided input selection with theory-driven feature extraction, which consisted of the following three steps. First, we utilized the attention weights in the first layer of the trained model (illustrated as [a1,2 … a1,17] to highlight LFP patches) to identify potentially high-contributing segments of the data (Fig. 5b). Second, we performed theory-driven feature extraction on the Transformer-focused patches in accurately or inaccurately predicted samples (Fig. 5c). Finally, by comparing recent-vs-remote differences of each feature between accurate and inaccurate samples (Fig. 5d), we calculated the Attention-guided Feature Importance (AFI) index, which assessed the potential contribution of each feature to the model's decision-making (see Methods for details). This approach revealed multiple putative oscillatory features that made significant contributions for the model using non-freezing periods, including CA1 theta and BLA slow gamma (Fig. 5e), again consistent with the feature importance of the LightGBM models.
To further validate the significance of each putative important feature for the Transformer model, we performed perturbation analysis, in which degradation of classification performance was assessed when each oscillatory feature in the raw data was filtered out individually (Fig. 6a). These analyses demonstrated a significant drop of the F1 score when CA1 theta or BLA slow gamma was perturbed (Fig. 6b), confirming that the Transformer model relied on these oscillatory activities for accurate classification. Taken together, BLA slow gamma activity and its presumable entrainment by CA1 theta serve as reliable physiological signatures dissociating remote from recent memory recall, which were consistently utilized by both feature-based LightGBM and feature-free Transformer modeling.
Discussion
Despite the well-recognized role of the BLA in fear memory, its precise neural activity dynamics associated with memory age have been unexplored. Here we find that recall of recent fear memory is reflected in augmented gamma activity in the BLA, while remote recall is associated with PAC between BLA gamma amplitude and theta phase in multiple distal regions, including CA1 and ACC (Fig. 6c). Both LightGBM and Transformer modeling indicated BLA gamma and CA1 theta as most informative features to differentiate remote from recent memory, suggesting that these multi-regional oscillatory activities are key signatures for fear memory age extracted in a non-biased manner.
Previous studies have reported that the BLA is required for both recent and remote fear memory2,3, suggesting its memory age-independent role. Here, we have examined the detailed activity dynamics of BLA associated with different memory ages and found that intensity and phase control of BLA gamma are differentially recruited by recent and remote recall, suggesting distinct fine-scale activity patterns in the BLA that characterize different memory ages. Further investigation is needed to understand how higher-intensity and more phase-controlled gamma equally lead to the expression of fear memory. It has been proposed that temporally coordinated neural activity could activate target neurons more effectively by being aligned to their excitability and coincidence detection windows27,28. It is possible that phase-controlled gamma in the BLA could provide efficient signaling to its downstream regions required for fear expression, such as central amygdala29, potentially attaining comparable outcomes to that of high intensity gamma.
We have further shown that there is significant modulation of BLA gamma during remote recall by theta activity in both CA1 and ACC, as well as in the BLA itself. While it was traditionally thought that the hippocampus is not involved in remote memory30,31, there is now accumulating evidence that it is required for both recent and remote memory particularly for retaining episodic details6,8,32,33,34, supporting the active role of CA1 for remote recall in our study. It has also been shown that theta synchrony between higher-order auditory cortex and BLA is enhanced during recall of remote auditory fear memory35, suggesting that cross-regional theta entrainment of BLA activity might underlie recall of various forms of remote memory. In addition, an enhanced correlation between CA1-ACC theta synchrony and BLA theta was observed during remote recall in our study, suggesting that the three regions may work together via theta-phase-entrainment of BLA gamma at remote time points leading to contextual fear expression and freezing.
Importantly, the increase of BLA gamma at recent recall and the enhancement of its theta phase control at remote recall was observed particularly before the onset of freezing. These dynamics suggest that intensity and phase control of BLA gamma are involved in the processes aiding recall and/or freezing at each memory age, consistent with the reported role of BLA gamma in threat assessment and response selection36. Interestingly, these LFP indices as well as theta power in the BLA, CA1 and ACC were not significantly correlated with the total duration of freezing at the level of individual mice (Supplementary Fig. 6). We speculate that BLA gamma activity and its phase control support the initiation of fear memory recall or freezing, but the subsequent maintenance of sustained freezing might be regulated by separate neural mechanisms involving downstream regions, such as central amygdala and brainstem nuclei29.
Here we explicitly designed a within-subject longitudinal study, recording LFP during recent and remote recall in the same cohort of mice. This approach has been repeatedly employed in the field14,21,37,38, as the LFP properties we characterized, such as theta and gamma power as well as theta-gamma PAC, can exhibit high individual variability (Fig. 1f, g and Fig. 2c, f, i) due to differences such as precise electrode location and quality12,13. As a result, comparing the physiology of recent and remote recall in different cohorts (between-subjects) may result in drastically reduced statistical power. On the other hand, in our within-subject approach we cannot rule out the possibility that the changes of LFP patterns observed at remote recall might be due to repeated contextual exposure, independent of memory age. However, previous similar recordings both in the hippocampus and ACC21, as well as in the amygdala39,40, suggest this is unlikely, as cross-regional interactions as well as single-band power remain unchanged over multiple test sessions spaced closer together in time. Further, most of the LFP patterns examined in our study demonstrate differential changes from baseline to recent recall sessions and from recent recall to remote recall sessions (Fig. 1f, Fig. 2f, i and Supplementary Fig. 2), and such non-monotonic changes have not been observed for theta power, gamma power or its cross-regional synchrony in and across the hippocampus, medial prefrontal cortex and amygdala with mere repetitive exposure to a memory-related context or cue, particularly while the strength of memory and the animals' behavior remains constant21,39,40,41. Nevertheless, while we believe the changes in BLA gamma, as well as its PAC by multiple regions, from recent to remote recall sessions in our study most likely reflect the difference of memory age, we acknowledge that the effect of repeated contextual exposure on LFP of various brain regions still needs to be rigorously examined in future studies.
Finally, we asked if these oscillatory activities in multiple regions could be used to classify memory age by LightGBM and Transformer modeling. Both models demonstrated remarkably high accuracy in predicting memory age, suggesting their broad applicability in classification based on neural/electrophysiological data. Notably, the Transformer model could classify memory age at higher accuracy when data from non-freezing periods were used as the input compared to when data from freezing periods were used, demonstrating that non-freezing periods might contain more information associated with different memory ages. This is consistent with our biological data that the majority of activity differences between recent and remote recall within BLA and across regions was observed only during non-freezing periods, particularly at the time shortly before the onset of freezing. Together, these observations suggest that non-freezing periods play an active role in recall processes presumably through continuous exposure to recall cues in the fear-conditioned context, which does not occur during stationary, freezing periods.
While a universal theory or benchmarks for Transformer interpretation are still in development, we have also made an exploratory attempt to analyze the features that could be extracted by the models using attention-based interpretation plus perturbation analysis (Figs. 5b–e, 6a, b). This approach identified oscillatory activity in multiple regions as important features, consistent with the results of the feature importance analysis of LightGBM and highlighting the potential of attention-based methods in enhancing model interpretability. It is noteworthy that in the perturbation analysis, the significance of CA1 theta far exceeded that of BLA slow gamma. This could be because the perturbation of CA1 theta, which was a major component of signal power, caused distortion of the signal, rendering any classification task challenging even when the signal contained useful features, including BLA slow gamma. Additionally, the high degree of correlation between CA1 theta and ACC theta activities14 could have rendered the utilization of just one of these signals sufficient for classification, which resulted in the lower significance of ACC theta in the analysis. Although perturbation analysis involves certain challenges, it importantly enriched our interpretation of the model's decision-making by providing a second confirmation of the decision-making features discovered by the attention-based interpretation.
Overall, this study provides a more comprehensive understanding of reliable neural activity patterns in the BLA and its associated distal regions which characterize the age of fear memory. Multi-regional oscillatory dynamics have been identified as the primary features differentiating recent from remote memory, particularly by a non-feature-biased Transformer model, revealing robust and stereotypical codes of memory age dispersed across brain regions.
Methods
Experimental animals
Adult male wild-type C57BL/6J mice (Japan SLC, Inc.; C57BL/6JJmsSlc substrain), bred in-house, were used for all experiments. Mice were 3–7 months old over the course of the experiments. Mice were maintained on a 12 h light/dark cycle (light cycle: 8 AM–8 PM) at 22–24 °C and 30–50% humidity, and provided with food and water ad libitum. All experiments were performed during the light phase. All procedures were approved by the RIKEN Institutional Animal Care and Use Committee and complied with all relevant ethical regulations.
Contextual fear conditioning apparatus
Baseline, training and recall sessions of contextual fear conditioning were all conducted in a brightly lit behavioral training room. The chamber, consisting of an acrylic plastic front and back with aluminum walls on each side, measured 30 × 25 × 21 cm (Med Associates ENV-008, Georgia, VT). The floor of the chamber consisted of 36 stainless steel rods of 3.2 mm diameter spaced 7.9 mm apart and was connected via a cable harness to a shock generator. The chamber was cleaned between animals with 70% ethanol, and a solution of 1% acetic acid was placed beneath the chamber during the experiment to provide an olfactory cue. Freezing behavior was video recorded by the camera located behind the chamber at a frame rate of 3.75 Hz. Shock deliveries and video recordings were controlled by FreezeFrame 2 software (Actimetrics, Wilmette, IL).
Microdrive construction
Custom microdrives were manufactured with the assistance of the Advanced Manufacturing Team at RIKEN. The microdrive was constructed on a 3D-printed plastic base which held an interface board (EIB-36; Neuralynx, Bozeman, MT) and nichrome (14 μm diameter) tetrodes that were independently adjustable along the z-axis. These tetrodes were positioned to target the right BLA (center coordinates: AP −2.0, ML +3.3, DV −4.3), the right dorsal CA1 (AP −2.0, ML +1.5, DV −1.3), the right ACC (AP +0.8, ML +0.3, DV −1.6) and the right ventral hippocampal commissure (vHC; AP −0.8, ML +0.7, DV −2.4), where the reference signal was recorded. All tetrodes were gold-plated to an impedance of 200–300 kΩ prior to surgery.
Surgery
Mice were anesthetized using Avertin (2,2,2-tribromoethanol; Sigma-Aldrich, 476 mg kg−1, i.p.) and placed into a stereotactic frame (Kopf). The skull was leveled using the bregma and lambda landmarks. Small craniotomies were made individually over the right BLA, CA1, ACC and vHC. Tetrodes were then inserted into their respective target regions simultaneously, ensuring that all tetrodes were targeted to the stereotaxic coordinates described above. In addition, a stainless steel screw was implanted above the left cerebellum to serve as a ground. The microdrive was fixed to the skull with dental cement (Super-Bond C&B, Sun Medical). Following recovery from anesthesia, mice were returned to their home cage where they were housed individually.
After surgery, tetrodes in CA1 were slightly adjusted into the pyramidal cell layer, which was identified by numerous large-amplitude pyramidal cell spikes and sharp-wave ripples during periods of immobility. Tetrodes in BLA and ACC were adjusted only minimally (~±100 µm). Tetrodes in the vHC were left unadjusted.
Experimental procedure
At 2–3 weeks after surgery, the contextual fear conditioning experiment was conducted. On Day 1, mice were placed in the fear conditioning chamber and their LFP was recorded for 10 min (baseline session). After 30 min of rest in the housing room, they were brought back to the fear conditioning chamber and fear-conditioned with a 2 s-long, 0.5 mA foot shock delivered three times at 1 min intervals.
On Day 2, mice were brought back to the fear conditioning chamber at the same time of the day as Day 1, and their recall of contextual fear memory was assessed for 10 min while their LFP was recorded (recent session). On Day 26–28, the protocol used on Day 2 was repeated and fear recall was assessed again with LFP recordings (remote session).
Histology
At the conclusion of the experiment mice underwent terminal anesthesia and tetrode positions were marked by electrolytic lesioning of brain tissue (30 μA current for 7 s through each electrode individually). Brains were fixed by cardiac perfusion of 4% paraformaldehyde (PFA) followed by post-fixation in PFA for 24 h, and coronal slices of 50 μm thickness were prepared on a cryostat. Tetrode placement was confirmed by standard light microscopy.
Data acquisition
Recordings were obtained via a unity-gain headstage preamplifier (HS-36; Neuralynx) connected to a flexible tether cable (TETH-HS-36-Flex-EXT; Neuralynx) suspended from the ceiling with a rubber band, which allowed mice to freely move their head up and down with minimal force. Data were acquired using a 32-channel Digital Lynx 4SX acquisition system with Cheetah software (v 5.7.4; Neuralynx). LFP signals were sampled at 32,000 Hz and filtered between 1–9000 Hz. After recordings, LFP was down-sampled to 1600 Hz for subsequent analyses.
Behavioral analysis
Non-freezing and freezing periods were determined by manually examining all video frames over the entire recording. Periods of each state lasting <1.28 s were excluded from analysis. More detailed behavioral categorization and calculation of animal velocity were performed by tracking each body point of the mice using the DeepLabCut algorithm42, as described previously14.
Power (PSD) analysis
A tetrode located closest to the center of each of BLA, CA1 and ACC was selected for each animal based on histology after the experiment, and the same tetrodes were used across all sessions. Timestamps of non-freezing and freezing periods were aligned with those of LFP data by referencing them to the TTL pulse sent from FreezeFrame to Cheetah software (Neuralynx) at the beginning of the experiment. Power spectral density (PSD) from non-freezing and freezing periods was calculated with the pmtm function in MATLAB (Mathworks) using 1.28 s epochs. Mean PSD across frequency ranges of 2–6 Hz, 6–12 Hz (theta), 30–50 Hz (slow gamma) and 60–90 Hz (fast gamma) was then calculated. For the time course of PSD (Fig. 1d), the pwelch function in MATLAB was alternatively used with 0.64 s epochs that overlapped by 0.32 s.
To analyze the dynamics of PSD around the onset of freezing (Fig. 3), episodes that included at least 3.2 s of non-freezing followed by at least 1.28 s of freezing were extracted. Within these 4.48 s segments, PSD was calculated every 0.32 s and averaged over all segments from each animal for each time point.
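For illustration, a minimal Python sketch of this band-averaged PSD computation is shown below. The original analysis used MATLAB's pmtm/pwelch functions; here Welch's method from SciPy is used instead, and the variable names and the `epochs` input are assumptions for the example.

```python
# Illustrative band-averaged PSD per 1.28 s epoch (the study used MATLAB's
# pmtm/pwelch; Welch's method from SciPy is used here instead). `lfp` is a 1-D
# array sampled at 1600 Hz and `epochs` is a list of (start, stop) sample indices.
import numpy as np
from scipy.signal import welch

FS = 1600  # Hz, LFP sampling rate after down-sampling
BANDS = {"2-6 Hz": (2, 6), "theta": (6, 12),
         "slow_gamma": (30, 50), "fast_gamma": (60, 90)}

def band_power(lfp, epochs, fs=FS, bands=BANDS):
    """Mean PSD in each band, averaged over the supplied 1.28 s epochs."""
    out = {name: [] for name in bands}
    for start, stop in epochs:
        seg = lfp[start:stop]                          # 2048 samples per epoch
        f, pxx = welch(seg, fs=fs, nperseg=len(seg))
        for name, (lo, hi) in bands.items():
            mask = (f >= lo) & (f <= hi)
            out[name].append(pxx[mask].mean())         # mean PSD within the band
    return {name: float(np.mean(vals)) for name, vals in out.items()}
```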
Cross-frequency coupling
Cross-frequency phase-amplitude coupling (PAC) of LFP oscillations within and between regions was analyzed as previously described20. First, raw LFP from one region was filtered at either the 2–6 Hz or theta (6–12 Hz) frequency, and the phase of the filtered LFP was computed using the Hilbert transform. Raw LFP from the same or a different region was also filtered at the slow gamma (30–50 Hz) frequency, and the amplitude envelope was extracted with the Hilbert transform. To examine the coupling strength of slow gamma amplitude to 2–6 Hz or theta phase, the phase values were binned into 36 bins (0–360°, 10° intervals) and the mean amplitude of slow gamma in each phase bin during the state being analyzed was calculated. These values were then used to calculate the modulation index20, which indicates the strength of PAC of slow gamma amplitude by 2–6 Hz or theta phase. The dynamics of PAC around the onset of freezing (Fig. 3) were analyzed in the same way as those of PSD, as described above.
To generate a comodulogram between low-frequency phase and high-frequency amplitude, the phase of the low-frequency oscillation was extracted between 4 and 18 Hz with bandpass filters of 4 Hz width (stepping by 0.5 Hz) and the amplitude of the high-frequency oscillation was calculated between 25 and 135 Hz with bandpass filters of 10 Hz width (stepping by 2 Hz). The modulation index for each phase-amplitude pair was plotted as a heat map.
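As an illustration of this procedure, the sketch below computes the phase-binned modulation index in Python under stated assumptions (zero-phase Butterworth filtering, Hilbert transform, 36 phase bins). It is not the authors' MATLAB code, and `phase_sig`/`amp_sig` are placeholder raw LFP traces restricted to the behavioral state of interest.

```python
# Illustrative modulation-index computation (phase-amplitude coupling strength).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs=1600, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)                           # zero-phase filtering

def modulation_index(phase_sig, amp_sig, phase_band=(6, 12),
                     amp_band=(30, 50), n_bins=36, fs=1600):
    phase = np.angle(hilbert(bandpass(phase_sig, *phase_band, fs=fs)))  # e.g. theta phase
    amp = np.abs(hilbert(bandpass(amp_sig, *amp_band, fs=fs)))          # slow-gamma envelope
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    mean_amp = np.array([amp[(phase >= lo) & (phase < hi)].mean()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    p = mean_amp / mean_amp.sum()                      # amplitude distribution over phase bins
    # Modulation index: normalized Kullback-Leibler distance from the uniform distribution
    return float(np.sum(p * np.log(p * n_bins)) / np.log(n_bins))
```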
Correlation of LFP instantaneous amplitude
The relationship of the instantaneous amplitude of LFP oscillations in the frequency range of interest across the three regions was analyzed as follows. First, raw LFP from the three regions was filtered at the target frequency range using a zero-phase-delay filter, and the amplitude envelope was extracted with the Hilbert transform. The correlation coefficients between the envelopes of two of the three regions over 4 s epochs within the state being analyzed (non-freezing / freezing) were then calculated with 2 s overlaps, as previously described22. Then, the correlation of these values with the mean envelope of the third region within the same epochs was calculated. Mice that contained <10 epochs were excluded from analysis for each state (this occurred only for freezing periods).
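A minimal Python sketch of this tri-regional index, under the assumption of theta-band (6–12 Hz) envelopes and the 4 s / 2 s windowing described above, might look as follows (function and variable names are illustrative).

```python
# Illustrative tri-regional index: epoch-wise CA1-ACC envelope correlation
# correlated with BLA envelope amplitude. Inputs are raw LFP traces at 1600 Hz.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_envelope(x, fs=1600, band=(6, 12), order=4):
    b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    return np.abs(hilbert(filtfilt(b, a, x)))          # zero-phase theta envelope

def tri_regional_corr(ca1, acc, bla, fs=1600, win_s=4.0, step_s=2.0):
    env = [theta_envelope(x, fs) for x in (ca1, acc, bla)]
    win, step = int(win_s * fs), int(step_s * fs)
    ca1_acc_r, bla_mean = [], []
    for start in range(0, len(env[0]) - win + 1, step):
        sl = slice(start, start + win)
        ca1_acc_r.append(np.corrcoef(env[0][sl], env[1][sl])[0, 1])  # CA1-ACC coupling
        bla_mean.append(env[2][sl].mean())                           # BLA theta amplitude
    # Correlate the epoch-wise CA1-ACC coupling with BLA theta amplitude
    return float(np.corrcoef(ca1_acc_r, bla_mean)[0, 1])
```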
Computational modeling by LightGBM
Input format
For both non-freezing and freezing periods, segments >5.12 s were selected for analysis. Within each of these segments, a sliding window of 2.56 s with a step of 0.5 s was used to obtain a series of 2.56 s LFP segments. From each of these 2.56 s segments (segment \(i\)), different combinations of features were extracted, as shown in each figure, to form the feature vector \({x}_{i}\): these features could include the PSD of theta (6–12 Hz), beta (20–30 Hz), slow gamma (30–50 Hz) and fast gamma (60–90 Hz) from BLA, CA1 and ACC, phase-amplitude coupling (PAC) of slow gamma amplitude in BLA by theta phase in BLA, CA1 and ACC, and the mean velocity of the mouse. For the classes of recent and remote sessions, the feature vectors from 12 mice totaled approximately 10,000. These feature vectors were used for training, validating, and testing the LightGBM model based on between-subject cross-validation.
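The sliding-window segmentation can be sketched as below (a simplified Python illustration; `extract_features` is a hypothetical stand-in for the PSD/PAC computations described above).

```python
# Illustrative sliding-window segmentation for the LightGBM inputs: 2.56 s
# windows advanced in 0.5 s steps within one behavioural segment.
import numpy as np

FS = 1600
WIN = int(2.56 * FS)        # 4096 samples per window
STEP = int(0.5 * FS)        # 800-sample hop

def sliding_feature_vectors(lfp_3regions, extract_features):
    """lfp_3regions: array of shape (3, n_samples) holding BLA, CA1 and ACC."""
    vectors = []
    n_samples = lfp_3regions.shape[1]
    for start in range(0, n_samples - WIN + 1, STEP):
        window = lfp_3regions[:, start:start + WIN]   # one 2.56 s, 3-region window
        vectors.append(extract_features(window))      # e.g. band powers + PAC values
    return np.vstack(vectors)
```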
Model architecture
As a gradient boosting framework that uses tree-based learning algorithms, LightGBM is among the most widely used feature-based classifiers in data science. It is a Gradient Boosting Decision Tree (GBDT) embedded with Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which facilitates efficient training without compromising accuracy. The loss function at iteration \(t\) can be described as

$$\mathcal{L}^{(t)}=\sum_{i=1}^{n} l\!\left(y_{i},\,\hat{y}_{i}^{(t-1)}+f_{t}(x_{i})\right)+\Omega\!\left(f_{t}\right),$$
where \({y}_{i}\) is the true label of feature vector \(i\); \(\hat{y}_{i}^{(t-1)}\) is the prediction at iteration \(t-1\); \({f}_{t}\left({x}_{i}\right)\) is the \(t\)-th added tree with \({x}_{i}\) as the input feature vector containing frequency features from the three regions; \(\Omega\left({f}_{t}\right)\) is the regularization term for tree \({f}_{t}\), which prevents overfitting and improves generalization; and \(n\) is the number of samples in the training data. The LightGBM loss was defined in a boosting manner, with the error reduced iteratively as new trees were added to the ensemble, so that each newly added tree focused on the samples that remained misclassified.
Model hyperparameters
The model's hyperparameters were chosen to optimize performance. Initially, for the cross-validation phase, the model was configured with a cross-entropy objective (xentropy) and utilized gradient boosting decision tree (GBDT) boosting. The learning rate was set to 0.01, with the number of leaves and the maximum depth set to 31 and 6, respectively. We employed 1000 estimators and enabled early stopping after 100 rounds without improvement on the validation set, to prevent overfitting and enhance model generalization. Subsequently, for the inference phase, the model adopted a similar configuration but adjusted the maximum depth to 8 and set the number of estimators based on the average best iteration determined during the cross-validation process.
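A minimal sketch of this configuration with the LightGBM Python API is given below, assuming the feature matrices produced above; it illustrates the stated settings rather than reproducing the exact training script.

```python
# Cross-entropy objective, GBDT boosting, learning rate 0.01, 31 leaves,
# depth 6, 1000 estimators, early stopping after 100 rounds without improvement.
import lightgbm as lgb

params = dict(
    objective="xentropy",      # cross-entropy objective
    boosting_type="gbdt",
    learning_rate=0.01,
    num_leaves=31,
    max_depth=6,
    n_estimators=1000,
)

def fit_fold(X_train, y_train, X_val, y_val):
    """Train one CV fold and return the model and its best iteration."""
    model = lgb.LGBMClassifier(**params)
    model.fit(
        X_train, y_train,
        eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=100)],
    )
    return model, model.best_iteration_
```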
Feature importance analysis
To assess the importance of each feature, split feature importance was analyzed, which measures the number of times a feature is used in a tree-based boosting model for splitting the input. One disadvantage of using the raw importance value in LightGBM is the noise baseline derived from the training set. Specifically, under the influence of noise, a model might assign importance to a feature that is not significant, termed noise feature importance, which should not be counted toward the actual importance. Only importances significantly greater than the noise level should be considered genuine contributions to the prediction. To eliminate such noise, null importance analysis was performed to assess the genuine contribution of features in decision models against noise26, in the following way. First, raw feature importance was calculated. Next, the labels of the training data were shuffled 1000 times and models were separately trained with those labels, which generated the distribution of null (shuffled) feature importance. Finally, the raw feature importance was compared with the null distribution. If the raw importance fell above the 99.7% confidence interval of the null distribution (greater than the mean \(\mu\) of the null distribution plus three times the standard deviation \(\sigma\)), the feature's contribution was considered significantly greater than noise. \(\mathrm{Raw\ importance}-\mu-3\sigma\) was used as the final feature importance.
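An illustrative Python sketch of this null-importance comparison, assuming the `params` dictionary from the LightGBM sketch above and a feature matrix X with labels y:

```python
# Compare raw split importance against a label-shuffled null distribution and
# keep only the excess above mean + 3 SD of the null.
import numpy as np
import lightgbm as lgb

def null_importance(X, y, params, n_shuffles=1000, seed=0):
    rng = np.random.default_rng(seed)
    real = lgb.LGBMClassifier(**params).fit(X, y).booster_.feature_importance("split")
    null = np.empty((n_shuffles, X.shape[1]))
    for k in range(n_shuffles):
        y_perm = rng.permutation(y)                       # shuffle class labels
        model_k = lgb.LGBMClassifier(**params).fit(X, y_perm)
        null[k] = model_k.booster_.feature_importance("split")
    mu, sigma = null.mean(axis=0), null.std(axis=0)
    # Final importance: raw importance minus (mu + 3 sigma), floored at zero
    return np.clip(real - (mu + 3 * sigma), 0, None)
```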
Cross-validation scheme
Cross-validation (CV) utilizes mutually exclusive validation sets and an independent test set to objectively evaluate a model's performance, ensuring that the model captures core, generalizable features of the data rather than idiosyncrasies that lead to overfitting. In our between-subject CV scheme, one of the 12 mice was first selected to serve as an independent test set, which did not participate in the training or validation phases. The remaining 11 mice were randomly divided into 5 folds, each containing either two or three mice. During this 5-fold CV phase, for each fold, the mice belonging to that fold were used as the validation set and the rest as the training set. Once the model was trained with early stopping on the validation set's cross-entropy, the validation set was input to obtain the final validation performance for that fold. A total of 5 stable validation performances were obtained for the 5 folds. A final model was then trained using data from all 11 mice across the 5 folds, keeping the training parameters constant and using the mean of the early-stop rounds obtained across all folds as the number of training rounds. The test set was then input into the trained final model to obtain the test performance.
It is important to note that a specific allocation of mice to train, validation, and test sets could potentially introduce allocation bias. To eliminate this bias, the CV scheme was executed 12 times, selecting a different mouse as the test set each time and randomly shuffling the remaining 11 mice for CV. Thus, 12 validation-test performance pairs were obtained in total. The test performance and feature importance for the 12 mice were used for further analysis.
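The leave-one-mouse-out scheme can be sketched as follows (Python illustration; `fit_fold` corresponds to the LightGBM sketch above, `fit_final` is a hypothetical helper that trains on all 11 mice with a fixed number of estimators, and sklearn's GroupKFold is used as a non-shuffled stand-in for the random fold assignment described in the text).

```python
# Leave one mouse out as the test set, run grouped 5-fold CV over the remaining
# 11 mice, then train a final model with the mean early-stop round.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.metrics import f1_score

def leave_one_mouse_out(X, y, mouse_ids, fit_fold, fit_final):
    scores = []
    for test_mouse in np.unique(mouse_ids):
        test = mouse_ids == test_mouse
        Xtr, ytr, gtr = X[~test], y[~test], mouse_ids[~test]
        best_iters = []
        for tr_idx, val_idx in GroupKFold(n_splits=5).split(Xtr, ytr, groups=gtr):
            _, best_it = fit_fold(Xtr[tr_idx], ytr[tr_idx],
                                  Xtr[val_idx], ytr[val_idx])
            best_iters.append(best_it)
        final = fit_final(Xtr, ytr, n_estimators=int(np.mean(best_iters)))
        preds = final.predict(X[test])
        scores.append((test_mouse, f1_score(y[test], preds)))  # per-mouse test F1
    return scores
```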
Performance metric
The model's classification performance was assessed by the F1 score, which balances classification effectiveness in the presence of an unbalanced class distribution. The F1 score is the harmonic mean of precision and recall:

$${F}_{1}=\frac{2\times \mathrm{precision}\times \mathrm{recall}}{\mathrm{precision}+\mathrm{recall}},$$
where precision represents the proportion of true positives among all samples predicted as positive, and recall represents the proportion of true positives that were correctly predicted out of all actual positive cases:

$$\mathrm{precision}=\frac{TP}{TP+FP},\qquad \mathrm{recall}=\frac{TP}{TP+FN},$$

where \(TP\), \(FP\) and \(FN\) denote the numbers of true positives, false positives and false negatives, respectively.
Computational modeling by Transformer
Input format
One advantage of deep learning models is their ability to perform data mining directly from raw data. Our expectation in this study was that a deep learning model could spontaneously, and without bias, identify features significantly representative of memory age. Therefore, we used the 2.56 s raw LFP segments, obtained as for LightGBM, as direct inputs to the model. Each input had 4096 time sampling points (2.56 s × 1600 Hz) with spatial resolution across three brain regions (BLA, CA1 and ACC); thus the input data shape was 3 × 4096 (brain regions × time sample points).
Model architecture
The implementation of our model was inspired by the Vision Transformer (ViT)43, initially utilized in image recognition. ViT segments each image into small patches and reorganizes them into a sequence of vectors, aligning with the standard input format for a Transformer. In our case, the input to the Transformer was LFP from three brain regions with 4096 time samples (shape = 3 × 4096). We segmented this input LFP into 16 patches along the temporal dimension, each having a shape of 3 × 256. Each patch was transposed and its three columns were then stacked to make a vector with a shape of 768 × 1. All 16 vectors were fed into a projection layer to construct a sequence of vectors, described as \({x}_{j}^{i}\) (\(j=1,\ldots,16\)), where \(i\) and \(j\) are the sample and patch index, respectively. Each vector was embedded with position information representing the order of the patches. Then, a CLS token was added in front of the sequence. The CLS token is a special vector that aggregates information from all the patches over the course of the multiple self-attention operations described later. Finally, the sequence \({X}^{i}=\left[{{\rm{CLS}}},{x}_{1}^{i},{x}_{2}^{i},\ldots,{x}_{16}^{i}\right]\) was constructed for the Transformer block.
The Transformer block incorporated a multi-head self-attention mechanism, which was its core component. There were \({d}_{h}\) heads in each Transformer block. In each head, the input sequence \({X}^{i}\) was projected onto the Query, Key, and Value spaces, with the corresponding matrices \({Q}_{i}\), \({K}_{i}\), and \({V}_{i}\), and self-attention was computed as

$$\mathrm{Attention}\left(Q_{i},K_{i},V_{i}\right)=\mathrm{softmax}\!\left(\frac{Q_{i}K_{i}^{\top}}{\sqrt{d_{e}}}\right)V_{i},$$
where \({d}_{e}\) serves as a scaling factor. The outputs from the \({d}_{h}\) heads were then concatenated and fed into a non-linear layer to produce the output of the Transformer block, which had the same sequence shape as the input. Then, \({d}_{t}\) Transformer blocks were stacked to enable comprehensive feature discovery. Finally, the CLS vector output from the last block, accumulating the global representation of the entire sequence, was fed into a dense layer, followed by a softmax layer for memory age classification.
Model hyperparameters
The Transformer block was configured with a 64-dimensional space to project the input vectors. It contained \({d}_{h}=4\) attention heads in each block, with \({d}_{t}=3\), \(4\), or \(5\) blocks stacked together. A layer normalization was performed after the blocks, followed by the final dense layers with a 64-dimensional intermediate layer and two classes as outputs.
The model was configured with a cross-entropy loss function for memory age classification. The learning rate was set to 0.001 with a weight decay of 0.0001. Early stopping was implemented with a patience value of 5, monitoring validation loss. Adam optimizer was employed to train the Transformer. The implementation is based on TorchEEG44, an open-source deep learning library available at https://github.com/torcheeg/torcheeg.
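A condensed PyTorch sketch of this architecture and the stated training setup is shown below for illustration; this is not the TorchEEG implementation used in the study, and layer details such as the feed-forward width and normalization placement are assumptions.

```python
# A 3 x 4096 LFP segment is cut into 16 patches of 3 x 256, flattened to
# 768-dim vectors, projected to 64 dimensions, prepended with a CLS token,
# position-embedded, passed through d_t encoder blocks with 4 heads, and
# classified from the CLS output.
import torch
import torch.nn as nn

class LFPTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_blocks=3, n_patches=16,
                 patch_dim=3 * 256, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(patch_dim, d_model)              # patch projection
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))    # CLS token
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=64,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_blocks)
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                       # x: (batch, 3, 4096)
        b = x.shape[0]
        patches = x.unfold(2, 256, 256)         # (batch, 3, 16, 256)
        patches = patches.permute(0, 2, 1, 3).reshape(b, 16, -1)   # (batch, 16, 768)
        tokens = self.proj(patches) + self.pos[:, 1:]
        cls = self.cls.expand(b, -1, -1) + self.pos[:, :1]
        seq = torch.cat([cls, tokens], dim=1)   # (batch, 17, 64): CLS + 16 patches
        out = self.norm(self.blocks(seq))
        return self.head(out[:, 0])             # classify from the CLS vector

# Training setup as stated above: Adam, lr 0.001, weight decay 0.0001,
# cross-entropy loss; data loading and the early-stopping loop are omitted.
model = LFPTransformer(n_blocks=3)              # d_t = 3, 4 or 5
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
```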
Attention-guided feature importance analyses
Transformers may perform well while it remains unclear which features in the data drive their decisions. Interpretability research for Transformers is categorized into attention-based45, gradient-based46, and relevance-based47 methods. These approaches generate importance heatmaps by projecting calculated attentions, gradients, or relevance from the model back to the input, where higher weights signify greater decision impact. However, for LFP data, merely projecting attention weights that signify importance onto LFP patches is insufficient because it does not reveal which theory-explainable features buried in the patches are truly highlighted. Therefore, we employed a strategy of attention-guided input selection with theory-driven extraction and perturbation, which consisted of three stages. First, we used the attention weights in the first layer of the trained model to highlight LFP patches, identifying potentially highly contributing segments (Fig. 5b). Second, we performed theory-driven feature extraction on Transformer-focused patches in both accurately and inaccurately predicted samples (Fig. 5c). Finally, we quantitatively compared the disparity between the effect size of a feature's recent-vs-remote statistical difference for accurately predicted samples and that for inaccurately predicted samples (Fig. 5d) by defining the Attention-guided Feature Importance (AFI), thereby determining the potential contribution of each feature to the model. The details of each stage are described below.
[Stage 1]
Attention weights corresponding to the CLS vector in the first block were used as guidance for interpretation (illustrated in bold in Eq. (6)). Our choice of the CLS-corresponding attention was due to the CLS being the classification backbone that aggregated information from all patches. The selection of the first block was because its attention had a direct mapping relationship with the input data, where the weights directly indicated the degree of self-attention's selection of input patches. Mathematically,

$$A=\begin{bmatrix}{{\bf{a}}}_{1,1} & {{\bf{a}}}_{1,2} & \cdots & {{\bf{a}}}_{1,17}\\ {a}_{2,1} & {a}_{2,2} & \cdots & {a}_{2,17}\\ \vdots & \vdots & \ddots & \vdots \\ {a}_{17,1} & {a}_{17,2} & \cdots & {a}_{17,17}\end{bmatrix},\qquad (6)$$
where each row represents one query and each column represents one key. The attention weights of the keys (the 16 patches) queried by the first query (the CLS query), i.e., the first row of the attention matrix, were obtained. The first score \({a}_{1,1}\) corresponds to the CLS itself; thus the attention weights \(\left[{a}_{1,2}\cdots {a}_{1,17}\right]\) were projected onto the input patches (Fig. 5b).
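For illustration, the CLS-query attention row can be recovered from the sketched LFPTransformer above as follows; this assumes the PyTorch model defined earlier in this Methods sketch, not the TorchEEG implementation, and recomputes the first block's attention directly from its multi-head attention module.

```python
# Recover the CLS row a_{1,2} ... a_{1,17} of the first block's attention.
import torch

@torch.no_grad()
def cls_attention_weights(model, x):
    # Rebuild the embedded input sequence exactly as in LFPTransformer.forward
    b = x.shape[0]
    patches = x.unfold(2, 256, 256).permute(0, 2, 1, 3).reshape(b, 16, -1)
    tokens = model.proj(patches) + model.pos[:, 1:]
    cls = model.cls.expand(b, -1, -1) + model.pos[:, :1]
    seq = torch.cat([cls, tokens], dim=1)                    # (batch, 17, 64)

    # Query the first block's multi-head attention with need_weights=True;
    # the returned weights are averaged over heads: shape (batch, 17, 17)
    attn_module = model.blocks.layers[0].self_attn
    _, attn = attn_module(seq, seq, seq, need_weights=True)
    return attn[:, 0, 1:]                                    # CLS row over the 16 patches
```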
[Stage 2]
For each test sample, the model provided a memory age prediction with the attention projected onto the input LFP (Fig. 5b). First, samples were selected from the two categories (recent/remote) that were correctly predicted. If the probability of the ground-truth category prediction was >0.55, it was regarded as an accurate prediction. For accurately predicted test samples (blue in Fig. 5c), the corresponding attention score was used to segment the LFP. These segmented LFPs, which had high attention, were fed into a theory-driven extractor, where the PSD of the theta, beta, slow gamma, and fast gamma frequency bands in the BLA, CA1 and ACC was extracted. Each feature was grouped by its corresponding real label (recent or remote), from which statistical distributions were derived. For example, in the representative case shown in Fig. 5d, the PSD of BLA slow gamma in the recent session is significantly higher than that in the remote session in the high-attention patches of the accurately predicted samples. The rationale is that if a feature shows a significant distribution difference between recent and remote sessions in the segments highlighted by the Transformer's attention, it is considered a putative feature potentially utilized by the attention mechanism. Furthermore, incorrectly predicted samples were selected for the same analysis with a prediction probability of <0.45 as the criterion (brown in Fig. 5c). Importantly, if the attention-guided feature in incorrectly predicted samples presents an opposite distribution to that seen in correctly predicted samples, this further indicates that the feature was potentially used for decision-making.
[Stage 3]
The Attention-guided Feature Importance (AFI) was defined to measure the significance of a feature in the decision-making process. Essentially, it assesses the disparity between the effect size of a feature's recent-vs-remote statistical difference for accurately predicted samples and that for inaccurately predicted samples. The greater the disparity between the effect sizes in these two cases, the greater the potential importance of the feature. Effect size, specifically Cohen's d48, was used to measure the magnitude of statistical differences between the recent and remote sessions:

$$d=\frac{{M}_{recent}-{M}_{remote}}{S{D}_{pooled}},$$
where \({M}_{{recent}}\) and \({M}_{{remote}}\) are the means of the two groups and \(S{D}_{{pooled}}\) is the pooled standard deviation of the two groups, calculated as

$$S{D}_{pooled}=\sqrt{\frac{\left({n}_{recent}-1\right)S{D}_{recent}^{2}+\left({n}_{remote}-1\right)S{D}_{remote}^{2}}{{n}_{recent}+{n}_{remote}-2}},$$
where \({n}_{{recent}}\) and \({n}_{{remote}}\) are the sample sizes, and \(S{D}_{{recent}}\) and \(S{D}_{{remote}}\) are the standard deviations of the two groups. The AFI was calculated as follows. First, it was determined whether there was a significant difference of the feature between recent and remote in accurately predicted samples. If there was no significant difference, it indicated that the feature was not given significant attention, and thus the AFI was set to 0. If the difference was significant, then the effect size \({d}_{{accurate}}\) was calculated. Next, it was examined whether there was a significant difference of the feature between recent and remote in inaccurately predicted cases. There were three possible scenarios: (1) if the difference was significant but the sign of the effect size was the same as in the accurate cases, it indicated that there was no disparity in the direction of the recent-vs-remote differences between the two cases, and therefore the AFI was set to 0; (2) if the difference was significant and the sign was opposite, then \(\mathrm{AFI}=\left|{d}_{{accurate}}-{d}_{{inaccurate}}\right|\); (3) if there was no significant difference, then \(\mathrm{AFI}=\left|{d}_{{accurate}}\right|\).
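A compact Python sketch of this AFI computation is given below (illustrative; the significance test is shown as a two-sample t-test, which is an assumption for the example, and the input arrays hold one feature's values from high-attention patches).

```python
# Cohen's d between recent and remote feature values, and the AFI rule above.
import numpy as np
from scipy import stats

def cohens_d(a, b):
    na, nb = len(a), len(b)
    sd_pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                        / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / sd_pooled

def afi(recent_acc, remote_acc, recent_inacc, remote_inacc, alpha=0.05):
    # No recent-vs-remote difference in accurate samples -> feature not attended
    if stats.ttest_ind(recent_acc, remote_acc).pvalue >= alpha:
        return 0.0
    d_acc = cohens_d(recent_acc, remote_acc)
    if stats.ttest_ind(recent_inacc, remote_inacc).pvalue < alpha:
        d_inacc = cohens_d(recent_inacc, remote_inacc)
        # Opposite sign in inaccurate samples -> disparity of effect sizes
        return abs(d_acc - d_inacc) if np.sign(d_acc) != np.sign(d_inacc) else 0.0
    return abs(d_acc)
```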
Perturbation analysis
To further validate the importance of input features, we performed perturbation analysis, which has widespread applications in the fields of deep learning-based imaging49 and text50. Importantly, perturbation analysis does not depend on the model's internal structure; it treats the model as a black box while manipulating input features. In this study, having identified putative features through the attention mechanism as described above, we validated the importance of these features by perturbing them in the raw LFP and observing changes in model prediction. A band-stop filter was used to suppress each putative feature identified by attention in the raw test LFP (Fig. 6a). Subsequently, the manipulated LFP lacking this feature was input into the trained model and the model's prediction F1 score was calculated. The importance of the feature to the model was determined by comparing model performance between the original LFP and the perturbed condition.
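An illustrative Python sketch of one perturbation step is shown below, assuming raw test segments of shape (n_samples, 3, 4096) and a zero-phase Butterworth band-stop filter; `model_predict` is a hypothetical inference helper for whichever trained model is being evaluated.

```python
# Remove one frequency band from one region of the raw test LFP, then re-score
# the trained model on the filtered input.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.metrics import f1_score

def band_stop(x, lo, hi, fs=1600, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="bandstop")
    return filtfilt(b, a, x)                    # zero-phase removal of the band

def perturbed_f1(model, X_test, y_test, region_idx, band):
    """Filter one region's band out of every sample and re-score the model."""
    X_pert = X_test.copy()
    X_pert[:, region_idx, :] = np.array([band_stop(s, *band)
                                         for s in X_test[:, region_idx, :]])
    preds = model_predict(model, X_pert)        # hypothetical inference helper
    return f1_score(y_test, preds)
```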
Cross-validation scheme
Five-fold cross-validation with exclusion of one out of 12 mice, followed by testing of the excluded mouse, were performed for the 12 validation-test pairs as described in the section for LightGBM.
Hardware
For our deep learning experiments, we leveraged an Intel Core i5-9400F CPU with 64 GB of memory, paired with an Nvidia GeForce RTX 2080 Ti GPU and an Nvidia A40 with 24 GB RAM.
Statistics
All statistical tests were performed either in MATLAB R2020a/R2022a (Mathworks), Python on Visual Studio Code 1.88.1 (Microsoft), Prism 10 (GraphPad) or Excel 2021 (Microsoft) with custom-written scripts. A two-tailed unpaired t-test was used to compare measures between two conditions. A two-tailed paired t-test or Wilcoxon signed-rank test was used to compare repeated measures between two conditions. A one-tailed paired t-test or Wilcoxon signed-rank test was used to test unidirectional changes of repeated measures between two conditions. A one-way repeated measures ANOVA was used to compare repeated measures across more than two conditions, followed by post-hoc pairwise comparisons with Tukey's HSD test. A two-way repeated measures ANOVA was used to compare repeated measures over two factors of conditions, followed by post-hoc pairwise comparisons with Tukey's HSD test. A test of no correlation (correlation t-test) was used to test the significance of correlations between different measures. All bar graphs and line plots show mean ± SEM. All statistically significant differences with p < 0.05 are described in each figure, with significance levels indicated by asterisks as *p < 0.05, **p < 0.01, ***p < 0.001.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
This study did not generate new unique reagents. Raw behavioral and electrophysiological recording data generated in this study are available at https://neurodata.riken.jp/id/20241006-001 under https://doi.org/10.60178/cbs.20241006-00151. Processed data shown in each figure are provided with this paper as a Source Data file.
Code availability
The custom computer code used in this study is available at https://github.com/BrainOdyssey2050/Makino-Wang-et-al-2024-Nat-Commun_Source-Code under https://doi.org/10.5281/zenodo.1393438352.
References
Phelps, E. A. & LeDoux, J. E. Contributions of the amygdala to emotion processing: from animal models to human behavior. Neuron 48, 175–187 (2005).
Maren, S., Aharonov, G. & Fanselow, M. S. Retrograde abolition of conditional fear after excitotoxic lesions in the basolateral amygdala of rats: absence of a temporal gradient. Behav. Neurosci. 110, 718–726 (1996).
Gale, G. D. et al. Role of the basolateral amygdala in the storage of fear memories across the adult lifetime of rats. J. Neurosci. 24, 3810–3815 (2004).
Kitamura, T. et al. Engrams and circuits crucial for systems consolidation of a memory. Science 356, 73–78 (2017).
Liu, J., Totty, M. S., Melissari, L., Bayer, H. & Maren, S. Convergent coding of recent and remote fear memory in the basolateral amygdala. Biol. Psychiatry 91, 832–840 (2022).
Moscovitch, M., Cabeza, R., Winocur, G. & Nadel, L. Episodic memory and beyond: the hippocampus and neocortex in transformation. Annu. Rev. Psychol. 67, 105–134 (2016).
Goode, T. D., Tanaka, K. Z., Sahay, A. & McHugh, T. J. An integrated index: engrams, place cells, and hippocampal memory. Neuron 107, 805–820 (2020).
Goshen, I. et al. Dynamics of retrieval strategies for remote memories. Cell 147, 678–689 (2011).
Harris, A. Z. & Gordon, J. A. Long-range neural synchrony in behavior. Annu. Rev. Neurosci. 38, 171–194 (2015).
Bocchio, M., Nabavi, S. & Capogna, M. Synaptic plasticity, engrams, and network oscillations in amygdala circuits for storage and retrieval of emotional memories. Neuron 94, 731–743 (2017).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30, 3149–3157 (2017).
Lewis, C. M. et al. Recording quality is systematically related to electrode impedance. Adv. Healthc. Mater. 13, 2303401 (2024).
Patel, J., Fujisawa, S., Berényi, A., Royer, S. & Buzsáki, G. Traveling theta waves along the entire septotemporal axis of the hippocampus. Neuron 75, 410–417 (2012).
Makino, Y., Polygalov, D., Bolaños, F., Benucci, A. & McHugh, T. J. Physiological signature of memory age in the prefrontal-hippocampal circuit. Cell Rep. 29, 3835–3846.e3835 (2019).
Lisman, J. & Buzsáki, G. A neural coding scheme formed by the combined function of gamma and theta oscillations. Schizophr. Bull. 34, 974–980 (2008).
Benchenane, K., Tiesinga, P. H. & Battaglia, F. P. Oscillations in the prefrontal cortex: a gateway to memory and attention. Curr. Opin. Neurobiol. 21, 475–485 (2011).
Karalis, N. et al. 4-Hz oscillations synchronize prefrontal-amygdala circuits during fear behavior. Nat. Neurosci. 19, 605–612 (2016).
Grewe, B. F. et al. Neural ensemble dynamics underlying a long-term associative memory. Nature 543, 670–675 (2017).
Stujenske, J. M., Likhtik, E., Topiwala, M. A. & Gordon, J. A. Fear and safety engage competing patterns of theta-gamma coupling in the basolateral amygdala. Neuron 83, 919–933 (2014).
Tort, A. B., Komorowski, R., Eichenbaum, H. & Kopell, N. Measuring phase-amplitude coupling between neuronal oscillations of different frequencies. J. Neurophysiol. 104, 1195–1210 (2010).
Wirt, R. A. & Hyman, J. M. ACC theta improves hippocampal contextual processing during remote recall. Cell Rep. 27, 2313–2327.e2314 (2019).
Adhikari, A., Sigurdsson, T., Topiwala, M. A. & Gordon, J. A. Cross-correlation of instantaneous amplitudes of field potential oscillations: a straightforward method to estimate the directionality and lag between brain areas. J. Neurosci. Methods 191, 191–200 (2010).
Engel, A. K. & Fries, P. Beta-band oscillations – signalling the status quo? Curr. Opin. Neurobiol. 20, 156–165 (2010).
Clarke-Williams, C. J. et al. Coordinating brain-distributed network activities in memory resistant to extinction. Cell 187, 409–427.e419 (2024).
Colgin, L. L. et al. Frequency of gamma oscillations routes flow of information in the hippocampus. Nature 462, 353–357 (2009).
Altmann, A., Tološi, L., Sander, O. & Lengauer, T. Permutation importance: a corrected feature importance measure. Bioinformatics 26, 1340–1347 (2010).
Canolty, R. T. & Knight, R. T. The functional role of cross-frequency coupling. Trends Cogn. Sci. 14, 506–515 (2010).
Lisman, J. E. & Jensen, O. The θ-γ neural code. Neuron 77, 1002–1016 (2013).
LeDoux, J. E. Emotion circuits in the brain. Annu. Rev. Neurosci. 23, 155–184 (2000).
Scoville, W. B. & Milner, B. Loss of recent memory after bilateral hippocampal lesions. J. Neurol. Neurosurg. Psychiatry 20, 11–21 (1957).
Squire, L. R. & Alvarez, P. Retrograde amnesia and memory consolidation: a neurobiological perspective. Curr. Opin. Neurobiol. 5, 169–177 (1995).
Riedel, G. et al. Reversible neural inactivation reveals hippocampal participation in several memory processes. Nat. Neurosci. 2, 898–905 (1999).
Rosenbaum, R. S. et al. Remote spatial memory in an amnesic person with extensive bilateral hippocampal lesions. Nat. Neurosci. 3, 1044–1048 (2000).
Bartsch, T., Dohring, J., Rohr, A., Jansen, O. & Deuschl, G. CA1 neurons in the human hippocampus are critical for autobiographical memory, mental time travel, and autonoetic consciousness. Proc. Natl. Acad. Sci. USA 108, 17562–17567 (2011).
Cambiaghi, M. et al. Higher-order sensory cortex drives basolateral amygdala activity during the recall of remote, but not recently learned fearful memories. J. Neurosci. 36, 1647–1659 (2016).
Headley, D. B., Kyriazi, P., Feng, F., Nair, S. S. & Pare, D. Gamma oscillations in the basolateral amygdala: localization, microcircuitry, and behavioral correlates. J. Neurosci. 41, 6087–6101 (2021).
Fitzgerald, P. J. et al. Durable fear memories require PSD-95. Mol. Psychiatry 20, 901–912 (2015).
Lo, Y., Yi, P. L., Hsiao, Y. T., Lee, T. Y. & Chang, F. C. A prolonged stress rat model recapitulates some PTSD-like changes in sleep and neuronal connectivity. Commun. Biol. 6, 716 (2023).
Narayanan, R. T. et al. Dissociated theta phase synchronization in amygdalo-hippocampal circuits during various stages of fear memory. Eur. J. Neurosci. 25, 1823–1831 (2007).
Narayanan, V. et al. Social defeat: impact on fear extinction and amygdala-prefrontal cortical theta synchrony in 5-HTT deficient mice. PLoS One 6, e22600 (2011).
Fitzgerald, P. J. et al. Prefrontal single-unit firing associated with deficient extinction in mice. Neurobiol. Learn. Mem. 113, 69–81 (2014).
Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations. https://openreview.net/forum?id=YicbFdNTTy (2021).
Zhang, Z., Zhong, S.-h. & Liu, Y. TorchEEGEMO: A deep learning toolbox towards EEG-based emotion recognition. Expert Syst. Appl. 249, 123550 (2024).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. https://doi.org/10.48550/arXiv.1706.03762 (2017).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision, 618–626 (IEEE, 2017).
Chefer, H., Gur, S. & Wolf, L. Transformer interpretability beyond attention visualization. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 782–791 (IEEE, 2021).
Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37–46 (1960).
Fong, R., Patrick, M. & Vedaldi, A. Understanding deep networks via extremal perturbations and smooth masks. In Proc. IEEE/CVF International Conference on Computer Vision, 2950–2958 (IEEE, 2019).
Liu, S. et al. NLIZE: a perturbation-driven visual interrogation tool for analyzing and interpreting natural language inference models. IEEE Trans. Vis. Comput. Graph., 651–660 (2018).
McHugh, T. J. Distinct Cross-Regional Control of Amygdalar Dynamics Reliably Reflects Fear Memory Age. https://neurodata.riken.jp/id/20241006-001 (2024).
Makino, Y., Wang, Y. & McHugh, T. J. BrainOdyssey2050/makino-wang-et-al-2024-nat-commun_source-code. Zenodo https://doi.org/10.5281/zenodo.13934383 (2024).
Acknowledgements
We thank all members of the Laboratory for Circuit and Behavioral Physiology (McHugh lab) at RIKEN Center for Brain Science for their support and the IT Services at Massey University for providing the A40 GPU server used for computational analyses. This work was supported by Grant-in-Aid for Scientific Research from MEXT (19H05646, 23H05478; T.J.M), RIKEN Center for Brain Science (T.J.M), and the RIKEN Special Postdoctoral Research Program (Y.M).
Author information
Contributions
Y.M. and T.J.M. conceived the study and designed the experiments. Y.M. performed the experiments and analyzed data. Y.W. performed the computational modeling and analyzed data. Y.M., Y.W. and T.J.M. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Makino, Y., Wang, Y. & McHugh, T.J. Multi-regional control of amygdalar dynamics reliably reflects fear memory age. Nat Commun 15, 10283 (2024). https://doi.org/10.1038/s41467-024-54273-3
DOI: https://doi.org/10.1038/s41467-024-54273-3