Introduction

The biomedical research community has recognized many of the factors leading to irreproducibility of animal research1,2 and has begun to implement solutions to address this challenge: these cover many aspects, including further investment in standardized nomenclature3,4, improved experimental design (PREPARE5), reporting of animal research (ARRIVE, Animal Research: Reporting of In Vivo Experiments6) and open data sharing7. In this context, recommendations for a common standard of documentation for animal experiments6 serve a particularly important purpose.

The ARRIVE guidelines list the key aspects for documenting laboratory animals used in experiments: information on individuals, metadata, experimental procedures and study design6. The guidelines also recommend core documentation on animal model characteristics, such as genetic modification status, genotype, and manipulated gene(s), as well as genetic methods and technologies used to generate and validate the animals8. However, in addition to environmental factors (sanitary status, diet, etc.), the genotype–phenotype relationship can be markedly influenced by the genetic background of experimental animals9,10, and its reproducibility is significantly affected by breeding paradigms, source colony, and genetic drift11. Even genetic differences that are too often perceived as subtle, such as those between C57BL/6 substrains in mice, can have a significant impact12. Therefore, a need remains for a more comprehensive description of the genetics of research animals to enable data interpretation and reproducibility. Such information is also needed to allow other scientists to acquire, maintain, and use experimental models for investigations building on published data. Here, we propose a framework of reporting guidelines, complementary to the ARRIVE guidelines, to support the documentation in scientific publications of the genetics of animals used in research. To be clear, this framework does not aim to impose standardization of animal genetics13, but, rather, to improve the documentation associated with animals used for scientific research. It is intended to be applicable to all animals used for research.

Our proposed framework applies to the full range of animal species used in life-science research and, in the case of genetically engineered animals, to those modified by either classical or current methods of genome engineering. Here, we discuss how this reporting framework is designed to document the genetic background and validation (defined here as the act of verifying) of animal models and to link this information to infrastructures that support the community in sharing data and materials. We also examine the fundamentals of genetic validation for animal models and we present the role of supporting frameworks for reporting animal genetics. We call these recommendations the Laboratory Animal Genetic Reporting (LAG-R) guidelines. By standardizing information, the LAG-R guidelines will improve the sharing and replicability of models across research teams and will guide peer reviewers in their assessment of the essential genetic information for animal models provided in manuscripts.

Reporting genetic backgrounds and genetic alterations

The limitations of current standards for comprehensive reporting of the genetics of animals in research publications have been the subject of many discussions, both informal and structured, among relevant research societies and consortia in recent years. The lack of standardization in reporting currently results in a deterioration of information regarding research animal models, particularly as they transition between different laboratories. This has many negative consequences, particularly if the animal model was not fully described in the initial publication and subsequent breeding records are partial or absent. This can prevent re-analysis of data, as fundamental and basic variables on research materials are not documented. It can also lead to misinterpretation of published studies. In addition, imprecise definition of genetic background may lead to the use of experimental animals with different genetics and phenotypes in subsequent studies, which is a known contributing factor in true or perceived irreproducibility of research14. This wastes significant research resources15, including those used to reconstitute missing information on both genetic background and genetic alterations2, as well as those used to re-establish genotyping assays16. The worst possible consequence can be the waste of experimental animals. By contrast, appropriate documentation and reporting will contribute to reduction and refinement practices in keeping with the 3Rs17 and will result in better management of animal use6.

The development and implementation of documentation and reporting standards of laboratory animal genetics present an opportunity to improve research reproducibility. Here, we propose two sets of features to be documented (Table 1): the first is applicable to all laboratory animals; the second, with additional criteria, is to document the genetic alterations of engineered and other genetic models.

Table 1. Minimal information needed for correct identification of species, lineages, and genetic alterations

The genetic background of animals can be described by species18,19, strain9,20, sub-strain12, breed/stock21 and breeding history (to trace contamination as an impact of the breeding scheme and genetic drift)22. Standards for their descriptions, such as the use of species-appropriate nomenclature conventions, are already defined. These are major intrinsic factors that have biological impact, and that must be fully documented to strengthen research reproducibility, complementing the ARRIVE guidelines. Table 1 summarizes the minimal information needed to correctly identify species and lineages. Criteria 1–5 report information that is part of animal records in the laboratory. Criterion 6 reports the documentation of genetic assays that have recently become more accessible, both practically and financially, for many animal species. This last criterion validates the information provided under criteria 1–5. Examples of documentation are shown in Supplementary Table 1.
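As an illustration only, the background descriptors named in this section could be held together in a simple laboratory record. The sketch below is a minimal example under stated assumptions: the class name, field names, and helper function are hypothetical, they follow the descriptors listed in the text (species, strain, sub-strain, breed/stock, breeding history, genetic assays), and they do not reproduce the wording or numbering of Table 1.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class GeneticBackgroundRecord:
    """Hypothetical record of the background descriptors named in the text."""
    species: str = ""
    strain: str = ""
    substrain: str = ""
    breed_or_stock: str = ""
    breeding_history: str = ""
    # Genetic assays used to verify the descriptors above (free text).
    genetic_assays: List[str] = field(default_factory=list)


def missing_descriptors(record: GeneticBackgroundRecord) -> List[str]:
    """Return the names of descriptors that are still empty.

    An empty descriptor is only a prompt for review: not every descriptor
    applies to every species (for example, strain versus breed/stock).
    """
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Example values are illustrative, not taken from the article.
example = GeneticBackgroundRecord(species="Mus musculus", strain="C57BL/6")
print(missing_descriptors(example))
```

A record of this kind could be checked before submission so that undocumented descriptors are noticed early, rather than reconstructed after the animals have moved between laboratories.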

The genetic alterations carried by laboratory animals can affect phenotypes and require documentation. However, they are rarely fully documented, for both technical and historical reasons. Naturally occurring alleles or mutations obtained by chemical mutagenesis (e.g., ENU) require significant work to be defined23,24,25,26. Alleles obtained by gene targeting in embryonic stem (ES) cells are typically described by a schematic, with the full sequence of the targeting event rarely provided (with the particular exception of alleles that are produced by some high-throughput programs). Genome editing requires careful validation (including by sequencing) of the resulting allele (for example, ref. 27), but documentation of the sequence of the entire region of interest is typically not provided. Furthermore, a consensus on the criteria for molecular validation of genetically engineered animals is yet to be defined. This is of particular importance because methods for both generation and validation of mutations are evolving rapidly, and new methods to validate larger regions of interest and identify both discrete and structural variations in genomes are emerging (for example, refs. 28,29,30). The documentation of genetic alteration(s) and their validation is as important as that of the genetic background, and both represent essential areas for improvement in research reproducibility. The second part of Table 1 presents a set of criteria that should be used to describe genetic alterations in laboratory animals, which includes information about experimental design and confirmatory validation data. Examples of documentation for a genetically altered line and a checklist to support reviewers in assessing the documentation of genetics are shown in Supplementary Tables 1 and 2, respectively.

Fundamentals of genetic validation for animal models

Validation of genetics refers to the verification of the overall genome for all animal models and, in the case of genetically altered animals, to the characterization of a specific region of interest. In some instances, the initial step will be to ascertain the most accurate taxonomy of the model at hand. In other contexts, the aim will be to check the pedigree of the stock. Specifically, inbred and outbred models present different challenges, as inbreeding aims to maintain genetic stability, whereas outbreeding aims to maintain genetic diversity31. Variations can be discrete alterations or structural changes that are likely to accumulate over time, and which should be identified and/or managed whenever possible.

Genomic stability and quality

The genomes of animals change over their lifetime32 and with each generation33,34: both the accumulation of natural mutations (with fixation of variations resulting in genetic drift) and the modification of allelic diversity as a result of crossing patterns (inbreeding for outbred lines or contamination of lines by other backgrounds) can affect the genome composition of a laboratory animal31. For example, MMRRC UNC has performed a preliminary analysis of 230 lines and estimated that approximately 40% of these lines do not match their name. The most common discrepancies are lack of congenicity, inbreeding, or the presence of additional genetic backgrounds (information from F.P.M.V., MMRRC). In addition, contaminating transgenes or unexpected altered alleles are also observed but at a low frequency. Good documentation and breeding strategies play an essential role in managing the quality of the genetic background of laboratory animals35,36 (Table 1), but additional techniques for capturing genetic variation are becoming available and affordable for many animal models (Supplementary Table 3). In particular, the Mouse Universal Genotyping Array (MUGA) panels can be used to verify the presence of many commonly used constructs, as well as to corroborate the composition of the genetic background of mice22. Similar panels are available for other species37,38. Likewise, quantitative polymerase chain reaction (qPCR) or digital polymerase chain reaction (dPCR) assays can be designed to detect common constructs used in a field of research or in a particular species. With the development of new technologies to evaluate genetic quality, a full genome assessment (including an understanding of the frequency of both discrete and structural variations) will become more accessible. Different laboratory animal species and scientific questions call for different depths of genetic validation. However, in all cases, the more complete the validation of the genetics of the animal models, the more reliable and reproducible the experiments will be. Advances in the understanding and documentation of genetic variation do not prevent the occurrence of genetic changes during animal breeding, but they allow researchers to monitor such changes and to re-evaluate phenotypes with the knowledge of newly described causative or modifying variants39. Although genetic control is complementary to good practices in animal colony management, it has limited power when downstream crosses are made without rigor. Finally, many other factors affect the reproducibility of a given experiment, and a number of these are already covered in the ARRIVE guidelines6.

For a small number of laboratory animal species, advanced frameworks designed to manage their genetics already exist, mainly as a result of the length of time and the context in which they have been studied. These frameworks include knowledge of the species’ sequences and pedigrees, and support structures dedicated to maintaining genetic integrity. In those cases, reporting, traceability, and control of the genetic background of animals are even more essential. Where possible, crossing with wild-type reference animals is good practice to reduce de novo variations and construct contaminations within the genome. For example, current practice in mice is to backcross for two or three generations, depending on the level of inbreeding (https://www.jax.org/news-and-insights/jax-blog/2018/April/how-to-refresh-your-mutant-or-transgenic-mouse-strains; ref. 40). Likewise, to avoid inbreeding depression in zebrafish stocks, each new generation should be produced by an outcross, and crosses between siblings should be performed only when absolutely necessary41,42,43. Although practices may vary across research communities and as fields evolve, the traceability of breeding patterns remains important in all circumstances.
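To give a sense of what two or three backcross generations achieve, the expected share of the genome still inherited from the original colony can be estimated with the textbook Mendelian halving rule. The sketch below is a simplified illustration, not a method from the guidelines: the function name is hypothetical, the first cross to the reference strain is counted as generation 1, and the estimate ignores linkage to any selected allele as well as the variation between individual animals.

```python
def expected_colony_fraction(backcross_generations: int) -> float:
    """Expected fraction of the genome still derived from the original colony.

    Assumes each backcross to the reference strain halves, on average, the
    colony-derived share of the genome (unlinked loci only).
    """
    if backcross_generations < 0:
        raise ValueError("backcross_generations must be zero or positive")
    return 0.5 ** backcross_generations


for n in (1, 2, 3):
    print(n, expected_colony_fraction(n))
```

Because this is an expectation only, recording the actual number of backcross generations and the reference animals used remains necessary for traceability.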

Furthermore, all genome engineering techniques have the potential to introduce additional genetic changes while modifying the region of interest: both chromosome number and structure vary in embryonic stem (ES) cells when cultured44; both gene targeting and genome editing have the potential to insert additional copies of the donor template away from the target45; random insertion of DNA (transgenics) can introduce structural variation at the site of insertion46; and nucleases used in genome engineering activities are not entirely specific and can cause off-target variation47, though this is rare when specific design practices are used and must be evaluated in the context of known variation in the animals being studied48. Techniques have also been developed to identify these unwanted events49,50. These are discussed in the context of genome engineering validation in the next section.

Assessing the genetic alterations

Genetic engineering was once restricted to a limited number of laboratory animal species33, but as a result of advances in genome-editing techniques, there is now almost no limit to the species that can be genetically engineered. Standards for validation are evolving in parallel with the technology.

Different modes of alteration have differing potential for unwanted outcomes. For example, whereas random insertions are the aim of additive transgenesis, other engineering technologies aim to target a specific locus. Therefore, no universal recommendation can be made for the validation of a genetic modification. When a specific locus is targeted, validation should aim to characterize the genetic change at the region of interest. In all cases, genetic changes resulting from the engineering method should be assessed throughout the genome. As for maintaining genomic stability and quality, multiple crosses to the reference genetic background will mitigate the potential genome-wide impact of genetic engineering and should be reported (Table 1). For targeted events, both the sequence of the region of interest and the local structural integrity should be examined, the latter for exclusion of deletion, duplication, and inversion events. Supplementary Table 4 lists the various molecular assays that can be employed to interrogate these two aspects. Ideally, genetic quality would be regularly assayed, but importing or onboarding animals is the most important point at which to check the quality of newly acquired or generated models. A combination of methods that elucidate both the sequence and structure of the target locus and the genetic makeup of samples (Mendelian, mosaic, or chimeric animals) is required. The methods used will depend on a number of factors, such as the laboratory setup, whether a donor sequence was used in the mutagenesis process and the length of the modified segment. For example, point mutations are easily characterized using Sanger or short-read next-generation sequencing (NGS), whereas verification of very large knock-ins is likely to require a long-read-based sequencing approach. The functional characterization of the products of mutated genes is also an important aspect of research quality but is outside the scope of these recommendations.
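The dependence of the sequencing approach on the length of the modified segment can be sketched as a simple decision helper. This is an illustration under explicit assumptions: the function name is hypothetical, and the text gives no numerical boundary for a "very large" knock-in, so the caller must supply that threshold; it is not a recommendation from the guidelines. Intermediate cases are deliberately left undecided, because the text ties them to laboratory setup and donor use.

```python
def suggest_sequencing_approach(modified_length_bp: int,
                                long_read_threshold_bp: int) -> str:
    """Suggest a sequencing approach from the length of the modified segment.

    long_read_threshold_bp is a laboratory-chosen assumption, not a value
    given in the text.
    """
    if modified_length_bp < 1:
        raise ValueError("modified_length_bp must be at least 1")
    if long_read_threshold_bp <= 1:
        raise ValueError("long_read_threshold_bp must be greater than 1")
    if modified_length_bp == 1:
        # Point mutation: the example given in the text.
        return "Sanger or short-read NGS"
    if modified_length_bp >= long_read_threshold_bp:
        # Very large modification, such as a large knock-in.
        return "long-read sequencing"
    return "undecided: depends on laboratory setup and donor design"


# The threshold below is a placeholder chosen only for this example.
print(suggest_sequencing_approach(1, long_read_threshold_bp=5000))
print(suggest_sequencing_approach(12000, long_read_threshold_bp=5000))
```

Whatever approach is chosen, the sequence result should be paired with an assay of local structural integrity, as described above.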

Further genome validation following genome engineering

All mutagenesis techniques have the potential to generate unpredictable and/or additional changes in the genome, outside of any region of interest, and these can be transmitted through generations. These may be discrete mutations51, additions, insertions52, or structural rearrangements (including chromothripsis53,54), in addition to naturally occurring genetic variation, as discussed above. It is essential to be aware of the occurrence of these nonconforming events to ensure the correct interpretation of results and research quality. A number of technologies and simple assays can be employed to screen animals for the presence of off-target events (including random integrations, which can be detected by dPCR), and these are summarized in Supplementary Table 5. However, some molecular changes have no recognizable pattern, meaning that no specific genotyping assay can be designed, or may affect difficult-to-characterize features, such as large segments or repeated sequences. Changes of this type will, therefore, require more sophisticated investigations, such as next-generation sequencing (Supplementary Table 5). No single technology yet allows for the unbiased and complete acquisition of the sequence of a whole genome47,55.

The role of supporting frameworks for reporting animal genetics

There are numerous supporting frameworks for standardization initiatives that facilitate the knowledge and management of the genetic quality of laboratory animals. These include nomenclature guidelines, as well as repositories of information (such as ontologies, research data, metadata, and annotations) and materials.

Advanced systems of nomenclature are continuously being developed to describe animals and genes in a standardized fashion. Taxonomy resources include the National Center for Biotechnology Information taxonomy database56. In addition, the Vertebrate Gene Nomenclature Committee assigns standardized names to genes in vertebrate species that currently lack a nomenclature committee, ensuring that genes are named in line with their human orthologs57,58. These resources are essential to develop a common and unambiguous vocabulary with which to name genetic models and characteristics. They support the continuous refinement of nomenclature systems in sync with the evolution of animal models and molecular tools so that nomenclature remains compatible with state-of-the-art research.

The use of most laboratory models is supported by dedicated databases that aggregate genetic and phenotypic information and that link to other resources, such as sequencing databases, scientific publications, and animal model repositories (see examples in Supplementary Table 6). Researchers have a role to play in registering new animal models to publicly accessible databases, thus helping to avoid the generation of lines that already exist. Commercial breeders also distribute information on the biology of the animals they produce. Additional information with a focus on animal welfare can be collected through the establishment of identity cards59.

The integrity of genetic model materials is preserved through repositories that archive and distribute animals, gametes, and embryos, as well as plasmids and ES cells. These support structures are federated in international networks that collaborate to ensure the availability of quality-controlled materials to the research community worldwide. The collections available in these repositories can be interrogated at their individual web portals or through web pages that allow querying of the entire repository network to locate and source animal models (https://www.alliancegenome.org/; ref. 4). Together with academic and commercial research animal breeders, these repositories play a crucial role in ensuring the genetic quality and stability of laboratory animals and the reproducibility of research that employs animal models.

Acquiring knowledge of appropriate standards of documentation, with the ability to understand and employ these, is an integral part of scientific training. This includes a knowledge of genetics. Beyond formal education, many web resources and training opportunities are available (e.g., https://oacu.oir.nih.gov/training-resources; https://www.aalas.org/iacuc/iacuc_resources/training-programs; https://resources.jax.org/). In this respect, learned societies, breeders, and repositories of laboratory animals are important sources of information and educational material.

Finally, the FAIRsharing portal aims to aggregate the resources that support standardization in the life sciences (https://fairsharing.org; ref. 7). Likewise, learned societies and dedicated consortia play an essential role in establishing these research-support frameworks and in facilitating the training of researchers to understand and manage the challenges of using laboratory animals for reproducible research.

Perspectives

Recognizing concerns about reproducibility, the LAG-R guidelines aim to standardize the information about animal models in scientific reports. This is becoming increasingly important as the diversity of laboratory animals expands, along with new methods and designs for the generation of genetic alterations and for in-depth characterization of genomes. However, there are barriers to overcome. In particular, adoption of the guidelines requires consensus within the community, greater expertise in genetics, and additional editorial work on the part of authors, reviewers, and editors.

It still does not seem realistic, or even possible, to fully validate the entire genetic composition of every animal used in research. On the other hand, improved reporting of all available information regarding the genetic makeup of laboratory animal models and on which validations have been carried out will allow us to better reinforce and evaluate the reliability of animal experiments. More in-depth animal model validation is increasingly feasible but requires specific expertise and the availability of dedicated funding, two aspects that will require significant investment.

Going forward, it is for the community to improve laboratory animal genetic reporting, and the LAG-R guidelines will help to facilitate this, but only with the commitment of scientists, funding bodies, journals, reviewers, and editors.