Tag: GenBank

NCBI Resources Highlighted in 2025 Nucleic Acids Research Database Issue

The 2025 Nucleic Acids Research Database Issue features papers from NCBI staff on ClinVar, PubChem, GenBank, RefSeq, and more. The citations are available in PubMed, with full text available in PubMed Central (PMC). To read an article, click on the PMCID listed below.

Database resources of the National Center for Biotechnology Information in 2025

PMCID: PMC11701734

NCBI provides online information resources for biology, including the GenBank® nucleic acid sequence repository and the PubMed® repository of citations and abstracts published in life science journals. NCBI is currently developing the NIH Comparative Genomics Resource (CGR) to facilitate reliable comparative genomics analyses with an NCBI Toolkit and community collaboration.

Continue reading “NCBI Resources Highlighted in 2025 Nucleic Acids Research Database Issue”

Access Avian Influenza A (H5N1) Virus Sequences from the Current Outbreak at NCBI

The U.S. Centers for Disease Control and Prevention (CDC) has been monitoring the ongoing outbreak of the avian influenza A (H5N1) virus. The virus is widespread globally in wild birds, has caused sporadic outbreaks in poultry, cows, and several species of wild animals, and has been detected in exposed humans. The CDC recently sequenced the H5N1 virus in two respiratory specimens collected from a U.S. patient who was severely ill and has now died (PQ809549-PQ809564).
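
The accession range quoted above can be expanded into individual GenBank accessions before looking them up. The following is a minimal sketch, not an NCBI tool: the helper name `expand_accession_range` is hypothetical, and it assumes both ends of the range share the same letter prefix and digit width.

```python
import re


def expand_accession_range(span: str) -> list[str]:
    """Expand a range such as 'PQ809549-PQ809564' into individual accessions.

    Hypothetical helper for illustration; assumes a shared letter prefix.
    """
    start, end = span.split("-")
    m_start = re.fullmatch(r"([A-Z]+)(\d+)", start)
    m_end = re.fullmatch(r"([A-Z]+)(\d+)", end)
    if not m_start or not m_end or m_start.group(1) != m_end.group(1):
        raise ValueError(f"not a simple accession range: {span!r}")
    prefix = m_start.group(1)
    width = len(m_start.group(2))  # keep zero padding if present
    first, last = int(m_start.group(2)), int(m_end.group(2))
    if last < first:
        raise ValueError(f"range end precedes start: {span!r}")
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


accessions = expand_accession_range("PQ809549-PQ809564")
print(len(accessions), accessions[0], accessions[-1])
```

The range covers 16 accessions, which would be consistent with two specimens of an eight-segment influenza A genome, although the post itself does not spell out that breakdown.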

As previously announced, the GenBank sequences, annotations, and metadata, including those from this patient, are available through NLM’s NCBI resources.

Continue reading “Access Avian Influenza A (H5N1) Virus Sequences from the Current Outbreak at NCBI”

GenBank Release 264.0 Now Available!

GenBank release 264.0 (12/19/2024) is now available on the NCBI FTP site. This release has 38.97 trillion bases and 5.36 billion records.

The current release has: 

  • 254,365,075 traditional records containing 5,085,904,976,338 base pairs of sequence data
  • 3,957,195,833 WGS records containing 32,983,029,087,303 base pairs of sequence data
  • 957,403,887 bulk-oriented TSA records containing 820,128,973,511 base pairs of sequence data
  • 187,349,466 bulk-oriented TLS records containing 77,038,271,475 base pairs of sequence data 
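
As a quick arithmetic check, the four division counts listed above can be summed to confirm the headline totals of 38.97 trillion bases and 5.36 billion records. This is only a sketch using the figures quoted in this post; the dictionary layout and variable names are illustrative.

```python
# (records, base pairs) per division, copied from the release 264.0 list above
divisions = {
    "traditional": (254_365_075, 5_085_904_976_338),
    "WGS": (3_957_195_833, 32_983_029_087_303),
    "TSA": (957_403_887, 820_128_973_511),
    "TLS": (187_349_466, 77_038_271_475),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

# Express the totals in the same units as the announcement
print(f"{total_bases / 1e12:.2f} trillion bases")     # 38.97 trillion bases
print(f"{total_records / 1e9:.2f} billion records")   # 5.36 billion records
```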

Continue reading “GenBank Release 264.0 Now Available!”

NCBI Taxonomy: Upcoming Changes to Viruses

To reflect changes to the International Code of Virus Classification and Nomenclature (ICVCN) made by the International Committee on Taxonomy of Viruses (ICTV), NCBI will add binomial species names to about 3,000 viruses. These updates to NCBI Taxonomy are planned for spring 2025, but you can view the changes now in the ICTV’s Virus Metadata Resource.

We recognize that the former species names like Human immunodeficiency virus 1 (HIV-1) are broadly used in public health, educational institutions, and research. To minimize the impact of this change on those who use NCBI resources, we will add the new binomial species names (e.g., Lentivirus humimdef1) while keeping the former names available in the lineage for each species. The former names will move below the new binomial species name in the taxonomy hierarchy, ensuring continuity. Examples are provided below.

Continue reading “NCBI Taxonomy: Upcoming Changes to Viruses”

AGP Files Will No Longer be Accepted for Genome Submissions

Effective March 2025 

Do you submit genomes to NCBI’s GenBank? Beginning March 2025, GenBank will no longer accept AGP files for genome submissions. Historically, AGP files were submitted along with contigs as necessary information for constructing assemblies. However, thanks to technology improvements, more and more whole genome shotgun (WGS) sequences submitted to NCBI are gapped assemblies (assemblies with inserted Ns for gaps).

Continue reading “AGP Files Will No Longer be Accepted for Genome Submissions”
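
To illustrate what "inserted Ns for gaps" means in practice, the sketch below locates runs of N in a sequence string, which is where a gapped assembly carries its gap information. This is an illustration only, not how GenBank processes submissions; the function name `find_gaps` and the toy scaffold are made up for the example.

```python
import re


def find_gaps(sequence: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of each run of N.

    Hypothetical helper; runs shorter than min_len are ignored.
    """
    return [
        match.span()
        for match in re.finditer(r"[Nn]+", sequence)
        if match.end() - match.start() >= min_len
    ]


# Toy scaffold: three contigs joined by a 10 bp gap and a 3 bp gap
scaffold = "ACGT" + "N" * 10 + "GGCC" + "N" * 3 + "TTAA"
gaps = find_gaps(scaffold)
print(gaps)  # [(4, 14), (18, 21)]
```

Passing a larger `min_len` (for example, `find_gaps(scaffold, min_len=10)`) keeps only the longer run, which can help separate true scaffold gaps from short stretches of ambiguous bases.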

Try Out a Development Version of NCBI’s Publicly Available Annotation Tool, EGAPx

Latest release now available 

Are you generating genomes for vertebrates, arthropods, or plants, and looking for a way to generate high-quality genome annotation? NCBI is working on a public version of the NCBI Eukaryotic Genome Annotation Pipeline (EGAPx), and the latest developmental release is now available for testing and feedback.

Continue reading “Try Out a Development Version of NCBI’s Publicly Available Annotation Tool, EGAPx”

RefSeq Release 227 is Available!

Check out RefSeq release 227, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release?

As of November 4, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 497,549,107 records, including:
      • 377,783,847 proteins
      • 66,987,567 RNAs
  • Sequences from 159,324 organisms

Continue reading “RefSeq Release 227 is Available!”

GenBank Release 263.0 Now Available!

GenBank release 263.0 (10/19/2024) is now available on the NCBI FTP site. This release has 36.50 trillion bases and 5.13 billion records.

The current release has: 

  • 251,998,350 traditional records containing 4,250,942,573,681 base pairs of sequence data
  • 3,745,772,758 WGS records containing 31,362,454,467,668 base pairs of sequence data
  • 948,733,596 bulk-oriented TSA records containing 812,661,461,811 base pairs of sequence data
  • 187,349,395 bulk-oriented TLS records containing 77,037,504,468 base pairs of sequence data

Continue reading “GenBank Release 263.0 Now Available!”

Updated Genomes Terminology! “Representative Genome” is Replaced with “Reference Genome”

NCBI is streamlining the terminology around our reference genomes. We currently have a small set of genomes collectively called representatives and an even smaller set called references. We have slowly converged on the term reference to refer to both sets.  

A genome is labeled reference if it is deemed to be the best available genome for the species based on assembly, annotation metrics (when available), and, in a small number of cases, curatorial review. The set of eukaryotic reference assemblies is updated continuously as new assemblies are submitted to GenBank. The set of prokaryotic references is recalculated three times a year.

Important Note: Classification of “reference genome” is separate from inclusion in RefSeq – while genomes in RefSeq are preferentially used to pick the reference genome, a reference genome can also be chosen for species not included in RefSeq.

Continue reading “Updated Genomes Terminology! “Representative Genome” is Replaced with “Reference Genome””

Access Public Reports of Foreign Contamination Screen (FCS) Tool Results

Do you use genomes from NCBI and worry that they may contain contaminant sequences? Now you can view reports generated for all prokaryotic and eukaryotic genomes with NCBI’s quality assurance tool, Foreign Contamination Screen (FCS), to better understand possible issues that may affect your studies.

What reports are available?

  • Summary reports to select better assemblies at thresholds of your choosing.
  • Detailed reports to remove or mask contaminant sequences so they don’t adversely affect analyses. This is particularly useful for building k-mer databases. 
  • Individual assembly reports available through the FTP link located on NCBI Datasets genome pages.
  • Reports are available for all eukaryotic and prokaryotic GenBank and RefSeq assemblies, currently covering over 2.7 million assemblies. 
  • A README to understand how to interpret and use contamination reports. 

Continue reading “Access Public Reports of Foreign Contamination Screen (FCS) Tool Results”