As previously announced, we updated the ClinVar website as part of our effort to better support the display of submitted somatic variation data.
HomoloGene Now Redirects to NCBI Datasets Gene
A new way to view and download related genes
As previously announced, HomoloGene now automatically redirects to the NCBI Datasets Gene page, giving you easy access to up-to-date sequence and homology data. The NCBI Datasets Gene Table provides a link to NCBI Orthologs with expanded gene and protein information and links to tools. NCBI Orthologs includes more genes and sequences for a growing range of taxa. See an example below. Legacy HomoloGene data remains available on the FTP site.
BLAST FASTA Files Will No Longer Be Available on the FTP Site Effective April 2024
Easily generate BLAST FASTA files yourself!
In April 2024, the FASTA (sequence text) files of the sequences in the Basic Local Alignment Search Tool (BLAST) databases will no longer be available on the FTP site. However, you can easily generate FASTA files yourself from the formatted BLAST databases by using the BLAST utility blastdbcmd that comes with the standalone BLAST programs. This gives you the flexibility to generate organism-specific FASTA files using NCBI’s taxonomy IDs for specific organisms or groups.
See the examples below and the BLAST Command Line Applications User Manual for more details on the standalone BLAST programs and working with the BLAST databases.
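As a rough illustration of the blastdbcmd approach described above, the Python sketch below assembles a blastdbcmd command line that dumps a formatted BLAST database to FASTA, either in full or restricted to a set of taxonomy IDs. It assumes the standalone BLAST+ programs are installed and on your PATH, and that the database is in a version that carries taxonomy information. The helper names, database name, and output file names are hypothetical and chosen only for illustration. Depending on your BLAST+ version, taxonomy IDs above the species level may first need to be expanded to species-level IDs.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_blastdbcmd_args(
    db: str,
    out_path: str,
    taxids: Optional[Sequence[int]] = None,
) -> List[str]:
    """Build a blastdbcmd command that writes database sequences as FASTA.

    Hypothetical helper: 'db' is the name of a locally installed BLAST
    database and 'out_path' is the FASTA file to create.
    """
    # "%f" asks blastdbcmd for FASTA output.
    args = ["blastdbcmd", "-db", db, "-outfmt", "%f", "-out", out_path]
    if taxids:
        # Restrict the dump to the given NCBI taxonomy IDs.
        args += ["-taxids", ",".join(str(t) for t in taxids)]
    else:
        # No taxonomy filter: extract every entry in the database.
        args += ["-entry", "all"]
    return args


def dump_fasta(
    db: str,
    out_path: str,
    taxids: Optional[Sequence[int]] = None,
) -> None:
    """Run blastdbcmd; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_blastdbcmd_args(db, out_path, taxids), check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch works
    # even on a machine without BLAST+ or a local database.
    print(shlex.join(build_blastdbcmd_args("swissprot", "swissprot.fasta")))
    print(shlex.join(build_blastdbcmd_args("swissprot", "human.fasta", taxids=[9606])))
```

Calling dump_fasta("swissprot", "human.fasta", taxids=[9606]) would run the second command for real, provided the database has been downloaded locally. Keeping command construction separate from execution makes it easy to inspect the exact command before committing to a long-running extraction.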
Updated Bacterial and Archaeal Reference Genome Collection is Available!
Download the updated bacterial and archaeal reference genome collection! This collection (18,941 genomes as of Jan 18, 2024) was built by selecting the “best” genome assembly for each species among the 330,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference). You can speed up your sequence searches by running them against these high-quality genomes instead of the entire nucleotide or protein database.
The criteria for selecting the reference assembly for a given species include assembly contiguity and completeness, and the quality of the RefSeq annotation.
RefSeq Release 222 Now Available!
Check out RefSeq release 222, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.
What’s included in this release?
As of January 8, 2024, this full release incorporates genomic, transcript, and protein data containing:
- 411,137,832 records
- 304,562,770 proteins
- 59,343,570 RNAs
- sequences from 145,371 organisms
Now Available: NCBI Hidden Markov Models (HMM) Release 14.0!
Download release 14.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP)! Search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
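One way to run such a search is with HMMER's hmmsearch program, which takes a file of profile HMMs and a protein FASTA file. The Python sketch below builds an hmmsearch command that writes a per-target hit table (--tblout) and includes a small parser for that table. It assumes HMMER 3 is installed. The helper names, file names, and the E-value threshold are illustrative assumptions, not values recommended by NCBI.

```python
import shlex
from typing import Iterable, List, Tuple


def build_hmmsearch_args(
    hmm_library: str,
    protein_fasta: str,
    table_out: str,
    evalue: float = 1e-5,  # arbitrary illustrative threshold
) -> List[str]:
    """Build an hmmsearch command: hmmsearch [options] <hmmfile> <seqdb>."""
    return [
        "hmmsearch",
        "--tblout", table_out,  # tabular per-target summary of hits
        "-E", str(evalue),      # report targets at or below this E-value
        hmm_library,
        protein_fasta,
    ]


def parse_tblout(lines: Iterable[str]) -> List[Tuple[str, str, float, float]]:
    """Extract (protein, model name, full-sequence E-value, score) per hit.

    The --tblout format is whitespace-delimited with 18 fixed columns
    followed by a free-text description; comment lines start with '#'.
    """
    hits = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue  # skip malformed rows rather than guessing
        hits.append((fields[0], fields[2], float(fields[4]), float(fields[5])))
    return hits


if __name__ == "__main__":
    # Hypothetical file names; substitute the downloaded HMM library
    # and your own protein FASTA file.
    print(shlex.join(build_hmmsearch_args("ncbi_hmms.lib", "my_proteins.faa", "hits.tbl")))
```

After running the printed command, passing the lines of hits.tbl to parse_tblout gives a list of hits that can be sorted by E-value or score to pick the best-matching model for each protein.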
Best of 2023: A Look at the NCBI Insights Blog
As we begin a new year, let’s look back at the best NCBI Insights Blog posts of 2023.
In case you missed any of these, check them out!
GenBank Release 259.0 is Available!
GenBank release 259.0 (12/22/2023) is now available on the NCBI FTP site. This release has 27.94 trillion bases and 3.96 billion records.
The current release has:
- 247,777,761 traditional records containing 2,433,391,164,875 base pairs of sequence data
- 2,775,205,599 WGS records containing 23,600,199,887,231 base pairs of sequence data
- 701,336,089 bulk-oriented TSA records containing 659,924,904,311 base pairs of sequence data
- 130,654,568 bulk-oriented TLS records containing 50,868,407,906 base pairs of sequence data
Using NCBI Data and Tools for Your Research Project
Are you a biology student working on a research project? NCBI offers free access to a wide variety of resources and tools to help you find and download data for your project.
How and why do you use our resources? Check out the example below:
Your professor has assigned you a research project looking at the sequence and structure of the TP53 gene in the domestic cat (Felis catus). In addition, you were asked to find information on this gene and its genomic region in other members of the cat family (Felidae).
Update to GenBank Qualifier
‘Country’ will transition to ‘Geographic Location’ effective June 2024
As announced earlier this year, we will begin to systematically gather ‘location of collection’ and ‘date and time of collection’ for sequence data submitted to GenBank and the Sequence Read Archive (SRA).
As part of this effort and to make location data more accurate and informative, we are also changing the way this information is represented on GenBank records, consistent with the relevant field in BioSample.








