This full release incorporates genomic, transcript, and protein data available as of July 12, 2021, and contains 285,425,070 records, including 209,035,492 proteins, 39,039,901 RNAs, and sequences from 112,462 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 22 species, including:
- Sheep annotation release 104, based on new assembly ARS-UI_Ramb_v2.0 (GCF_016772045.1)
- Black-legged tick annotation release 103, based on new assembly ASM1692078v2 (GCF_016920785.2)
- Arctic fox annotation release 100, based on assembly ASM1834538v1 (GCF_018345385.1)
- Mariana crow annotation release 100, based on assembly C.kubaryi_AGA036_p1.0 (GCF_017639235.1)
- Elephant shark annotation release 101, based on new assembly IMCB_Cmil_1.0 (GCF_018977255.1)
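Each assembly listed above is identified by a RefSeq assembly accession of the form GCF_ followed by nine digits and a version number. As a rough illustration of how such accessions can be split into their parts, here is a minimal Python sketch. The names `AssemblyAccession`, `parse_assembly_accession`, and `RELEASE_ASSEMBLIES` are hypothetical helpers invented for this example and are not part of any NCBI tool; the pattern also accepts the GenBank prefix GCA_ on the assumption that both prefixes share the same layout.

```python
import re
from typing import NamedTuple


class AssemblyAccession(NamedTuple):
    """Parts of an assembly accession (hypothetical helper type)."""
    prefix: str   # "GCF" for RefSeq, "GCA" for GenBank
    number: str   # nine-digit identifier, kept as text to preserve leading zeros
    version: int  # increments when the assembly is updated


# Assumed layout: GCF_ or GCA_, nine digits, a dot, then the version.
_ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def parse_assembly_accession(text: str) -> AssemblyAccession:
    """Split an accession such as 'GCF_016772045.1' into its parts."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized assembly accession: {text!r}")
    prefix, number, version = match.groups()
    return AssemblyAccession(prefix, number, int(version))


# Accessions copied from the list above.
RELEASE_ASSEMBLIES = {
    "Sheep": "GCF_016772045.1",
    "Black-legged tick": "GCF_016920785.2",
    "Arctic fox": "GCF_018345385.1",
    "Mariana crow": "GCF_017639235.1",
    "Elephant shark": "GCF_018977255.1",
}

for organism, accession in RELEASE_ASSEMBLIES.items():
    parsed = parse_assembly_accession(accession)
    print(f"{organism}: {parsed.prefix} {parsed.number} version {parsed.version}")
```

Keeping the numeric part as a string avoids losing the leading zero, and separating the version makes it easy to notice entries such as the black-legged tick assembly, which is at version 2.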
Re-annotation of RefSeq genome assemblies for E. coli and four other species
We have re-annotated all RefSeq genomes for Escherichia coli, Mycobacterium tuberculosis, Bacillus subtilis, Acinetobacter pittii, and Campylobacter jejuni using the most recent release of PGAP, the Prokaryotic Genome Annotation Pipeline. You will find that more genes now have gene symbols (e.g., recA).
Introducing the new NCBI Datasets Genomes page
The updated NCBI Datasets Genomes page now has genome data for all domains of life, including bacterial and viral genomes.