RefSeq release 213 is now available online, both from the FTP site and through NCBI’s Entrez Programming Utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of July 11, 2022, and contains 321,282,996 records, including 234,520,053 proteins, 45,781,716 RNAs, and sequences from 121,461 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
Foreign contamination screening
A new foreign contamination screen (FCS) tool is coming soon, so watch NCBI Insights for more details! We have been using the new FCS-GX tool to identify contamination in RefSeq eukaryote genomes. We then work with the genome submitters either to suppress the contaminant sequences or to upgrade to a better assembly for the species. To date, we have removed over 120 Mbp of contamination from RefSeq genomes, with more cleanup planned.
New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 30 species, including:
- Brown bear annotation release 102, based on new assembly UrsArc1.0 (GCF_023065955.1)
- Bank vole annotation release 100, based on new assembly Bank_vole1_10x (GCF_902806735.1)
- Hawaiian crow annotation release 100, based on new assembly bCorHaw1.pri.cur (GCF_020740725.1)
- Loggerhead turtle annotation release 100, based on new assembly GSC_CCare_1.0 (GCF_023653815.1)
- Leguminivora glycinivorella annotation release 100, based on new assembly LegGlyc_1.1 (GCF_023078275.1)
- Castor bean annotation release 102, based on new assembly ASM1957865v1 (GCF_019578655.1)
Join our mailing list to keep up to date with RefSeq and other NCBI news.