This full release incorporates genomic, transcript, and protein data available as of September 12, 2022, and contains 328,588,569 records, including 239,609,016 proteins, 47,387,931 RNAs, and sequences from 123,394 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
Foreign contamination screening
Introducing the new Foreign Contamination Screen (FCS) tool! If you produce assembled genomes, check out FCS, a tool you can run yourself to screen your genome assemblies for contamination and support high-quality data submissions to GenBank. FCS is part of the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem that enables reliable comparative genomics analyses for all eukaryotic organisms. See our previous blog post to learn how FCS improves the sensitivity of contaminant detection.
New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 43 species, including:
- Asiatic elephant annotation release 100, based on new assembly mEleMax1_primary_haplotype (GCF_024166365.1) (pictured)
- Gray wolf (dingo) annotation release 102, based on updated assembly ASM325472v2 (GCF_003254725.2)
- Reed vole annotation release 100, based on updated assembly M_Fortis_MF-2015_v1.1 (GCF_014885135.2)
- Mexican tetra annotation release 103, based on new assembly AstMex3_surface (GCF_023375975.1)
- Quercus robur (English oak) annotation release 100, based on new assembly dhQueRobu3.1 (GCF_932294415.1)
- Schistocerca gregaria (desert locust) annotation release 100, based on new assembly iqSchGreg1.2 (GCF_023897955.1)
- Boll weevil annotation release 100, based on new assembly icAntGran1.3 (GCF_022605725.1)
Join our mailing list to keep up to date with RefSeq and other CGR news