Expanded accession formats appear in RefSeq release 93

RefSeq release 93 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of March 13, 2019. It contains 192,722,653 records, including 135,670,032 proteins, 25,840,272 RNAs, and sequences from 88,816 organisms.

The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Special announcements:

  1. SNP data removed from genome assembly records
    As previously announced, SNP variation features are no longer present in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records.
  2. New accession formats
    As previously announced, accessions in the new format are now beginning to appear in the dataset. This release contains 107 prokaryote records from 4 assemblies with accessions based on the new 6-letter WGS format (e.g., NZ_CAAAHS010000001). No eukaryote or viral sequences in RefSeq release 93 have such accessions yet, but they will appear in future release files. Further information about the revised accession format and its effects on the LOCUS line is available in a previous blog post and in the GenBank release notes.
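
For anyone whose scripts parse accessions, here is a minimal Python sketch of how the example accession above might be split into its parts. The layout it assumes (an NZ_ prefix, a 4- or 6-letter WGS project code, a 2-digit assembly version, then a run of digits for the contig serial, with an optional ".version" suffix) is our reading of the expanded format and is not spelled out in this post, so check it against the GenBank release notes before relying on it. The function name `parse_wgs_refseq` is hypothetical.

```python
import re

# Assumed layout (not stated in this post):
#   "NZ_" + project code (4 or 6 letters) + 2-digit assembly version
#   + contig serial digits + optional ".N" sequence version
WGS_REFSEQ = re.compile(r"^NZ_([A-Z]{4}|[A-Z]{6})(\d{2})(\d+)(?:\.(\d+))?$")


def parse_wgs_refseq(accession):
    """Split a WGS-based RefSeq accession into parts, or return None."""
    match = WGS_REFSEQ.match(accession)
    if match is None:
        return None
    project, assembly_version, serial, seq_version = match.groups()
    return {
        "project": project,
        # Distinguish the older 4-letter codes from the new 6-letter ones.
        "project_format": "6-letter" if len(project) == 6 else "4-letter",
        "assembly_version": int(assembly_version),
        # Kept as a string so leading zeros are preserved.
        "serial": serial,
        "sequence_version": int(seq_version) if seq_version else None,
    }


if __name__ == "__main__":
    print(parse_wgs_refseq("NZ_CAAAHS010000001"))
```

Code that hard-codes a 4-letter project code or a fixed accession length is the most likely place for the new accessions to cause trouble, so a pattern that accepts both widths is a reasonable starting point.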
