Expanded accession formats appear in RefSeq release 93


RefSeq release 93 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of March 13, 2019. It contains 192,722,653 records, including 135,670,032 proteins, 25,840,272 RNAs, and sequences from 88,816 organisms.

The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Special announcements:

  1. SNP data removed from genome assembly records
    As previously announced, SNP variation features are no longer present in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records.
  2. New accession formats
    As previously announced, the new accessions are now beginning to appear in the dataset. This release contains 107 prokaryote records from 4 assemblies with accessions based on the new 6-letter WGS format (e.g., NZ_CAAAHS010000001). No eukaryotic or viral sequences in RefSeq release 93 use such accessions, but they will appear in future release files. Further information about the revised accession format and its effects on the LOCUS line is available in a previous blog post and in the GenBank release notes.
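
For scripts that parse RefSeq WGS accessions, the following is a minimal sketch of a check that accepts both the older 4-letter and the new 6-letter project codes. The function name `parse_wgs_accession`, the returned field names, and the digit-count range are assumptions made for illustration, not an official NCBI specification; consult the GenBank release notes for the authoritative format.

```python
import re

# Assumed layout (illustrative, not an official specification):
#   optional RefSeq "NZ_" prefix, a 4- or 6-letter WGS project code,
#   a 2-digit assembly version, then a zero-padded contig number.
WGS_PATTERN = re.compile(r"^(?:(NZ)_)?([A-Z]{4}|[A-Z]{6})(\d{2})(\d{6,9})$")


def parse_wgs_accession(accession):
    """Split a WGS-style accession into its parts, or return None."""
    # Drop a trailing sequence version such as ".1" if one is present.
    base = accession.split(".", 1)[0]
    match = WGS_PATTERN.match(base)
    if match is None:
        return None
    refseq_prefix, project, version, contig = match.groups()
    return {
        "refseq_prefix": refseq_prefix,    # "NZ" or None
        "project": project,                # e.g. "CAAAHS"
        "project_letters": len(project),   # 4 (older) or 6 (new format)
        "assembly_version": int(version),
        "contig_number": int(contig),
    }


if __name__ == "__main__":
    print(parse_wgs_accession("NZ_CAAAHS010000001"))
```

Code that hard-codes a 4-letter project prefix or a fixed accession length would reject the example accession above, so a length-tolerant pattern like this one is a reasonable place to start when updating parsers.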
