This full release incorporates genomic, transcript, and protein data available as of March 13, 2019. It contains 192,722,653 records, including 135,670,032 proteins, 25,840,272 RNAs, and sequences from 88,816 organisms.
The release is available in several directories, both as a complete dataset and divided into logical groupings.
- SNP data removed from genome assembly records
As previously announced, SNP variation features are no longer present in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records.
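For pipelines that previously read SNP variation features from these records, it may help to flag which accessions fall into the affected group. The following is a minimal sketch, not an NCBI tool: the function name `is_genome_assembly_accession` is hypothetical, and the check relies only on the four prefixes listed above.

```python
# Prefixes named in this announcement as genome assembly records
# (chromosome and contig records).
GENOME_ASSEMBLY_PREFIXES = ("NC_", "NT_", "NW_", "AC_")


def is_genome_assembly_accession(accession: str) -> bool:
    """Return True if the accession starts with one of the listed prefixes.

    Hypothetical helper for illustration; it inspects the prefix only and
    does not validate the rest of the accession.
    """
    return accession.strip().upper().startswith(GENOME_ASSEMBLY_PREFIXES)


if __name__ == "__main__":
    # Illustrative accessions: a chromosome record and an mRNA record.
    for acc in ("NC_000001.11", "NM_000546.6"):
        print(acc, is_genome_assembly_accession(acc))
```

Records flagged this way are the ones that no longer carry SNP variation features in either the ASN.1 or flatfile form.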
- New accession formats
As previously announced, accessions in the new format are beginning to appear in the dataset. This release contains 107 prokaryote records from 4 assemblies with accessions based on the new 6-letter WGS format (e.g., NZ_CAAAHS010000001). No eukaryote or viral sequences in RefSeq release 93 carry such accessions, but they will appear in future release files. Further information about the revised accession format and its effect on the LOCUS line is available in a previous blog post and in the GenBank release notes.
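Scripts that parse WGS-based accessions with a fixed 4-letter prefix may need to accept the longer form. The sketch below is one way to tell the two apart. It assumes a layout inferred from the example above (optional `NZ_`, a 4- or 6-letter prefix, a 2-digit assembly version, then a run of at least six digits, with an optional `.version` suffix); the GenBank release notes remain the authority on exact digit counts. The name `wgs_prefix_length` and the 4-letter sample accession are hypothetical.

```python
import re
from typing import Optional

# Assumed layout, inferred from the example NZ_CAAAHS010000001:
#   optional "NZ_" + 4 or 6 letters + 2-digit version + 6 or more digits,
#   optionally followed by ".N" (sequence version).
WGS_ACCESSION = re.compile(
    r"^(?:NZ_)?(?P<prefix>[A-Z]{4}|[A-Z]{6})(?P<version>\d{2})(?P<serial>\d{6,})(?:\.\d+)?$"
)


def wgs_prefix_length(accession: str) -> Optional[int]:
    """Return 4 or 6 for a WGS-style accession, or None if it does not match."""
    match = WGS_ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return len(match.group("prefix"))


if __name__ == "__main__":
    # The first accession is the example from this announcement; the second
    # is a made-up 4-letter accession; the third is not WGS-based.
    for acc in ("NZ_CAAAHS010000001", "NZ_ABCD01000001", "NC_000001.11"):
        print(acc, wgs_prefix_length(acc))
```

Matching on the prefix length rather than on total string length keeps the check independent of how many digits follow the assembly version.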