RefSeq release 85 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of November 6, 2017, and contains 146,710,309 records, including 100,043,962 proteins, 20,905,608 RNAs, and sequences from 73,996 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings. See the RefSeq release notes for more information.
As previously announced, GI numbers were removed from report files on the FTP site in this release.
Phasing out support for non-human organisms
NCBI is phasing out support for non-human organisms in the dbSNP and dbVar databases. As of September 1, 2017, these databases have stopped accepting submissions for non-human organisms. The interactive websites for these databases and related NCBI services, including RefSeq flatfiles, will stop presenting non-human variant data in November 2017.
We have elected to provide non-human SNPs in this RefSeq FTP release, but they will be dropped from all daily RefSeq FTP files starting in December 2017 and from the next full release in January 2018.
Starting in March 2018, SNP variation data will no longer be in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records. Because the number of variants is already enormous and still growing, removing SNP features from these large genomic records will significantly reduce the size of RefSeq FTP files and make downloading and processing easier. We will continue to include SNPs on NG_-prefixed genomic records, and transcript (NM_, NR_, XM_, XR_) and protein (NP_, XP_, YP_) sequences.
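For anyone adjusting a download or parsing pipeline, the prefix rule above can be sketched as a small lookup. This is only an illustration: the function name `snp_features_expected` and the two tuple names are hypothetical, and the prefix lists are copied from this announcement rather than from any NCBI API. Prefixes not named here are reported as unknown instead of being guessed.

```python
from typing import Optional

# Prefix groups exactly as listed in the announcement (names are illustrative).
ASSEMBLY_PREFIXES = ("NC_", "NT_", "NW_", "AC_")  # SNP features removed from March 2018
RETAINED_PREFIXES = (
    "NG_",                       # NG_-prefixed genomic records
    "NM_", "NR_", "XM_", "XR_",  # transcripts
    "NP_", "XP_", "YP_",         # proteins
)


def snp_features_expected(accession: str) -> Optional[bool]:
    """Return False for assembly records that lose SNP features,
    True for record types that keep them, and None for any prefix
    the announcement does not mention."""
    acc = accession.strip().upper()
    if acc.startswith(ASSEMBLY_PREFIXES):
        return False
    if acc.startswith(RETAINED_PREFIXES):
        return True
    return None
```

A pipeline that reads SNP features from chromosome or contig records could use a check like this to decide which inputs need an alternative source of variation data after the change.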
In addition, the ASN.1 format will be changed to:
- remove the bitfield
- remove the ‘extra’ flags