Need a refresher on what NCBI offers, or feel you aren’t taking full advantage of NCBI resources? Check out the most recent NCBI Minute webinar recordings on the NCBI YouTube channel.
RefSeq release 86 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of January 8, 2018, and contains 149,493,466 records, including 102,133,844 proteins, 21,370,778 RNAs, and sequences from 75,218 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
Two important notes follow; please see the RefSeq release notes for more information.
Non-human SNP data dropped
Non-human SNP data were dropped from all daily RefSeq FTP files starting in December 2017, and from this full release (January 2018).
HPRD features removed
We have dropped a set of features, originally imported from HPRD, from human transcript and protein RefSeq records.
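For readers who want to try the programmatic route mentioned in the RefSeq announcement, here is a minimal sketch of fetching a single RefSeq record through the E-utilities `efetch` endpoint. The accession `NM_000546`, the `nuccore` database, the FASTA return type, and the helper names `build_efetch_url` and `fetch_record` are illustrative choices, not something the announcement specifies.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL for one accession (hypothetical helper)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession, db="nuccore", rettype="fasta"):
    """Download one record as text. Requires network access."""
    url = build_efetch_url(accession, db, rettype)
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


# Example usage (assumed accession; uncomment to run against the live service):
# print(fetch_record("NM_000546")[:200])
```

For bulk downloads, the FTP directories described above are likely the better fit; a per-record call like this suits spot checks on a handful of accessions.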
BLAST is a powerful search tool, but often a search is just the beginning of the journey. We put ourselves in the shoes of a researcher who has just sequenced a handful of samples from the latest viral outbreak and tried to understand what information would be most useful. We also reached out to researchers in the field and asked: a) what questions do they really want to answer? and b) how can NCBI best provide the answers? Based on insights from those questions and answers, we developed the new Virus Sequence Search Interface (Fig. 1). The Search Interface is an NCBI Labs project, which means it is an experimental project, and we may modify the resource based on your feedback and experiences.
GenBank release 223.0 (12/15/2017) has 206,293,625 traditional records (including non-bulk-oriented TSA) containing 249,722,163,594 base pairs of sequence data. In addition, there are 551,063,065 WGS records containing 2,466,098,053,327 base pairs of sequence data, 201,559,502 TSA records containing 181,394,660,188 base pairs of sequence data, and 12,695,198 TLS records containing 4,458,042,616 base pairs of sequence data.
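To put the four GenBank figures side by side, the short sketch below adds them up and computes the share of base pairs held in WGS records. It assumes the four categories do not overlap, which the wording suggests but does not state outright; the dictionary name `genbank_223` is just a label for this example.

```python
# (records, base pairs) per category, copied from the release 223.0 figures.
genbank_223 = {
    "traditional": (206_293_625, 249_722_163_594),
    "WGS": (551_063_065, 2_466_098_053_327),
    "TSA": (201_559_502, 181_394_660_188),
    "TLS": (12_695_198, 4_458_042_616),
}

# Totals across all categories, assuming no record is counted twice.
total_records = sum(records for records, _ in genbank_223.values())
total_bases = sum(bases for _, bases in genbank_223.values())

# Fraction of all base pairs that sit in WGS records.
wgs_share = genbank_223["WGS"][1] / total_bases

print(f"{total_records:,} records, {total_bases:,} base pairs")
print(f"WGS share of base pairs: {wgs_share:.1%}")
```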
Organisms with eukaryotic genome annotations in RefSeq:
- Amphiprion ocellaris (clown anemonefish)
- Centruroides sculpturatus (bark scorpion)
- Ceratitis capitata (Mediterranean fruit fly)
- Cucurbita maxima (winter squash)
- Cucurbita moschata (crookneck pumpkin)
- Drosophila hydei (fly)
- Drosophila willistoni (fly)
- Felis catus (domestic cat)
- Leptinotarsa decemlineata (Colorado potato beetle)
- Maylandia zebra (zebra mbuna)
- Olea europaea sylvestris (wild olive)
- Onthophagus taurus (beetle)
- Piliocolobus tephrosceles (Ugandan red Colobus)
- Seriola lalandi dorsalis (yellowtail amberjack)
- Spodoptera litura (moth)
- Xiphophorus maculatus (southern platyfish)
- Zea mays (maize)
See more details on the Eukaryotic RefSeq Genome Annotation Status page.