NCBI’s Reference Sequence (RefSeq) FTP release numbers will jump to 200 with the next release, skipping 100-199. The current release, from March 2020, is release 99; the next bimonthly release, in May 2020, will be release 200. This change avoids overlap with the release numbers of the completely independent RefSeq annotation releases for the eukaryotic genomes we annotate, which are currently in the range 100-109 (for example, Mus musculus Annotation Release 108).
Tag: downloading data
We have added the latest NCBI Eukaryotic Genome Annotation Pipeline results for the more than 580 species that we annotate to the genomes/refseq directory on the genomes FTP area. As we announced in December, we will stop publishing annotation results to the genus_species directories (example: genomes/Xenopus_tropicalis) on the genomes FTP site effective February 1, 2020. We will also move existing genus_species directories to genomes/archive/old_refseq during the month of February.
Figure 1. The Assembly page for the Xenopus tropicalis UCB Xtro 10.0 (GCF_000004195.4) showing the blue download button. Annotation results such as the RefSeq transcript alignments that can be downloaded from the web page are now also under the genomes/refseq directory on the FTP site. The FTP path to the .bam alignment files is in red.
These FTP changes do not affect the Assembly download function. As always, you can download assembly data using the blue Download button on the web pages (Figure 1).
You can now download new file types for species recently annotated by the NCBI Eukaryotic Genome Annotation Pipeline from the Assembly web pages and from the genomes/refseq FTP area. The new file types include alignments of annotated transcripts to the assembly in BAM format, all models predicted by Gnomon, and, for species that have been annotated multiple times, files characterizing the feature-by-feature differences between the current and the previous annotation.
If you download data from the SRA (Sequence Read Archive) FTP site, we encourage you to try the SRA Toolkit instead. This applies especially if you use the SRA Fuse/FTP site at ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant, which the SRA team will decommission on December 1, 2019.
The SRA Toolkit offers several advantages for downloading SRA data, including greater flexibility in specifying the data you need as well as access to public SRA data in the cloud. If you’re new to the Toolkit, you may want to start with these instructions.
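As a rough illustration of what a Toolkit-based download can look like, here is a minimal Python sketch that wraps two Toolkit programs, `prefetch` and `fastq-dump`. The helper names (`build_sra_commands`, `download_run`), the default dry-run behavior, and the accession `SRR000001` are illustrative assumptions, not part of the Toolkit; please check the Toolkit documentation for the options that suit your data.

```python
import shutil
import subprocess


def build_sra_commands(accession, out_dir="."):
    """Return the two command lines for one run accession (hypothetical helper)."""
    return [
        # Fetch the run into the Toolkit's configured download location.
        ["prefetch", accession],
        # Convert the run to FASTQ, writing one file per read.
        ["fastq-dump", "--split-files", "--outdir", out_dir, accession],
    ]


def download_run(accession, out_dir=".", dry_run=True):
    """Build the commands and, unless dry_run is set, execute them in order."""
    commands = build_sra_commands(accession, out_dir)
    if dry_run:
        return commands
    if shutil.which("prefetch") is None or shutil.which("fastq-dump") is None:
        raise FileNotFoundError("SRA Toolkit programs were not found on PATH")
    for command in commands:
        subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of downloading anything.
    for command in download_run("SRR000001", out_dir="fastq"):
        print(" ".join(command))
```

Calling `download_run("SRR000001", dry_run=False)` would actually run both programs, so it needs the Toolkit installed and a network connection.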
If you have any questions or concerns about downloading SRA data, please contact firstname.lastname@example.org. We’d love to hear from you!
On Wednesday, November 15, 2017, at 12:00 PM EST, NCBI will present a webinar on advanced applications of the NCBI APIs we previously introduced in our general API webinar in September. This webinar is intended for bioinformaticians, computational biologists and others who are already comfortable with writing scripts to access, download and analyze data.
This blog post is directed toward PubMed users.
Did you know you can download the entire PubMed database, and keep this dataset current with our daily update files? These data are available for free from our FTP site and no longer require a license agreement, whether you’re interested in text mining, or want to create your own database for searching and analytics.
Each year in December, NLM releases a comprehensive (baseline) set of citation records in XML format for download. Every day, incremental update files are made available and include new, revised and deleted citations. Please see the README.txt file for more information and contact email@example.com with questions.
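If you plan to keep a local copy current, the idea is to load the baseline once and then apply each update file in order. The sketch below shows one possible way to apply a single update in Python. It assumes the usual PubMed XML layout (`PubmedArticle`, `MedlineCitation/PMID`, `ArticleTitle`, `DeleteCitation`), keeps only titles in a plain dictionary, and uses a tiny made-up XML sample; the function name `apply_update` is hypothetical, and the README.txt remains the authoritative description of the files.

```python
import xml.etree.ElementTree as ET


def apply_update(citations, xml_text):
    """Apply one update document to a {PMID: title} dictionary (illustrative only)."""
    root = ET.fromstring(xml_text)

    # New and revised citations: insert or overwrite by PMID.
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title_element = article.find("MedlineCitation/Article/ArticleTitle")
        # Titles may contain inline markup, so join all nested text.
        title = "".join(title_element.itertext()) if title_element is not None else ""
        citations[pmid] = title

    # Deleted citations: remove by PMID if present.
    for deleted in root.iter("DeleteCitation"):
        for pmid_element in deleted.findall("PMID"):
            citations.pop(pmid_element.text, None)

    return citations


# Made-up sample data, not real PubMed records.
SAMPLE_UPDATE = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">2</PMID>
      <Article><ArticleTitle>Revised <i>title</i></ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">3</PMID>
      <Article><ArticleTitle>New citation</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <DeleteCitation>
    <PMID Version="1">1</PMID>
  </DeleteCitation>
</PubmedArticleSet>
"""

if __name__ == "__main__":
    local_copy = {"1": "Old citation", "2": "Original title"}
    print(apply_update(local_copy, SAMPLE_UPDATE))
```

The real files are large, so a streaming parser such as `xml.etree.ElementTree.iterparse` is likely a better fit than loading a whole file into memory.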
This blog post is directed toward Assembly users.
A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.
For example, you can run a search in Assembly and use check boxes (see left side of screenshot below) to refine the set of genome assemblies of interest. Then, just open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file will be saved to your computer; you can expand it into a folder containing your selected genome data files.
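If you would rather expand the downloaded archive from a script than by hand, something like the following Python sketch may help. It assumes the archive is a tar file and uses a hypothetical file name, `genome_assemblies.tar`; the function `expand_archive` is our own illustration, not an NCBI tool.

```python
import tarfile
from pathlib import Path


def expand_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        # Refuse members that would be written outside the destination folder.
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    # List the extracted files relative to the destination folder.
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())


if __name__ == "__main__":
    archive_name = "genome_assemblies.tar"  # hypothetical name of the downloaded archive
    if Path(archive_name).exists():
        for path in expand_archive(archive_name, "genome_assemblies"):
            print(path)
```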