Important changes to the genomes FTP site in February

We have added the latest NCBI Eukaryotic Genome Annotation Pipeline results for the more than 580 species that we annotate to the genomes/refseq directory on the genomes FTP area. As we announced in December, we will stop publishing annotation results to the genus_species directories (example: genomes/Xenopus_tropicalis) on the genomes FTP site effective February 1, 2020. We will also move existing genus_species directories to genomes/archive/old_refseq during the month of February.

Figure 1. The Assembly page for the Xenopus tropicalis UCB Xtro 10.0 (GCF_000004195.4) showing the blue download button. Annotation results such as the RefSeq transcript alignments that can be downloaded from the web page are now also under the genomes/refseq directory on the FTP site. The FTP path to the .bam alignment files is in red.

These FTP changes do not affect the Assembly download function. As always, you can download assembly data using the blue Download button on the web pages (Figure 1).
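If you have scripts that still point at a genus_species directory, the move described above amounts to a path rewrite. The following Python sketch is illustrative only: the function name is hypothetical, the list of non-species directories is not exhaustive, and it assumes that each directory keeps its internal layout after the move, which the announcement does not state.

```python
from pathlib import PurePosixPath

# Archive location named in the announcement.
ARCHIVE_ROOT = PurePosixPath("genomes/archive/old_refseq")

# Top-level directories that are not genus_species directories.
# Illustrative and not exhaustive.
NON_SPECIES_DIRS = {"refseq", "archive", "genbank", "all"}


def archived_location(old_path: str) -> str:
    """Map genomes/<Genus_species>/... to its assumed archive path."""
    parts = PurePosixPath(old_path.strip("/")).parts
    if len(parts) < 2 or parts[0] != "genomes":
        raise ValueError(f"not a path under genomes/: {old_path!r}")
    if parts[1] in NON_SPECIES_DIRS:
        # Not a genus_species directory, so the path is left unchanged.
        return "/".join(parts)
    # Assumption: the subdirectory layout is preserved after the move.
    return str(ARCHIVE_ROOT.joinpath(*parts[1:]))


print(archived_location("genomes/Xenopus_tropicalis"))
```

Paths that already sit under genomes/refseq are returned unchanged, so the function can be applied to a mixed list of old and new locations.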

New download files and FTP directories for genome assemblies

You can now download new file types for species recently annotated by the NCBI Eukaryotic Genome Annotation Pipeline from the Assembly web pages and from the genomes/refseq FTP area. The new file types include alignments of annotated transcripts to the assembly in BAM format, all models predicted by Gnomon, and (for species that have been annotated multiple times) files characterizing the feature-by-feature differences between the current and the previous annotation.

Users of the SRA FTP site: Try the SRA Toolkit!

If you download data from the SRA (Sequence Read Archive) FTP site, we encourage you to try the SRA Toolkit. This is particularly true if you use the SRA Fuse/FTP site at ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant, which the SRA team will decommission on December 1, 2019.

The SRA Toolkit offers several advantages for downloading SRA data, including greater flexibility in specifying the data you need as well as access to public SRA data in the cloud. If you’re new to the Toolkit, you may want to start with these instructions.
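As a rough illustration of what a Toolkit download can look like, the Python sketch below builds the command lines for two Toolkit programs, prefetch and fasterq-dump, and prints them without running anything. The helper name and the output directory are hypothetical, and the accession is only a placeholder; check the Toolkit documentation for the options that apply to your installed version.

```python
import shlex
from typing import List


def sra_toolkit_commands(accession: str, outdir: str = "fastq") -> List[List[str]]:
    """Return the two commands for fetching one run and converting it to FASTQ."""
    return [
        # Download the run by accession.
        ["prefetch", accession],
        # Convert the downloaded run to FASTQ files in the chosen directory.
        ["fasterq-dump", accession, "--outdir", outdir],
    ]


# Placeholder accession; substitute the run you actually need.
for command in sra_toolkit_commands("SRR000001"):
    print(shlex.join(command))
```

To execute the commands rather than print them, each list could be passed to subprocess.run with check=True, provided the Toolkit is installed and on your PATH.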

If you have any questions or concerns about downloading SRA data, please contact sra@ncbi.nlm.nih.gov. We’d love to hear from you!

PubMed is now available for download without a license and can be updated every day!

This blog post is directed toward PubMed users.

Did you know you can download the entire PubMed database and keep it current with our daily update files? The data are free on our FTP site and no longer require a license agreement, whether you want to do text mining or build your own database for searching and analytics.

Each year in December, NLM releases a comprehensive (baseline) set of citation records in XML format for download. Every day, NLM posts incremental update files that include new, revised, and deleted citations. Please see the README.txt file for more information, and contact info@ncbi.nlm.nih.gov with questions.
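To show how a baseline plus daily updates can be kept in sync, here is a minimal Python sketch that applies one update to an in-memory store keyed by PMID. It assumes the update XML uses PubmedArticle records and a DeleteCitation block, as in the PubMed XML format to the best of our understanding. The sample records are invented, the function name is hypothetical, and real files are much larger and distributed compressed.

```python
import xml.etree.ElementTree as ET


def apply_update(citations: dict, update_xml: str) -> dict:
    """Apply new, revised, and deleted citations from one update document."""
    root = ET.fromstring(update_xml)
    # New and revised citations both overwrite whatever is stored for the PMID.
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title = article.findtext("MedlineCitation/Article/ArticleTitle")
        citations[pmid] = title
    # Deleted citations are listed by PMID and removed from the store.
    for deleted in root.iter("DeleteCitation"):
        for pmid in deleted.findall("PMID"):
            citations.pop(pmid.text, None)
    return citations


# Invented sample data standing in for a baseline and one daily update.
BASELINE = {"1": "Old title", "2": "To be removed"}
UPDATE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID Version="1">1</PMID>
    <Article><ArticleTitle>Revised title</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID Version="1">3</PMID>
    <Article><ArticleTitle>New citation</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
  <DeleteCitation><PMID Version="1">2</PMID></DeleteCitation>
</PubmedArticleSet>"""

result = apply_update(dict(BASELINE), UPDATE)
print(result)
```

Update files should be applied in the order they were released, so that a later revision or deletion always wins over an earlier record.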

Genome data download made easy!

This blog post is directed toward Assembly users.

A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.

For example, you can run a search in Assembly and use the check boxes (see left side of screenshot below) to refine the set of genome assemblies of interest. Then open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file is saved to your computer, and you can expand it into a folder containing your selected genome data files.