New download files and FTP directories for genome assemblies


You can now download new file types for species recently annotated by the NCBI Eukaryotic Genome Annotation Pipeline from the Assembly web pages and from the genomes/refseq FTP area. The new file types include alignments of annotated transcripts to the assembly in BAM format, all models predicted by Gnomon, and, for species that have been annotated multiple times, files characterizing the feature-by-feature differences between the current and the previous annotation.

Changes to the Assembly downloads

The new options are in the Assembly download menu that appears on search results and record displays (Figure 1).

Figure 1. The Assembly page for the representative genome for the western lowland gorilla (GCF_008122165.1). Clicking the blue “Download Assembly” button allows you to select file(s) to download. The new types of files are boxed in red. For example, select “RefSeq transcript alignments” to download these in BAM format.

Changes to the genomes FTP directories

You can access the newly created annotation release (AR) directories on the FTP site under genomes/refseq. These directories have the following structure:

refseq/organism_group/genus_species/annotation_releases

For example, you will find the most recent annotation release (AR 103) of the northern pike (Esox lucius) in

https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_other/Esox_lucius/annotation_releases/current

For each organism, the annotation release identifiers are numbered sequentially starting at 100 and are independent of the assembly used.
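If you script your downloads, the directory pattern above can be assembled programmatically. The following is a minimal sketch, not an official NCBI tool: the function name `annotation_release_url` is hypothetical, and the only release subdirectory confirmed by this post is `current`, so passing any other value for `release` is an assumption.

```python
# Base of the RefSeq genomes FTP area, taken from the example URL in this post.
BASE_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq"


def annotation_release_url(organism_group, species, release="current"):
    """Build an annotation release URL (hypothetical helper).

    organism_group: directory name such as "vertebrate_other"
    species: binomial name such as "Esox lucius"
    release: "current" is the only value shown in this post; any other
             subdirectory name is an assumption.
    """
    # The FTP layout writes the species name as genus_species.
    genus_species = "_".join(species.split())
    return f"{BASE_URL}/{organism_group}/{genus_species}/annotation_releases/{release}"


print(annotation_release_url("vertebrate_other", "Esox lucius"))
```

Called with the northern pike values, this prints the same URL given in the example above.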

We are adding the files for the latest annotation releases of the 500+ species annotated by the NCBI Eukaryotic Genome Annotation Pipeline to genomes/refseq. On February 1, 2020, we will stop publishing annotation results to the older genus_species directories immediately under genomes. After that date, you can get files for newly released annotations from the genomes/refseq/ directories as described above and from the genomes/all/annotation_releases/ directories, where we have organized the data by taxonomy ID. Beginning February 1, 2020, we will also move the genomes/genus_species directories to genomes/archive/old_refseq/.
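If you have scripts or bookmarks that point at the older genus_species directories, you may need to rewrite those paths after the move. The sketch below shows one way to do that, under a loudly stated assumption: that each genus_species directory keeps its name and internal layout under genomes/archive/old_refseq/. The function name `archived_url` and the `NON_SPECIES` set are hypothetical and cover only the top-level directories named in this post.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_PREFIX = "/genomes/"
ARCHIVE_PREFIX = "/genomes/archive/old_refseq/"
# Top-level names under genomes/ mentioned in this post that are NOT
# genus_species directories. The real FTP site may contain others.
NON_SPECIES = {"refseq", "all", "archive"}


def archived_url(legacy_url):
    """Map a legacy genomes/genus_species URL to its archived location.

    Assumes the directory name and contents are preserved after the move;
    URLs that do not look like legacy species paths are returned unchanged.
    """
    parts = urlsplit(legacy_url)
    if not parts.path.startswith(LEGACY_PREFIX):
        return legacy_url
    rest = parts.path[len(LEGACY_PREFIX):]
    first = rest.split("/", 1)[0]
    if not first or first in NON_SPECIES:
        return legacy_url
    return urlunsplit(parts._replace(path=ARCHIVE_PREFIX + rest))


print(archived_url("https://ftp.ncbi.nlm.nih.gov/genomes/Esox_lucius/"))
```

URLs that already point into genomes/refseq/ or genomes/all/ pass through untouched, so the helper can be applied to a mixed list of links.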

Start your one-stop shopping experience for RefSeq assemblies and annotation release data today through the Assembly resource or the genomes/refseq FTP area!
