You can now download new file types for species recently annotated by the NCBI Eukaryotic Genome Annotation Pipeline from the Assembly web pages and from the genomes/refseq FTP area. The new file types include alignments of annotated transcripts to the assembly in BAM format, all models predicted by Gnomon, and, for species that have been annotated more than once, files characterizing the feature-by-feature differences between the current and the previous annotation.
Changes to the Assembly downloads
The new options are in the Assembly download menu that appears on search results and record displays (Figure 1).
Figure 1. The Assembly page for the representative genome for the western lowland gorilla (GCF_008122165.1). Clicking the blue “Download Assembly” button allows you to select the file(s) to download. The new file types are boxed in red. For example, select “RefSeq transcript alignments” to download the alignments in BAM format.
Changes to the genomes FTP directories
You can access the newly created annotation release (AR) directories on the FTP site under genomes/refseq. These directories have the following structure:
For example, you will find the most recent annotation release (AR 103) of the northern pike (Esox lucius) in
For each organism, the annotation release identifiers are numbered sequentially starting at 100 and are independent of the assembly used.
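The numbering rule can be illustrated with a short Python sketch. The function names are hypothetical and are not part of any NCBI tool; the sketch also assumes that "sequentially" means each new annotation of an organism increases the identifier by exactly one, with no skipped numbers.

```python
# Minimal sketch of the annotation release (AR) numbering described above.
# Assumption: identifiers increase by exactly one per annotation, no gaps.

FIRST_RELEASE = 100  # from the text: numbering starts at 100 for each organism


def release_id(ordinal: int) -> int:
    """Return the AR identifier of an organism's ordinal-th annotation (1-based)."""
    if ordinal < 1:
        raise ValueError("ordinal must be 1 or greater")
    return FIRST_RELEASE + ordinal - 1


def release_ordinal(ar_id: int) -> int:
    """Return how many annotations an organism has had, given its latest AR identifier."""
    if ar_id < FIRST_RELEASE:
        raise ValueError("AR identifiers start at 100")
    return ar_id - FIRST_RELEASE + 1
```

Under these assumptions, `release_ordinal(103)` evaluates to 4, so AR 103 would be the fourth annotation of that organism, regardless of which assembly each annotation was built on.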
We are adding the files for the latest annotation releases of the 500+ species annotated by the NCBI Eukaryotic Genome Annotation Pipeline to genomes/refseq. On February 1, 2020, we will stop publishing annotation results to the older genus_species directories immediately under genomes. After that date, you can get files for newly released annotations from the genomes/refseq/ directories as described above and from the genomes/all/annotation_releases/ directories, where we have organized the data by taxonomy ID. Beginning February 1, 2020, we will move the genomes/genus_species directories to genomes/archive/old_refseq/.
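If you maintain scripts that point at the older directories, a small helper along the following lines may ease the transition. It is only a sketch: the function name is hypothetical, the directory name `Esox_lucius` is an illustrative guess at the genus_species naming, and the sketch assumes that each directory keeps its name when it is moved under genomes/archive/old_refseq/ and that the move takes effect on the stated date.

```python
# Sketch: where a legacy genus_species directory is expected on a given date.
# Assumptions (not confirmed by the announcement): directories keep their
# names under the archive, and the move is complete on the cutover date.
import posixpath
from datetime import date

CUTOVER = date(2020, 2, 1)  # date given in the announcement


def legacy_directory(genus_species: str, on: date) -> str:
    """Return the expected FTP path of a legacy directory on the given date."""
    if on >= CUTOVER:
        return posixpath.join("genomes", "archive", "old_refseq", genus_species)
    return posixpath.join("genomes", genus_species)


if __name__ == "__main__":
    # "Esox_lucius" is a hypothetical directory name used for illustration.
    print(legacy_directory("Esox_lucius", date(2020, 1, 31)))
    print(legacy_directory("Esox_lucius", date(2020, 2, 1)))
```

Note that the archived directories will no longer receive new annotation results, so a helper like this is only useful for locating older data.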