RefSeq release 98 is accessible online via FTP and through NCBI’s Entrez programming utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of January 6, 2020, and contains 223,560,051 records, including 161,133,441 proteins, 29,134,515 RNAs, and sequences from 98,406 organisms.
The release is provided in several directories, both as a complete dataset and divided into logical groupings.
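For readers who prefer the E-utilities route, here is a minimal sketch of how an EFetch request URL for a single RefSeq record might be built. The accession NM_000546 and the helper name `build_efetch_url` are illustrative choices, not part of this release announcement, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities EFetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an EFetch URL for one record (hypothetical helper; no request is sent)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_BASE + "?" + urlencode(params)


# Example accession chosen for illustration only.
url = build_efetch_url("NM_000546")
print(url)
```

The resulting URL can be passed to any HTTP client. For protein records, `db="protein"` would be the assumed equivalent.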
Read on for several important announcements.
Matched Annotation by NCBI and EMBL-EBI (MANE)
MANE v0.8 covers 76% of human protein-coding genes, and is now available on FTP.
Useful MANE links:
- NCBI Insights | Matched Annotation by NCBI and EMBL-EBI (MANE): a new joint venture to define a set of representative transcripts for human protein-coding genes
- Matched Annotation from NCBI and EMBL-EBI (MANE) documentation
- NCBI Webinar: MANE- A New Collaboration Between NCBI and EMBL-EBI
Prokaryotic Genome Annotation Pipeline
A new version of the Prokaryotic Genome Annotation Pipeline (PGAP) is now available on GitHub. This release uses a new and improved version of tRNAscan (tRNAscan-SE:2.0.4) and includes our most up-to-date Hidden Markov Model and BlastRule collections for naming proteins.
Want to know how to run PGAP on your own machine, compute farm, or in the cloud? Watch this NCBI Minute.
New download files and FTP directories for genome assemblies
You can now download new file types for species recently annotated by the NCBI Eukaryotic Genome Annotation Pipeline from the Assembly web pages and from the genomes/refseq FTP area. The new file types include alignments of annotated transcripts to the assembly in BAM format, all models predicted by Gnomon, and, for species that have been annotated multiple times, files characterizing the feature-by-feature differences between the current and the previous annotation.