This full release incorporates genomic, transcript, and protein data available as of May 13, 2019, and contains 200,311,267 records, including 141,839,334 proteins, 26,534,602 RNAs, and sequences from 91,873 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
In October last year, we announced the launch of an exciting new collaboration between NCBI and EMBL-EBI called MANE (Matched Annotation from the NCBI and EMBL-EBI). As a first step, we began generating the MANE Select set, comprising a matched representative transcript for every human protein-coding gene. Now that our genome resources are integrated into a high-quality transcript set, you don’t need to choose between RefSeq and Ensembl/GENCODE datasets for genomic analyses.
Not only does the MANE Select set make it easier for you to exchange data or translate coordinates between RefSeq and Ensembl annotation results, but you’ll also be able to use the set with NGS technologies and other resources that use the latest and highest-quality human reference genome assembly available.