RefSeq release 203 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of November 2, 2020, and contains 256,340,911 records, including 186,482,096 proteins, 34,176,314 RNAs, and sequences from 105,349 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
RefSeq annotation of mouse GRCm39
RefSeq has finished its initial annotation of the new mouse reference assembly, GRCm39, recently released by the Genome Reference Consortium. This is the first coordinate-changing update to the mouse reference since the 2012 release of GRCm38, resolving over 400 issues, almost doubling the scaffold N50, closing almost half the gaps, and adding 1.9 Mb of sequence.
The annotation report for annotation release 109 is available here.
The annotation products are available in the sequence databases and on the FTP site.
New eukaryotic genome annotations
In addition to mouse (GRCm39), this release contains new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 27 species, including:
- Pallas’s mastiff bat annotation release 100, based on the assembly mMolMol1.p (GCF_014108415.1)
- Myotis myotis bat annotation release 100, based on the assembly mMyoMyo1.p (GCF_014108235.1)
- southern grasshopper mouse annotation release 100, based on the new assembly mOncTor1.1 (GCF_903995425.1)
- American pika (pictured above) annotation release 102 based on new assembly OchPri4.0 (GCF_014633375.1)
- pharaoh ant annotation release 102 based on new assembly ASM1337386v2 (GCF_013373865.1)
- olive fruit fly annotation release 101, based on the assembly MU_Boleae_v2 (GCF_001188975.3)
Updated human genome Annotation Release 105.20201022 (GRCh37.p13)
Annotation Release 105.20201022 is an annotation update for the previous human reference assembly, GRCh37.p13 (hg19). This update is not a part of RefSeq FTP release but the annotation products are available in the sequence databases and on the genomes FTP site.
COVID-19 related human gene annotation now in NCBI RefSeq and Gene
The RefSeq group has compiled a set of human genes with roles in coronavirus infection and disease. You can now see and search for these genes and their regulatory elements in NCBI Gene and RefSeq.
Matched Annotation by NCBI and EMBL-EBI (MANE) version 0.92
NCBI RefSeq and Ensembl/GENCODE announced MANE v0.92, which covers 16,865 genes or ~88% of known human protein-coding genes.
NCBI Datasets now provides downloads of gene data for more than 30 thousand organisms.