Tag: RefSeq

Announcing RefSeq Release 206!

RefSeq Release 206 is now available. This release includes the following:

Updated human genome Annotation Release 109.20210514
Updated Annotation Release 109.20210514 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report is available here. The annotation products are available in the sequence databases and on the FTP site.

Other new eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 45 additional species.

NCBI at CSHL Biology of Genomes, May 11 – 14, 2021

NCBI staff will be presenting virtual posters at the Cold Spring Harbor Laboratory Biology of Genomes Meeting, May 11 – 14, 2021. The posters will cover the following topics: 1) a cloud-ready suite of tools (PGAP, RAPT, and SKESA) for assembling and annotating prokaryotic genomes, 2) Datasets (a new set of services for downloading genome assemblies and annotations), and 3) updates on NCBI RefSeq eukaryotic genome annotation and the Genome Data Viewer (GDV). Read more below for the full abstracts.

The virtual poster gallery opens Tuesday, May 11 at 9:00 a.m., with dedicated time for poster viewing and discussion through Slack from 1:00 to 2:00 p.m. each day. The poster gallery will be open for the entire conference and remain available for six weeks afterwards.

January-February 2021 RefSeq annotations include dog, fly, rat

Figure 1. This is Tasha, the female boxer used for one of the assemblies annotated for dog (GCF_000002285.5). Image courtesy of the National Human Genome Research Institute.

This January and February, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Benincasa hispida (wax gourd)
  • Canis lupus familiaris (dog)
  • Corvus cornix cornix (hooded crow)
  • Crotalus tigris (tiger rattlesnake)
  • Culex pipiens pallens (northern house mosquito)
  • Dioscorea cayenensis subsp. rotundata (Guinea yam)
  • Drosophila santomea (fly)
  • Drosophila simulans (fly)
  • Drosophila yakuba (fly)
  • Eucalyptus grandis (rose gum)
  • Hibiscus syriacus (Rose-of-Sharon)
  • Hyaena hyaena (striped hyena)
  • Maniola hyperantus (ringlet)
  • Mauremys reevesii (Reeves’s turtle)
  • Nilaparvata lugens (brown planthopper)

RefSeq Release 205 is available!

RefSeq release 205 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of March 1, 2021, and contains 269,975,565 records, including 197,232,209 proteins, 36,514,168 RNAs, and sequences from 108,257 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
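
For readers who want to pull individual RefSeq records through E-utilities, here is a minimal Python sketch that builds an EFetch request URL. The helper names (`efetch_url`, `fetch_record`) and the example accession NM_007294 are illustrative choices, not part of the release announcement, and the sketch ignores details such as API keys and rate limits.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an EFetch URL for one accession (hypothetical helper)."""
    params = {
        "db": db,            # Entrez database, e.g. "nuccore" or "protein"
        "id": accession,     # RefSeq accession, e.g. "NM_007294"
        "rettype": rettype,  # record format to return
        "retmode": "text",   # plain-text response
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def fetch_record(accession, db="nuccore", rettype="fasta"):
    """Download one record as text. Requires network access."""
    with urlopen(efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_record("NM_007294")` should return the record in FASTA format. For bulk access to the whole release, the FTP site is the better route.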

The Datasets command-line tool now provides ortholog data

You can now get gene ortholog data with the NCBI Datasets command-line tool by providing a gene ID, gene symbol, or RefSeq nucleotide or protein accession. Data are available for vertebrates and insects. The vertebrate orthologs include a specialized set for fish. (See our recent post for more information on the orthologs for fish and insects.)

You can retrieve metadata for gene orthologs in JSON format, or you can download a compressed (zip) archive containing both metadata and sequences (Figure 1).

Figure 1. Command lines that use a gene symbol (BRCA1) to retrieve mammalian ortholog metadata (top, JSON metadata shown in part in the image) and sequences (bottom).

Announcing the RefSeq annotation of rat mRatBN7.2!

NCBI RefSeq has finished its initial annotation of the new rat reference assembly, mRatBN7.2, recently released by the Darwin Tree of Life Project at the Wellcome Sanger Institute. This is the first coordinate-changing update to the rat reference since the 2014 release of Rnor_6.0 from the Rat Genome Sequencing Consortium and brings the rat assembly into the modern age with a nearly 300x increase in contig N50 and 9x increase in scaffold N50 lengths. It’s a major improvement!
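
For readers unfamiliar with the N50 statistic mentioned above, the following Python sketch shows one common way to compute it: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths in the usage note are made up for illustration and are not taken from the rat assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
```

With toy contig lengths of 2 through 10, the total is 54 and the three longest contigs (10, 9, 8) cover exactly half of it, so `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8. A higher N50 means the same genome is represented by longer, less fragmented pieces.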

October-December eukaryotic genome annotations in RefSeq

Since October, the NCBI Eukaryotic Genome Annotation Pipeline has released new annotations in RefSeq for a large number of organisms. We’ve separated them by group; click on “details” to see the full list for each.

Mammals

  • Artibeus jamaicensis (Jamaican fruit-eating bat)
  • Arvicola amphibius (Eurasian water vole)
  • Balaenoptera musculus (blue whale)
  • Cebus imitator (Panamanian white-faced capuchin)
  • Chlorocebus sabaeus (green monkey)
  • Homo sapiens (human)
  • Manis javanica (Malayan pangolin)
  • Manis pentadactyla (Chinese pangolin)
  • Ochotona princeps (American pika)
  • Peromyscus leucopus (white-footed mouse)
  • Pipistrellus kuhlii (Kuhl’s pipistrelle)
  • Sturnira hondurensis (bat)
  • Talpa occidentalis (Iberian mole)
  • Trichosurus vulpecula (common brushtail)

RefSeq release 204 is now available

RefSeq release 204 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of January 4, 2021, and contains 262,714,372 records, including 191,411,721 proteins, 35,353,412 RNAs, and sequences from 106,581 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Updated human genome Annotation Release 109.20201120
Updated Annotation Release 109.20201120 is an update of NCBI Homo sapiens Annotation Release 109.

The annotation report for 109.20201120 is available here. The annotation products are available in the sequence databases and on the FTP site.

NCBI hidden Markov models (HMM) release 4.0 now available!

Release 4.0 of the NCBI hidden Markov models (HMM) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available from our FTP site. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
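
As a rough illustration of such a search, the Python sketch below assembles an `hmmsearch` command line from the HMMER package. The helper name, the file names in the usage note, and the E-value threshold are assumptions made for this example; substitute the HMM library you downloaded from the FTP site and your own protein FASTA file.

```python
def hmmsearch_command(hmm_file, protein_fasta, table_out, evalue=1e-5):
    """Build the argument list for an hmmsearch run (hypothetical helper).

    hmm_file      -- path to the downloaded HMM library (assumed name)
    protein_fasta -- FASTA file of proteins to search
    table_out     -- where hmmsearch writes its per-sequence hit table
    evalue        -- report sequences at or below this E-value (assumed cutoff)
    """
    return [
        "hmmsearch",
        "--tblout", table_out,  # tabular summary of per-sequence hits
        "-E", str(evalue),      # E-value reporting threshold
        hmm_file,
        protein_fasta,
    ]
```

With HMMER installed, the list can be passed to `subprocess.run(cmd, check=True)`, for example with `cmd = hmmsearch_command("ncbi_hmms.LIB", "my_proteins.faa", "hits.tbl")`, where all three file names are placeholders.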

This release contains 17,443 models, including 94 new models since the last release. We have also updated names and added EC numbers and gene symbols to over 100 models. You can search and view the details of these HMMs in the newly deployed Protein Family Model collection, which also includes conserved domain architectures and BlastRules and allows you to find all RefSeq proteins named by these profiles. See our recent post for more details.