Tag: RefSeq

RefSeq Release 209 is available

RefSeq release 209 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of November 1, 2021, and contains 296,293,486 records, including 215,655,378 proteins, 41,751,205 RNAs, and sequences from 114,396 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
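Because RefSeq records are also reachable through E-utilities, a query can be scripted. The sketch below only builds an ESearch URL with Python's standard library. It assumes the `refseq[filter]` Entrez filter and the `[Organism]` search field; the helper names `esearch_url` and `refseq_term` are hypothetical, and sheep is used only as an example organism.

```python
from urllib.parse import urlencode

# Base URL for NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def refseq_term(organism: str, extra: str = "") -> str:
    """Build a query restricted to RefSeq records for one organism.

    Assumes the Entrez filter refseq[filter] and the [Organism] field.
    """
    term = f"{organism}[Organism] AND refseq[filter]"
    if extra:
        term += f" AND {extra}"
    return term


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL; this function does not contact NCBI."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    url = esearch_url("protein", refseq_term("Ovis aries"))
    print(url)
    # Fetching requires network access and should follow NCBI usage guidelines:
    # from urllib.request import urlopen
    # print(urlopen(url).read().decode()[:500])
```

Keeping URL construction separate from fetching makes the query easy to inspect before any request is sent.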

New NCBI Gene Ensembl Comparison Expansion

NCBI Gene has added Ensembl Rapid Releases to the calculation of matching annotations between NCBI RefSeq and Ensembl. This has resulted in the inclusion of over 60 additional assemblies for a total of 241 organisms represented in the set. Matches are made based on transcript and CDS comparisons, and Ensembl gene, transcript, and protein identifiers for annotations similar to the NCBI RefSeq annotations are reported in NCBI Gene and in the gene2ensembl file on the Gene FTP site. The Ensembl annotation is also available in the graphical view and in NCBI’s Genome Data Viewer to give you a side-by-side view of how the annotations compare. Check out blue whale E2F1 for an example.

Figure 1. Balaenoptera musculus E2F transcription factor 1 in Genome Data Viewer
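A minimal sketch of collecting the NCBI-to-Ensembl gene matches for one organism from gene2ensembl. It assumes the file is tab-separated with the seven columns listed in `COLUMNS` and a header line starting with `#`. The sample rows, tax IDs, and identifiers are made-up placeholders, not real records.

```python
import csv
import io
from collections import defaultdict

# Assumed column order of the tab-separated gene2ensembl file.
COLUMNS = [
    "tax_id", "GeneID", "Ensembl_gene_identifier",
    "RNA_nucleotide_accession.version", "Ensembl_rna_identifier",
    "protein_accession.version", "Ensembl_protein_identifier",
]

# Placeholder data for illustration only (hypothetical IDs).
SAMPLE = "\n".join([
    "#" + "\t".join(COLUMNS),
    "\t".join(["111", "100", "ENSG_PLACEHOLDER_1", "NM_PLACEHOLDER.1",
               "ENST_PLACEHOLDER_1", "NP_PLACEHOLDER.1", "ENSP_PLACEHOLDER_1"]),
    "\t".join(["111", "100", "ENSG_PLACEHOLDER_1", "NM_PLACEHOLDER.2",
               "ENST_PLACEHOLDER_2", "NP_PLACEHOLDER.2", "ENSP_PLACEHOLDER_2"]),
    "\t".join(["222", "200", "ENSG_PLACEHOLDER_2", "NM_PLACEHOLDER.3",
               "ENST_PLACEHOLDER_3", "NP_PLACEHOLDER.3", "ENSP_PLACEHOLDER_3"]),
])


def ensembl_matches(lines, tax_id):
    """Map each NCBI GeneID to the set of matching Ensembl gene IDs for one taxon."""
    matches = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip the header and any blank lines
        record = dict(zip(COLUMNS, row))
        if record["tax_id"] == str(tax_id):
            matches[record["GeneID"]].add(record["Ensembl_gene_identifier"])
    return dict(matches)


if __name__ == "__main__":
    print(ensembl_matches(io.StringIO(SAMPLE), 111))
```

For the real file, the same function could be given `gzip.open("gene2ensembl.gz", "rt")` after downloading it from the Gene FTP site (file name assumed). Several transcript rows can share one gene, which is why the values are sets.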

Updated prokaryotic representative genome collection

The bacterial and archaeal representative genome collection has been updated! We selected a total of 14,912 of the 224,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has grown by 8% since April 2021 and now includes Candidatus and endosymbiont species (Figure 1), which account for 303 and 140, respectively, of the 1,077 newly added species. In addition, 719 species are now represented by a better assembly, and 70 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.

Figure 1. Graphical view of a portion of the RefSeq Representative assembly for the bedbug endosymbiont Candidatus Wolbachia massiliensis isolate PL13.
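The selection criteria are not given in this excerpt, so the following only illustrates the general idea of keeping one assembly per species and replacing it when a better one appears. The ranking used here (complete genomes first, then higher contig N50), the `Assembly` fields, and the accession strings are all hypothetical and are not NCBI's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Assembly:
    accession: str      # hypothetical placeholder accession
    species: str
    contig_n50: int     # contig N50 in base pairs
    is_complete: bool   # True if the genome is assembled to completion


def rank(asm: Assembly) -> tuple:
    """Hypothetical ranking: complete genomes first, then larger contig N50."""
    return (asm.is_complete, asm.contig_n50)


def pick_representatives(assemblies):
    """Keep the best-ranked assembly for each species."""
    best = {}
    for asm in assemblies:
        current = best.get(asm.species)
        if current is None or rank(asm) > rank(current):
            best[asm.species] = asm
    return best


if __name__ == "__main__":
    candidates = [
        Assembly("ASM_A1", "Species A", 50_000, False),
        Assembly("ASM_A2", "Species A", 20_000, True),
        Assembly("ASM_B1", "Species B", 10_000, False),
    ]
    for species, asm in pick_representatives(candidates).items():
        print(species, asm.accession)
```

Comparing tuples lets the ranking express a priority order in one expression; a real curation process would weigh many more factors.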

NCBI Presentations at Biodiversity Genomics 2021 Highlight Growing Support for Comparative Genomics

The National Center for Biotechnology Information (NCBI) has several speakers at the upcoming Biodiversity Genomics Conference from September 27 to October 1, 2021.

Valerie Schneider, head of NCBI’s SeqPlus Program and Deputy Director for Sequence Offerings, will present a poster on how NCBI’s new focus on comparative genomics will enable researchers to explore all eukaryotic research organisms, find related organisms, and support additional organism-specific resources that a given community may have or wish to develop.

Nuala O’Leary, Product Owner, NCBI Datasets, will present the latest developments for Datasets, a beta resource that supports intuitive and flexible access to genome data for a broad range of taxa via a redesigned website and command-line tools.

Adelaide Rhodes, Cloud Subject Matter Expert in Education, will present two case studies that emphasize the ease of navigating the new Datasets website and the use of command-line tools to speed up data discovery for genes and genomes of interest.

Terence Murphy, Product Owner, NCBI RefSeq, will present a new tool for genome providers to identify contamination in newly assembled sequences with high sensitivity, specificity, and performance.

The Biodiversity Genomics Conference brings together a global audience to celebrate achievements in genome sequencing across the eukaryotic tree of life, explore current challenges and solutions, and develop strategies for sequencing and data sharing in the coming decade of biodiversity genomics. NCBI has several programs that support the needs of this research community.

Fungal Disease Awareness Week: fungal pathogen data and literature at NCBI

This post is in support of the CDC’s Fungal Disease Awareness Week — September 20-24, 2021.

The impact of fungal diseases on human health has often been neglected, but increased association of fungal infections with severe illness and death during the COVID-19 pandemic has brought fungal diseases into the spotlight.

According to the CDC, the most common fungal co-infections in patients with COVID-19 include aspergillosis and invasive candidiasis, including healthcare-associated infections caused by Candida auris. Other reported diseases are mucormycosis, coccidioidomycosis, and cryptococcosis. Aspergillosis is commonly caused by Aspergillus fumigatus, mucormycosis by Rhizopus species, coccidioidomycosis by Coccidioides immitis and C. posadasii, and cryptococcosis by Cryptococcus neoformans.

This post explores several NCBI resources that have relevant information about the fungal pathogens implicated in these COVID-19-related illnesses.

Assembled genomes

Correctly identified and annotated genome assemblies for the fungal taxa implicated as co-infections in COVID-19 patients are summarized in the table below. These and many other fungi are also available as curated RefSeq genome assemblies.

RefSeq Release 208 is available!

RefSeq release 208 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of September 7, 2021, and contains 288,903,207 records, including 210,703,648 proteins, 40,213,945 RNAs, and sequences from 113,002 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.

Announcing the RefSeq annotation of sheep ARS-UI_Ramb_v2.0!

The new reference assembly for sheep is now annotated! Assembly ARS-UI_Ramb_v2.0 consists of 142 scaffolds, down from 2,640 in the 2017 assembly Oar_rambouillet_v1.0. With a contig N50 of 43 Mb, ARS-UI_Ramb_v2.0 is 15 times more contiguous than the first assembly of the Rambouillet breed.
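Contig N50, the contiguity statistic quoted above, is the length L such that contigs of length L or longer together cover at least half of the assembly. A minimal sketch of the calculation on a list of contig lengths (the function name `n50` is illustrative):

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest and their lengths summed;
    the N50 is the length of the contig at which the running total first
    reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding issues
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

In the usage line the lengths total 54, so half is 27; the three longest contigs (10 + 9 + 8) reach 27, giving an N50 of 8.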

Annotation Release 104 (AR 104) of ARS-UI_Ramb_v2.0 reflects these improvements. Nearly 200 more coding genes have a 1:1 ortholog in the human genome in AR 104 than in the annotation of Oar_rambouillet_v1.0 (AR 103). The number of coding models annotated as partial is down 35%, from 165 to 107, and the number of coding models labeled low quality because of suspected indels or base substitutions in the underlying genomic sequence decreased by 51% (from 1,646 to 796). BUSCO analysis with the cetartiodactyla_odb10 lineage dataset found 99.1% of the benchmark genes complete in AR 104, versus 98.8% in AR 103. Details of this annotation, including statistics on the annotation products, the input data used in the pipeline, and intermediate alignment results, can be found here.
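The percentage reductions above follow from (before - after) / before. The sketch below computes them from the counts in the paragraph; the exact values come to about 35.2% and 51.6%. The helper name `percent_decrease` is illustrative.

```python
def percent_decrease(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (before - after) / before


# Counts taken from the AR 103 vs AR 104 comparison above.
partial = percent_decrease(165, 107)        # partial coding models
low_quality = percent_decrease(1646, 796)   # low-quality coding models

if __name__ == "__main__":
    print(f"partial models: {partial:.1f}% fewer")
    print(f"low-quality models: {low_quality:.1f}% fewer")
```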

New RefSeq annotations for human, zebra finch, great white shark and more!

In May and June, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for 27 organisms.

This release includes new annotations for human, zebra finch, golden eagle, sea urchin, snowfinch, Arctic fox, clawed frog, great white shark, and more.

RefSeq release 207 is available!

RefSeq release 207 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of July 12, 2021, and contains 285,425,070 records, including 209,035,492 proteins, 39,039,901 RNAs, and sequences from 112,462 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
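The three release announcements on this page give enough numbers to compare successive releases. This sketch copies those counts into a dictionary and computes the change between consecutive releases. Each release reflects data as of a different date (July 12, September 7, and November 1, 2021), so the intervals are not equal in length.

```python
# Counts copied from the RefSeq release 207, 208, and 209 announcements.
RELEASES = {
    207: {"records": 285_425_070, "proteins": 209_035_492,
          "rnas": 39_039_901, "organisms": 112_462},
    208: {"records": 288_903_207, "proteins": 210_703_648,
          "rnas": 40_213_945, "organisms": 113_002},
    209: {"records": 296_293_486, "proteins": 215_655_378,
          "rnas": 41_751_205, "organisms": 114_396},
}


def growth(older: int, newer: int) -> dict:
    """Difference in each count between two releases."""
    return {key: RELEASES[newer][key] - RELEASES[older][key]
            for key in RELEASES[newer]}


if __name__ == "__main__":
    for old, new in [(207, 208), (208, 209)]:
        print(f"{old} -> {new}: {growth(old, new)}")
```

For example, release 209 contains 7,390,279 more records than release 208.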