Month: January 2021

October-December eukaryotic genome annotations in RefSeq

Since October, the NCBI Eukaryotic Genome Annotation Pipeline has released new annotations in RefSeq for a large number of organisms. We’ve separated them by group; click on “details” to see the full list for each.

Mammals

Artibeus jamaicensis (Jamaican fruit-eating bat)
Arvicola amphibius (Eurasian water vole)
Balaenoptera musculus (blue whale)
Cebus imitator (Panamanian white-faced capuchin)
Chlorocebus sabaeus (green monkey)
Homo sapiens (human)
Manis javanica (Malayan pangolin)
Manis pentadactyla (Chinese pangolin)
Ochotona princeps (American pika)
Peromyscus leucopus (white-footed mouse)
Pipistrellus kuhlii (Kuhl’s pipistrelle)
Sturnira hondurensis (bat)
Talpa occidentalis (Iberian mole)
Trichosurus vulpecula (common brushtail)

Introducing MicroBIGG-E, a browser for microbial AMR genes and other stress and resistance elements

The Pathogen Detection project now offers the Microbial Browser for Identification of Genetic and Genomic Elements (MicroBIGG-E), which lets you browse antimicrobial resistance (AMR), stress response, and virulence genes and genomic elements found in GenBank published isolate genomes from the NCBI Isolates Browser. Unlike the Isolates Browser, which provides only a strain-level view of both published and unpublished genomes, MicroBIGG-E shows the location of these genes, how they were identified, and phenotypic information (Figure 1).

Figure 1. Top panel. Portion of the MicroBIGG-E table display showing the results of a search (genes_on_contig:blaTEM-1 AND genes_on_contig:blaKPC*) for isolates that contain two different beta-lactamase genes (blaTEM-1 and any of the carbapenem-hydrolyzing blaKPC* genes) on a single contig. Available columns include the element’s type, subtype, and class as well as information about how the element was identified and supporting evidence. Bottom panel. Graphical view of the annotation on a contig from one of the isolates, the assembled Serratia marcescens record NZ_CP020507, showing the two beta-lactamases in the search (blaTEM-1 and blaKPC-3) as well as an oxacillin-hydrolyzing gene (blaOXA-9). All three genes and some other AMR and stress response genes are part of a mobile element on the assembled contig.

Allele Frequency Aggregator (ALFA) Release 2 is available!

We are excited to announce the NCBI Allele Frequency Aggregator (ALFA) Release 2 (version 20201027095038), one of the largest and most comprehensive open-access aggregated variant datasets with allele frequencies. This release draws on 79 dbGaP studies comprising 192 thousand subjects and 5.8 trillion combined genotypes, which yielded allele frequencies for 904 million variants, including 316 million novel ones not previously in dbSNP (Build 154).

NCBI on YouTube: RAPT and BLAST+ on the Cloud, SARS-CoV-2 genome data in Datasets

It’s time for another roundup of what’s been happening on YouTube!

First up, the NCBI YouTube channel has merged with the NLM YouTube channel. You’ll now be able to find diverse content all on one channel, from tips on using resources to fascinating moments in the history of medicine and more!

RefSeq release 204 is now available

RefSeq release 204 is now available online, from the FTP site, and through NCBI’s Entrez Programming Utilities (E-utilities).

This full release incorporates genomic, transcript, and protein data available as of January 4, 2021, and contains 262,714,372 records, including 191,411,721 proteins, 35,353,412 RNAs, and sequences from 106,581 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
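As a rough illustration of the E-utilities route mentioned above, the sketch below builds an EFetch URL for a single RefSeq nucleotide record. The helper name `efetch_url` is made up for this example, and the accession is the Serratia marcescens record cited elsewhere on this page; the parameter choices (`db`, `rettype`, `retmode`) are one reasonable combination, not the only one.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Build an EFetch URL for one sequence record (hypothetical helper)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Only constructs the URL; nothing is downloaded here.
print(efetch_url("NZ_CP020507"))
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the record in FASTA format. For heavier use, check NCBI's usage guidelines on request rates and API keys.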

Updated human genome Annotation Release 109.20201120
Annotation Release 109.20201120 is an update of NCBI Homo sapiens Annotation Release 109.

The annotation report for 109.20201120 is available here. The annotation products are available in the sequence databases and on the FTP site.

Prokaryotic representative genomes updated — now over 13 thousand assemblies!

We have updated the bacterial and archaeal representative genome collection! The current collection contains over 13,000 assemblies selected from the 203,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has increased by 11% since August 2020. We’ve included about 1,400 species for the first time, have used better assemblies for 1,177 species, and have removed 65 species because of changes in NCBI Taxonomy or uncertainty in their species assignment.

We have also updated the Representative Genomes Database on the Microbial Nucleotide BLAST page, as well as the RefSeq Representative Genome Database on basic nucleotide BLAST, to reflect these changes.

GenBank release 241.0

GenBank release 241.0 (12/21/2020) is now available on the NCBI FTP site. This release has 12.98 trillion bases and 2.27 billion records.

The current release has 221,467,827 traditional records containing 723,003,822,007 base pairs of sequence data. There are also 1,517,995,689 WGS records containing 11,830,842,428,018 base pairs of sequence data, 446,397,378 bulk-oriented TSA records containing 392,206,975,386 base pairs of sequence data, and 88,039,152 bulk-oriented TLS records containing 33,036,509,446 base pairs of sequence data.
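The headline totals for release 241.0 can be checked against the per-division figures with a few lines of arithmetic. The sketch below simply sums the record and base-pair counts quoted above; the dictionary layout and variable names are illustrative.

```python
# (records, base pairs) per division, copied from the release 241.0 figures above.
divisions = {
    "traditional": (221_467_827, 723_003_822_007),
    "WGS": (1_517_995_689, 11_830_842_428_018),
    "TSA": (446_397_378, 392_206_975_386),
    "TLS": (88_039_152, 33_036_509_446),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records:,} records (~{total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases (~{total_bases / 1e12:.2f} trillion)")
```

The sums come to 2,273,900,046 records and 12,979,089,734,857 bases, which round to the 2.27 billion records and 12.98 trillion bases stated for the release.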

Retrieve genome data by BioProject using the Datasets command-line tool

You can now retrieve genome data using the NCBI Datasets command-line tool and API by simply providing a BioProject accession. You can go directly from a BioProject accession to genome data even when the BioProject accession is the parent of multiple BioProjects (Figure 1).

Figure 1. Command lines using BioProject accessions with the datasets command-line tool and sample metadata. Top panel: command line for downloading genome metadata for the Sanger 25 Genomes Project (PRJEB33226). Middle panel: a portion of the metadata in JSON format for the 25 Genomes Project. Bottom panel: command line for downloading sequence data and annotation metadata for a component BioProject for the king scallop (PRJEB35331).
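Since the figure itself is not reproduced here, the following sketch shows roughly what the two command lines look like, assembled in Python so they can be passed to `subprocess`. It assumes the `datasets <action> genome accession <accession>` subcommand layout and the `--filename` option of the download command; subcommands and flags have changed between releases of the tool, so check `datasets --help` for your installed version. The helper name `datasets_command` and the output file name are hypothetical.

```python
import shlex


def datasets_command(action, bioproject, filename=None):
    """Return the argument list for a `datasets` call keyed on a BioProject accession.

    Hypothetical helper; assumes the `genome accession` subcommand layout.
    """
    if action not in ("summary", "download"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["datasets", action, "genome", "accession", bioproject]
    if filename is not None:
        if action != "download":
            raise ValueError("filename only applies to downloads")
        # Assumed option for naming the downloaded zip archive.
        argv += ["--filename", filename]
    return argv


commands = [
    # Genome metadata for the parent BioProject (Sanger 25 Genomes Project).
    datasets_command("summary", "PRJEB33226"),
    # Data package for the king scallop component BioProject.
    datasets_command("download", "PRJEB35331", filename="king_scallop.zip"),
]
for argv in commands:
    print(shlex.join(argv))
```

With the tool installed, each list can be run with `subprocess.run(argv, check=True)`; the summary command writes its JSON metadata to standard output.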

Important Changes to NCBI Accounts Coming in 2021

Update: Please see our FAQ page for more information and updates.

Do you log in to NCBI to use MyNCBI, SciENcv, or MyBibliography? Do you submit data to NCBI? If so, you’ll want to read further to get a first glimpse of some important changes to NCBI accounts that will be coming in 2021.

What’s happening?

In brief, NCBI will be transitioning to federated account credentials. NCBI-managed credentials, the username and password you set at NCBI, will be going away. Federated account credentials are those set through eRA Commons, Google, or a university or institutional point of access.

Why is this happening?

NIH, NLM, and NCBI take your privacy and security very seriously. As part of our normal reviews, we have determined that this change will raise the security of your accounts to the level we consider necessary.

When is this happening?

After June 1, 2021, you will no longer be able to use NCBI-managed credentials to log in to NCBI.
