Category: What’s New

View intron feature evidence in the Genome Data Viewer and Sequence Viewer

Are you a researcher who works on gene biology and is interested in alternative splice patterns in your gene or genes of interest? If so, be sure to explore the intron feature evidence available in graphical views of genome assemblies annotated by NCBI. You can view the NCBI evidence used for calling splice variants for genes, add other intron feature evidence tracks, and use new display and filter options that make it easier to interpret the data.

Figure 1. Graphical view of the monoamine oxidase gene (MAOA, MAOB) region on the human X chromosome showing intron feature tracks (‘RNA-seq intron features, aggregate’ and ‘Intropolis RNA-Seq intron features’). Mousing over an intron feature activates a tooltip that shows details such as the number of reads with the splice site, the location on the chromosome, the length of the intron, and the donor and acceptor bases at the splice site. The Intropolis track was added through the search feature of the Configure Tracks menu and configured (bottom menu) so that the features are sorted by strand and filtered to show only features with more than 500 reads.

Continue reading “View intron feature evidence in the Genome Data Viewer and Sequence Viewer”

October-December eukaryotic genome annotations in RefSeq

Since October, the NCBI Eukaryotic Genome Annotation Pipeline has released new annotations in RefSeq for a large number of organisms. We’ve separated them by group; click on “details” to see the full list for each.

Mammals

  • Artibeus jamaicensis (Jamaican fruit-eating bat)
  • Arvicola amphibius (Eurasian water vole)
  • Balaenoptera musculus (blue whale)
  • Cebus imitator (Panamanian white-faced capuchin)
  • Chlorocebus sabaeus (green monkey)
  • Homo sapiens (human)
  • Manis javanica (Malayan pangolin)
  • Manis pentadactyla (Chinese pangolin)
  • Ochotona princeps (American pika)
  • Peromyscus leucopus (white-footed mouse)
  • Pipistrellus kuhlii (Kuhl’s pipistrelle)
  • Sturnira hondurensis (bat)
  • Talpa occidentalis (Iberian mole)
  • Trichosurus vulpecula (common brushtail)

Continue reading “October-December eukaryotic genome annotations in RefSeq”

Introducing MicroBIGG-E, a browser for microbial AMR genes and other stress and resistance elements

The Pathogen Detection project now offers the Microbial Browser for Identification of Genetic and Genomic Elements (MicroBIGG-E), which lets you browse antimicrobial resistance (AMR), stress response, and virulence genes and genomic elements found in isolate genomes published in GenBank from the NCBI Isolates Browser. Unlike the Isolates Browser, which provides only a strain-level view of both published and unpublished genomes, MicroBIGG-E shows the location of these genes, how they were identified, and phenotypic information (Figure 1).

Figure 1. Top panel. Portion of the MicroBIGG-E table display showing the results of a search (genes_on_contig:blaTEM-1 AND genes_on_contig:blaKPC*) for isolates that contain two different beta-lactamase genes (blaTEM-1 and any of the carbapenem-hydrolyzing blaKPC* genes) on a single contig. Available columns include the element’s type, subtype, and class, as well as information about how the element was identified and the supporting evidence. Bottom panel. Graphical view of the annotation on a contig from one of the isolates, the assembled Serratia marcescens record NZ_CP020507, showing the two beta-lactamases in the search (blaTEM-1 and blaKPC-3) as well as an oxacillin-hydrolyzing gene (blaOXA-9). All three genes and some other AMR and stress response genes are part of a mobile element on the assembled contig.
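The search string in the caption follows a simple pattern: one genes_on_contig: term per gene, joined with AND. A minimal sketch of composing such a string is shown here; the helper name genes_on_contig_query is hypothetical, and only the field name and the AND operator are taken from the caption. The result is meant to be pasted into the MicroBIGG-E search box, not sent to any programmatic interface.

```python
def genes_on_contig_query(*genes):
    """Join gene symbols into a search string of the form shown in the caption.

    Hypothetical helper: it only formats text and does not contact NCBI.
    A trailing * is passed through unchanged, as in the caption's blaKPC* term.
    """
    return " AND ".join(f"genes_on_contig:{gene}" for gene in genes)


query = genes_on_contig_query("blaTEM-1", "blaKPC*")
print(query)  # genes_on_contig:blaTEM-1 AND genes_on_contig:blaKPC*
```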

Continue reading “Introducing MicroBIGG-E, a browser for microbial AMR genes and other stress and resistance elements”

Allele Frequency Aggregator (ALFA) Release 2 is available!

We are excited to announce the NCBI Allele Frequency Aggregator (ALFA) Release 2 (version 20201027095038), one of the largest and most comprehensive open-access aggregated variant datasets with allele frequencies. This release draws on 79 dbGaP studies with 192 thousand subjects and 5.8 trillion combined genotypes, which yielded allele frequencies for 904 million variants, 316 million of them novel (previously unknown in dbSNP Build 154).

Continue reading “Allele Frequency Aggregator (ALFA) Release 2 is available!”

RefSeq release 204 is now available

RefSeq release 204 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.
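As a rough illustration of the E-utilities route, the sketch below builds an EFetch request URL for a single RefSeq nucleotide record in FASTA format. It assumes the standard E-utilities base URL and the nuccore database. The function name efetch_url is hypothetical, and the accession NZ_CP020507 is borrowed from the MicroBIGG-E figure caption elsewhere on this page purely as an example. The code only constructs the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL for NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an EFetch URL for one record (hypothetical helper, no network call)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


url = efetch_url("NZ_CP020507")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client. For anything beyond occasional requests, check NCBI's usage guidelines for E-utilities first.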

This full release incorporates genomic, transcript, and protein data available as of January 4, 2021, and contains 262,714,372 records, including 191,411,721 proteins, 35,353,412 RNAs, and sequences from 106,581 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Updated human genome Annotation Release 109.20201120
Annotation Release 109.20201120 is an update of NCBI Homo sapiens Annotation Release 109.

The annotation report for 109.20201120 is available here. The annotation products are available in the sequence databases and on the FTP site.

Continue reading “RefSeq release 204 is now available”

Prokaryotic representative genomes updated — now over 13 thousand assemblies!

We have updated the bacterial and archaeal representative genome collection! The current collection contains over 13,000 assemblies selected from the 203,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has increased by 11% since August 2020. We’ve included about 1,400 species for the first time, have used better assemblies for 1,177 species, and have removed 65 species because of changes in NCBI Taxonomy or uncertainty in their species assignment.

We have also updated the Representative Genomes Database on the Microbial Nucleotide BLAST page as well as the RefSeq Representative Genome Database on basic nucleotide BLAST to reflect these changes.

Continue reading “Prokaryotic representative genomes updated — now over 13 thousand assemblies!”

GenBank release 241.0

GenBank release 241.0 (12/21/2020) is now available on the NCBI FTP site. This release has 12.98 trillion bases and 2.27 billion records.

The current release has 221,467,827 traditional records containing 723,003,822,007 base pairs of sequence data. There are also 1,517,995,689 WGS records containing 11,830,842,428,018 base pairs of sequence data, 446,397,378 bulk-oriented TSA records containing 392,206,975,386 base pairs of sequence data, and 88,039,152 bulk-oriented TLS records containing 33,036,509,446 base pairs of sequence data.

Continue reading “GenBank release 241.0”
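The headline totals for GenBank release 241.0 (12.98 trillion bases and 2.27 billion records) can be checked against the per-division counts quoted above. The short sketch below simply sums those published figures; the dictionary layout and variable names are illustrative choices, not part of any NCBI tool.

```python
# (records, base pairs) per division, copied from the release note text.
divisions = {
    "traditional": (221_467_827, 723_003_822_007),
    "WGS": (1_517_995_689, 11_830_842_428_018),
    "TSA": (446_397_378, 392_206_975_386),
    "TLS": (88_039_152, 33_036_509_446),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases ({total_bases / 1e12:.2f} trillion)")
```

The sums come to 2,273,900,046 records and 12,979,089,734,857 base pairs, which round to the 2.27 billion records and 12.98 trillion bases stated for the release.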

Important Changes to NCBI Accounts Coming in 2021

Update: Please see our FAQ page for more information and updates.

Do you log in to NCBI to use MyNCBI, SciENcv, or MyBibliography? Do you submit data to NCBI? If so, you’ll want to read further to get a first glimpse of some important changes to NCBI accounts that will be coming in 2021.

What’s happening?

In brief, NCBI will be transitioning to federated account credentials. NCBI-managed credentials are the username and password you set at NCBI — these will be going away. Federated account credentials are those set through eRA Commons, Google, or a university or institutional point of access.

Why is this happening?

NIH, NLM, and NCBI take your privacy and security very seriously. As part of our normal reviews, we have determined that making this change will increase the security of your accounts to a level that we feel is necessary.

When is this happening?

After June 1, 2021, you will no longer be able to use NCBI-managed credentials to log in to NCBI.

Continue reading “Important Changes to NCBI Accounts Coming in 2021”

Expanding access to coronavirus-related literature: the COVID-19 Initiative in PMC reaches 100K articles!

One important way the National Library of Medicine (NLM) is responding to the ongoing public health emergency is through the COVID-19 Initiative. This public-private cooperation between NLM and more than 50 scholarly publishers and societies allows you to access over 100,000 articles on COVID-19, SARS-CoV-2, and other coronaviruses through PubMed Central (PMC). This collection includes recently published discoveries, a history of coronavirus reports for comparison, and international (globally comprehensive) content, and it captures the breadth of research, analysis, and commentary. We make these articles available in human- and machine-readable formats to support public accessibility and analysis by researchers.

You can search this public health emergency collection in PMC or download the collection through the PMC Open Access Subset. The collection spans:

    • More than half a century of research, including articles from the 1960s through the present (more than 60% of the articles included thus far were published in 2020; Figure 1, top panel);
    • Several languages, including content in English (~95%), German, French, and Spanish;
    • Many publication types, more than half of them research or review articles (Figure 1, bottom panel).

Figure 1. The Public Health Emergency Collection articles by decade of publication (top panel) and by publication type (bottom panel).

People have viewed or downloaded articles in this PMC collection more than 80 million times since March, reflecting the great demand for such an open and centralized collection. Artificial intelligence organizations, such as the Allen Institute for AI, builders of the COVID-19 Open Research Dataset (CORD-19), have also used the collection to develop new text and data mining techniques that can help answer high-priority scientific questions related to COVID-19.

To learn more about the initiative and NLM’s collaborators, see the Public Health Emergency COVID-19 Initiative overview and related FAQs.