NIH’s Sequence Read Archive to be made available on AWS’s Open Data Sponsorship Program

The National Library of Medicine’s (NLM) National Center for Biotechnology Information (NCBI) and Amazon Web Services (AWS) are happy to announce that the controlled- and public-access Sequence Read Archive (SRA), one of the world’s largest repositories of raw next-generation sequencing data, will be freely accessible from Amazon S3 via the Open Data Sponsorship Program (ODP) as of January 2021. The SRA is currently hosted by NLM at the National Institutes of Health (NIH).


The Datasets command-line tool now provides ortholog data

Important Note: Please see our latest documentation on how to download gene ortholog data. The commands below have been deprecated in the latest version of the NCBI Datasets command-line tools.

You can now get gene ortholog data with the NCBI Datasets command-line tool by supplying a gene ID, gene symbol, or RefSeq nucleotide or protein accession. Data are available for vertebrates and insects. The vertebrate orthologs include a specialized set for fish. (See our recent post for more information on the orthologs for fish and insects.)

You can retrieve metadata for gene orthologs in JSON format, or you can download a compressed (zip) archive containing both metadata and sequences (Figure 1).

Figure 1. Command lines that use a gene symbol (BRCA1) to retrieve mammalian ortholog metadata (top; JSON metadata shown in part in the image) and sequences (bottom).


Improvements to NCBI Assembly

NCBI’s genome Assembly has a number of significant improvements!

Assembly records now have a link to Primer-BLAST, making it easy to design primers in the context of a specific eukaryote genome assembly. Figure 1 shows the Assembly page for the Genome Reference Consortium Mouse Build 39 (GRCm39) with the link to Primer-BLAST.

Figure 1. The Assembly page for the mouse reference genome (GCF_000001635.27), showing the new Run Primer-BLAST link, which loads the assembly as a database in the Primer-BLAST search (bottom), and the new expandable note sections (Genome-Annotation-Data in this case).

New release of the Read Assembly and Annotation Pipeline Tool (RAPT), now 2X faster!

There is a new release of the Read Assembly and Annotation Pipeline Tool (RAPT) available from our GitHub site. RAPT is a one-step application for the genome assembly and gene annotation of archaeal and bacterial isolates that can run on your local computer or the Google Cloud Platform (GCP). With this new release, jobs run twice as fast as with the December release. For example, we have assembled and annotated a Salmonella enterica genome in under an hour on a 16-CPU machine with the new release.
We have also added several new features based on your feedback including:

  1. The --stop-on-errors flag, which stops the process if the average nucleotide identity check finds evidence of a sample mix-up or contamination by other bacteria.
  2. The ability to accept forward and reverse reads of paired-end runs in separate files. These can be compressed (gzip) files.

Finally, thanks to all who came to our webinar in December and provided their comments! For those who couldn’t join us, you can now view the recording on our YouTube channel.

Contact us at prokaryote-tools@ncbi.nlm.nih.gov with any questions or to let us know if you would like to become a beta-tester for RAPT.

Announcing the RefSeq annotation of rat mRatBN7.2!

NCBI RefSeq has finished its initial annotation of the new rat reference assembly, mRatBN7.2, recently released by the Darwin Tree of Life Project at the Wellcome Sanger Institute. This is the first coordinate-changing update to the rat reference since the 2014 release of Rnor_6.0 from the Rat Genome Sequencing Consortium and brings the rat assembly into the modern age with a nearly 300x increase in contig N50 and 9x increase in scaffold N50 lengths. It’s a major improvement!
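For readers less familiar with the metric, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The sketch below is a minimal illustration of that definition, not NCBI's code; the function name `n50` and the contig lengths are hypothetical and are not taken from either rat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2             # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This sequence is the one that reaches the halfway point.
            return length


# Hypothetical lengths for illustration only (not real assembly data).
fragmented = [50, 40, 30, 20, 10]
contiguous = [100, 50]

print(n50(fragmented))  # 40
print(n50(contiguous))  # 100
```

Both toy assemblies have the same total length, but the one with fewer, longer pieces has the higher N50, which is why the metric is used as a measure of contiguity.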


March 3 Webinar: Changes are coming to the way you log in to your NCBI account

Join us on March 3, 2021 to learn about changes to NCBI account logins that will affect those of you who sign in directly to your NCBI account. After June 1, 2021, you will need to log in using your institution, social media, Google, Microsoft, or login.gov account username and password. In this webinar, you will learn how to register for a free login.gov account and how to link it to an existing NCBI account. You’ll also see where to find the most up-to-date information and FAQs on this topic.

We will answer a few questions from our mail bag on these changes. If you would like to submit a question in advance, please send an email to info@ncbi.nlm.nih.gov with the subject line “Changes to my NCBI Log In” by February 24th.

    • Date and time: Wed, March 3, 2021 12:00 PM – 12:45 PM EST
    • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

View intron feature evidence in the Genome Data Viewer and Sequence Viewer

Are you a researcher who works on gene biology and is interested in alternative splice patterns in your gene or genes of interest? If so, be sure to explore the intron feature evidence available in graphics views of genome assemblies annotated by NCBI. You can view the NCBI evidence used for calling splice variants for genes, add other intron feature evidence tracks, and use new display and filter options that make it easier to interpret the data.

Figure 1. Graphical view of the monoamine oxidase gene (MAOA, MAOB) region on the human X chromosome showing intron feature tracks (‘RNA-seq intron features, aggregate’ and ‘Intropolis RNA-Seq intron features’). Mousing over an intron feature activates a tooltip that shows details such as the number of reads with the splice site, the location on the chromosome, the length of the intron, and the donor and acceptor bases at the splice site. The Intropolis track was added through the search feature of the Configure Tracks menu and configured (bottom menu) so that the features were sorted by strand and filtered so that only features with more than 500 reads appear.


October-December eukaryotic genome annotations in RefSeq

Since October, the NCBI Eukaryotic Genome Annotation Pipeline has released new annotations in RefSeq for a large number of organisms. We’ve separated them by group; click on “details” to see the full list for each.

Mammals

  • Artibeus jamaicensis (Jamaican fruit-eating bat)
  • Arvicola amphibius (Eurasian water vole)
  • Balaenoptera musculus (Blue whale)
  • Cebus imitator (Panamanian white-faced capuchin)
  • Chlorocebus sabaeus (green monkey)
  • Homo sapiens (human)
  • Manis javanica (Malayan pangolin)
  • Manis pentadactyla (Chinese pangolin)
  • Ochotona princeps (American pika)
  • Peromyscus leucopus (white-footed mouse)
  • Pipistrellus kuhlii (Kuhl’s pipistrelle)
  • Sturnira hondurensis (bat)
  • Talpa occidentalis (Iberian mole)
  • Trichosurus vulpecula (common brushtail)


Introducing MicroBIGG-E, a browser for microbial AMR genes and other stress and resistance elements

The Pathogen Detection project now offers the Microbial Browser for Identification of Genetic and Genomic Elements (MicroBIGG-E), which lets you browse antimicrobial resistance (AMR), stress response, and virulence genes and genomic elements found in GenBank-published isolate genomes from the NCBI Isolates Browser. Unlike the Isolates Browser, which provides only a strain-level view of both published and unpublished genomes, MicroBIGG-E shows the location of these genes, how they were identified, and phenotypic information (Figure 1).

Figure 1. Top panel: Portion of the MicroBIGG-E table display showing the results of a search (genes_on_contig:blaTEM-1 AND genes_on_contig:blaKPC*) for isolates that contain two different beta-lactamase genes (blaTEM-1 and any of the carbapenem-hydrolyzing blaKPC* genes) on a single contig. Available columns include the element’s type, subtype, and class as well as information about how the element was identified and supporting evidence. Bottom panel: Graphical view of the annotation on a contig from one of the isolates, the assembled Serratia marcescens record NZ_CP020507, showing the two beta-lactamases in the search (blaTEM-1 and blaKPC-3) as well as an oxacillin-hydrolyzing gene (blaOXA-9). All three genes and some other AMR and stress response genes are part of a mobile element on the assembled contig.
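To make the logic of the search in Figure 1 concrete, here is a minimal local sketch in Python of what the query asks for: contigs that carry blaTEM-1 and at least one gene matching the wildcard blaKPC*. It does not call MicroBIGG-E or any NCBI service. The function `contigs_matching_all`, the contig names, and the gene lists are hypothetical and chosen only for illustration.

```python
from fnmatch import fnmatchcase


def contigs_matching_all(contig_genes, patterns):
    """Return IDs of contigs whose genes satisfy every pattern (logical AND).

    contig_genes maps a contig ID to the list of gene symbols found on it.
    Each pattern may end in '*' to match any allele, as in blaKPC*.
    """
    hits = []
    for contig_id, genes in contig_genes.items():
        # A contig qualifies only if each pattern matches at least one gene.
        if all(any(fnmatchcase(gene, pat) for gene in genes) for pat in patterns):
            hits.append(contig_id)
    return hits


# Hypothetical contigs for illustration only (not real isolate data).
contig_genes = {
    "contig_A": ["blaTEM-1", "blaKPC-3", "blaOXA-9"],
    "contig_B": ["blaTEM-1"],
    "contig_C": ["blaKPC-2", "blaOXA-9"],
}

matches = contigs_matching_all(contig_genes, ["blaTEM-1", "blaKPC*"])
print(matches)  # ['contig_A']
```

Only the contig carrying both genes is returned, which mirrors the requirement in the figure that the two beta-lactamases sit on a single contig.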


Allele Frequency Aggregator (ALFA) Release 2 is available!

We are excited to announce the NCBI Allele Frequency Aggregator (ALFA) Release 2 (version 20201027095038), one of the largest and most comprehensive open-access aggregated variant datasets with allele frequencies. This release draws on 79 dbGaP studies with 192 thousand subjects and 5.8 trillion combined genotypes, which yielded allele frequencies for 904 million variants, including 316 million novel variants previously unknown in dbSNP (Build 154).
