Tag: genome assemblies

Announcing NCBI Datasets – try it out!

NCBI introduces Datasets, a new resource that lets you easily gather data from across NCBI databases. Our first release allows you to find and download genomic sequence and annotation data for all eukaryotic organisms through our user-friendly web interface.

Our web interface also provides an interactive taxonomy tree that lets you browse for your favorite organism. We are currently testing the web interface in the NCBI labs environment. To try it out, enter a taxonomic name or assembly accession and click on the ‘Get Data’ button in the search results panel.

Here’s what it looks like when you search ‘apes’:

Continue reading “Announcing NCBI Datasets – try it out!”

Important changes coming to prokaryotic Reference and Representative genome assemblies

We are making changes to the set of bacterial and archaeal RefSeq Reference and Representative assemblies in February 2020.

  • We will reduce the set of Reference assemblies to the 15 that have annotation provided by outside experts (Table 1) and re-annotate the other 105 current Reference assemblies using the latest Prokaryotic Genome Annotation Pipeline (PGAP) software. The re-annotated assemblies will lose Reference status.
  • We will reassess and revise the set of Representative assemblies so that there is one assembly per species to better reflect the taxonomic diversity of the RefSeq bacterial and archaeal assemblies.

Continue reading “Important changes coming to prokaryotic Reference and Representative genome assemblies”

Important changes to the genomes FTP site in February

We have added the latest NCBI Eukaryotic Genome Annotation Pipeline results for the more than 580 species that we annotate to the genomes/refseq directory on the genomes FTP area. As we announced in December, we will stop publishing annotation results to the genus_species directories (example: genomes/Xenopus_tropicalis) on the genomes FTP site effective February 1, 2020. We will also move existing genus_species directories to genomes/archive/old_refseq during the month of February.

Figure 1. The Assembly page for the Xenopus tropicalis UCB Xtro 10.0 (GCF_000004195.4) showing the blue download button. Annotation results such as the RefSeq transcript alignments that can be downloaded from the web page are now also under the genomes/refseq directory on the FTP site. The FTP path to the .bam alignment files is in red.

These FTP changes do not affect the Assembly download function. As always, you can download assembly data using the blue Download button on the web pages (Figure 1).
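
If you have scripts or bookmarks that point at the old genus_species directories, the move described above can be sketched as a simple path rewrite. This is a minimal illustration in Python, not an NCBI tool: the function name is hypothetical, and it assumes that each directory keeps its genus_species name unchanged under genomes/archive/old_refseq, which you should confirm against the FTP site.

```python
from pathlib import PurePosixPath

# Assumption: archived directories keep their genus_species name unchanged
# under genomes/archive/old_refseq. Verify the real layout on the FTP site.
ARCHIVE_ROOT = PurePosixPath("genomes/archive/old_refseq")


def archived_path(old_path: str) -> str:
    """Map an old genomes/<genus_species> path to its assumed archive location."""
    parts = PurePosixPath(old_path.strip("/")).parts
    if len(parts) < 2 or parts[0] != "genomes":
        raise ValueError(f"not a genomes/<genus_species> path: {old_path!r}")
    # Everything after "genomes/" is re-rooted under the archive directory.
    return str(ARCHIVE_ROOT.joinpath(*parts[1:]))


print(archived_path("genomes/Xenopus_tropicalis"))
# prints genomes/archive/old_refseq/Xenopus_tropicalis
```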

 

Recent improvements to the genome Assembly resource

We’re constantly making improvements to the NCBI genome Assembly resource. This post points out some recent advances, highlighted in Figure 1 and described in more detail below.

Figure 1. New improvements to the Assembly web pages. The results page showing the surveillance project filter (lower left), which excludes 28,220 Klebsiella pneumoniae assemblies from the Pathogen Detection Project, and the Download Assemblies button with a link to the File type description (circled in red, upper right). For other improvements in the Download Assemblies menu see our recent post.

Continue reading “Recent improvements to the genome Assembly resource”

NCBI implements new, natural language sequence search

In late May, we introduced a new type of search experience in NCBI Labs that uses natural language queries to make common tasks easier. The experience at NCBI Labs – where we experiment with potential new features and tools – proved successful. We’re pleased to announce that we added this simplified search capability to NCBI’s global search page. Some natural language queries now work in the “All Databases” search from the NCBI home page!

NCBI search bar, red arrow pointing to all databases

Continue reading “NCBI implements new, natural language sequence search”

NCBI scientists verify taxonomic identities in prokaryotic genomes

As of March 2018, there were 141,000 prokaryotic genomes in the Assembly database. As this database grows, misassigned prokaryotic genomes become a serious problem. Taxonomic misassignment can occur through simple submission error or can accumulate as new information adds greater specificity to the taxonomic tree.

A paper in the International Journal of Systematic and Evolutionary Microbiology presents the method NCBI scientists used to verify taxonomic identities in prokaryotic genomes. The authors used an Average Nucleotide Identity (ANI) method with optimum threshold ranges for prokaryotic taxa to review all prokaryotic genome assemblies in GenBank. This method relies on type strain information and is one outcome of a 2015 workshop involving several important parties in the bacteriology community.
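
To make the idea of a threshold check concrete, here is a minimal Python sketch of how an ANI value against a type-strain assembly might be turned into a verdict. It is an illustration only, not the authors' implementation: the class and function names, the example labels and ANI values, and the 96.0 default cutoff are all hypothetical placeholders. The paper uses threshold ranges tuned per taxon, and computing ANI itself is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class AniResult:
    assembly: str          # label for the genome being checked (made up here)
    declared_species: str  # species name supplied with the submission
    ani_to_type: float     # percent ANI against that species' type-strain assembly


def check_identity(result: AniResult, threshold: float = 96.0) -> str:
    """Compare one ANI value with a cutoff and return a verdict string.

    The default threshold is a placeholder for illustration; real cutoffs
    would come from the per-taxon ranges described in the paper.
    """
    if not 0.0 <= result.ani_to_type <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if result.ani_to_type >= threshold:
        return "consistent"
    return "possible misassignment"


# Made-up example values, not real data.
examples = [
    AniResult("assembly_A", "Klebsiella pneumoniae", 98.7),
    AniResult("assembly_B", "Klebsiella pneumoniae", 89.2),
]
for r in examples:
    print(r.assembly, check_identity(r))
```

A genome that falls below the cutoff is only flagged for review here; deciding what the correct assignment should be would need comparisons against type strains of other species.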

Test drive a new sequence search experience at NCBI Labs

We know it’s not always easy to find the sequence data you’re after at NCBI. Maybe it’s because you’re no expert at constructing queries, and you end up with no results or too many results. Or maybe you’re an Entrez wizard, but creating a query full of Booleans and filters seems like overkill when you could just write a short natural language query, like you’re used to doing in Google.  The next time you search for a gene, transcript or genome assembly for a given organism, try the new search experience we’re piloting in NCBI Labs.

In NCBI Labs, you can now search for sequences using natural language and get the best results.

NCBI Labs transcript search interface
Figure 1. The new interface for specified transcript search.

The improved search experience now available in NCBI Labs addresses 3 types of queries that commonly fail in searches at NCBI: organism-gene (e.g. human BRCA1), organism-transcript (e.g. Mouse p53 transcripts) and organism-assembly (e.g. dog reference genome). For each of these query types in NCBI Labs, we now return NCBI’s highest quality sequence sets or reference and representative assemblies in an easy-to-view panel.

Example queries are shown below to get you started.

Continue reading “Test drive a new sequence search experience at NCBI Labs”

November 1 webinar: Introducing the Genome Data Viewer (GDV)

On Wednesday, November 1, 2017, we will present a webinar on GDV, NCBI’s full-featured genome browser. In this webinar, you’ll learn how to explore and analyze sequences and annotations for eukaryotic RefSeq genome assemblies. We’ll show you how to:

  • Search across the entire assembly for genes, products and other markers or jump to a specific position or range
  • Display any of seven preselected track sets highlighting various aspects of the assembly or create and load your own custom track sets from your NCBI account
  • Load and display submitted alignment data from NCBI’s GEO or SRA
  • Upload your own annotation and variant data
  • Display BLAST or Primer-BLAST results on the assembly in the browser

Date and time: Wednesday, November 1, 2017, 12:00-12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Genome data download made easy!

This blog post is directed toward Assembly users.

A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.

For example, you can run a search in Assembly and use check boxes (see left side of screenshot below) to refine the set of genome assemblies of interest. Then, just open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file will be saved to your computer that can be expanded into a folder containing your selected genome data files.
Continue reading “Genome data download made easy!”
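
If you would rather expand the downloaded archive from a script than from your file manager, the last step above might look like the following Python sketch. It assumes the archive is a tar file, which the post does not state, and the file and folder names in the usage comment are hypothetical.

```python
import tarfile
from pathlib import Path


def expand_archive(archive: Path, dest: Path) -> list[str]:
    """Extract an archive into dest and return the names of its members.

    Assumes a tar archive (optionally compressed); tarfile detects the
    compression automatically when opening for reading.
    """
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        # Refuse members that would be written outside dest (absolute or ../ paths).
        for name in names:
            if not (root / name).resolve().is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {name!r}")
        tar.extractall(root)
    return names


# Usage (hypothetical file names):
# files = expand_archive(Path("genome_assemblies.tar"), Path("genome_data"))
# print(len(files), "entries extracted")
```

The path check is a precaution for any archive you did not create yourself; it requires Python 3.9 or later because of `Path.is_relative_to`.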