Tag: Basic Local Alignment Search Tool (BLAST)

Now available: Updated prokaryote representative genomes collection

An updated bacterial and archaeal representative genomes collection is available! We selected a total of 16,665 of the 262,000 prokaryotic assemblies in RefSeq to represent their respective species. For the first time, more complete assemblies (as calculated by CheckM) were ranked higher than less complete assemblies. See the ranked list of criteria for selecting representative assemblies here.

Connect with NCBI at ASHG 2022

Join us October 25-29 in Los Angeles, CA

We are looking forward to seeing you in person at the American Society of Human Genetics (ASHG) annual meeting, October 25-29, 2022, in Los Angeles, California.

We will present a variety of talks and posters featuring our clinical and human genetic resources, as well as genome products and tools. We are excited to introduce the NIH Comparative Genomics Resource (CGR), a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources on biomedical research. If you’re interested in providing feedback that will be used to help drive CGR forward, consider joining our round table discussion.

Check out NCBI’s schedule of activities and events: 

New Upcoming NCBI Virtual Workshops!

Apply to attend October 2022 interactive, hands-on workshops

Want to learn more about NCBI resources and how to implement our cutting-edge tools in your research? NCBI offers a variety of educational opportunities, including workshops, webinars, codeathons, tutorials, and more!

We are excited to announce our upcoming virtual workshop series for October 2022. Our interactive, hands-on workshops are taught by experienced NCBI Education Faculty. Applications are open to the public; however, each workshop will accept a limited number of participants to facilitate the best possible educational experience.

Announcing new links and annotations on Conserved Domain Search results!

Conserved Domain Search (CD Search) results now show domain architecture information and other annotations that further characterize predicted domain and protein function. These include links to PubMed, Gene Ontology (GO) terms, Enzyme Commission (EC) numbers, and the SPARCLE Domain Architecture Viewer. You can use these links on the results to find literature (PubMed), assign biological roles and protein function (GO and EC), and find proteins with the same domain architecture (Domain Architecture Viewer). These annotations are currently available for a limited number of architectures, but we will continue to add them as part of our curation effort.

Figure 1 shows the results of an example CD Search with these new links. Note that you can use the GO and EC information provided to retrieve protein models with these annotations from the Protein Family Models database, for example, GO:0030246[GOTermId] (molecular function: carbohydrate binding) or 2.7.11.1[ECNumber] (non-specific serine/threonine protein kinase).
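As a minimal sketch, the two fielded search terms quoted above can be assembled with a couple of small helpers. The field names GOTermId and ECNumber come from the text; the helper names and the GO identifier check are illustrative assumptions, and the code only builds the query strings without contacting any NCBI service.

```python
import re

# GO identifiers are written as "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def field_term(value: str, field: str) -> str:
    """Return a fielded search term such as 'GO:0030246[GOTermId]'."""
    value = value.strip()
    if not value:
        raise ValueError("value must not be empty")
    return f"{value}[{field}]"


def go_term(go_id: str) -> str:
    """Build a GO term query (hypothetical helper, field name from the post)."""
    if not GO_ID.fullmatch(go_id.strip()):
        raise ValueError(f"not a GO identifier: {go_id!r}")
    return field_term(go_id, "GOTermId")


def ec_term(ec_number: str) -> str:
    """Build an EC number query (hypothetical helper, field name from the post)."""
    return field_term(ec_number, "ECNumber")


if __name__ == "__main__":
    print(go_term("GO:0030246"))
    print(ec_term("2.7.11.1"))
```

The resulting strings can be pasted into the Protein Family Models search box.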

Figure 1. Conserved Domain Database search results for a hypothetical protein (XP_007132600.1) from the common bean (Phaseolus vulgaris). The results classify the protein as a plant receptor-like protein kinase. The results also show the EC number and the GO terms associated with this domain architecture, a link to a PubMed citation for the protein family (receptor-like protein kinases), and a link to the Domain Architecture Viewer for G-type lectin S-receptor-like serine/threonine-protein kinases. The Domain Architecture Viewer shows other proteins from the NCBI databases with the same domain architecture (order, number and types of domains).

Try out the latest BLAST ClusteredNR database results. Now with in-cluster analyses!

As we previously announced, we are offering a ClusteredNR protein database on the web BLAST service that provides faster searches, greater taxonomic reach, and easier-to-interpret results than the traditional nr database. We’ve added new features to the results that make ClusteredNR even more useful by allowing analyses within each cluster, including the ability to:

    • Align the query to the members of the cluster.
    • Display Tree View and MSA View of the cluster alignment.
    • Submit the cluster to COBALT to generate a true multiple sequence alignment of the members.
    • Display a BLAST Taxonomy Report to see the taxonomic distribution of the sources of the members.

Figure 1 shows you how to access these in-cluster analysis options. The new Cluster Taxonomy report is shown in Figure 2. Try ClusteredNR yourself: follow this link to set up a search!

Try out Datasets and ElasticBLAST at the BOSC 2022 CoFest!

Join NLM’s NCBI at the virtual CollaborationFest on July 15 from 08:00 – 11:00 CDT and 12:00 – 16:00 CDT following the BOSC 2022 conference. Get an in-depth orientation and an opportunity to test the capabilities of Datasets and ElasticBLAST.

What is Datasets?

Datasets is a new resource that lets you easily gather data from across NCBI databases. Find and download gene, transcript, protein and genome sequences, annotation and metadata. We invite you to try the Datasets command line tool in your bioinformatic workflows!

Introducing NLM’s new NCBI Datasets genome page!

As part of an ongoing effort to modernize and improve your experience, NLM’s NCBI Datasets is introducing all-new genome pages. These pages make it easier for you to browse and download genome sequence and metadata, and navigate to tools such as the Genome Data Viewer (GDV) and BLAST.

To get started, search NCBI Datasets by assembly accession (e.g., GCF_016699485.2), assembly name (e.g., bGalGal1.mat.broiler.GRCg7b), WGS accession (e.g., JAENSK01), or species name + genome (e.g., chicken genome), and click on the title in the box. See the top red arrow in Figure 1 below where we search for ‘chicken genome’.

Figure 1: Finding the chicken reference assembly. A search for ‘chicken genome’ returns a box that provides a quick link to the new genome page (middle red arrow). From there, the download button (bottom red arrow) allows you to select the files you need (see ‘Download Package’ window on the left) along with a detailed metadata report that includes all the metadata on the web page.
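The four kinds of search input listed above differ in shape, which a short sketch can make concrete. The patterns below are assumptions based on common accession formats (GCA_/GCF_ plus nine digits and a version for assemblies; four or six letters plus a two-digit version for WGS projects). They do not describe how NCBI Datasets itself parses queries, and classify_query is a hypothetical helper.

```python
import re

# Assumed formats, not taken from the Datasets documentation:
ASSEMBLY_ACCESSION = re.compile(r"GC[AF]_\d{9}\.\d+")   # e.g. GCF_016699485.2
WGS_ACCESSION = re.compile(r"[A-Z]{4}(?:[A-Z]{2})?\d{2}")  # e.g. JAENSK01


def classify_query(text: str) -> str:
    """Guess which kind of identifier a search string looks like."""
    text = text.strip()
    if ASSEMBLY_ACCESSION.fullmatch(text):
        return "assembly accession"
    if WGS_ACCESSION.fullmatch(text):
        return "WGS accession"
    if " " in text:
        # Multi-word input such as "chicken genome".
        return "species name + genome"
    # Anything else is treated as an assembly name in this sketch.
    return "assembly name"


if __name__ == "__main__":
    for query in (
        "GCF_016699485.2",
        "bGalGal1.mat.broiler.GRCg7b",
        "JAENSK01",
        "chicken genome",
    ):
        print(f"{query}: {classify_query(query)}")
```

A check like this can be useful for validating a list of identifiers before searching for them one at a time.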

Announcing an updated prokaryotic representative genomes collection with 706 new species!

An updated bacterial and archaeal representative genomes collection is available! A total of 16,105 assemblies among the 249,000 prokaryotic assemblies in RefSeq were selected to represent their respective species. The collection has grown by 3.7% since January 2022. A total of 706 species are represented for the first time. In addition, 186 species are represented by a better assembly, and 124 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.
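The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below assumes one representative assembly per species, so the net change in species equals the net change in assemblies. The implied previous total is derived here and is not stated in the post.

```python
# Figures quoted in the post.
current_total = 16_105       # representative assemblies in this release
refseq_assemblies = 249_000  # prokaryotic assemblies in RefSeq
new_species = 706            # species represented for the first time
removed_species = 124        # species removed from the collection
# The 186 species given a better assembly do not change the total.

# Assumption: one representative assembly per species.
net_change = new_species - removed_species
previous_total = current_total - net_change  # derived, not stated in the post
growth_pct = 100 * net_change / previous_total
share_pct = 100 * current_total / refseq_assemblies

print(f"net change in species: {net_change}")
print(f"implied previous total: {previous_total}")
print(f"growth since previous release: {growth_pct:.1f}%")
print(f"share of RefSeq prokaryotic assemblies: {share_pct:.1f}%")
```

Under that assumption, the net gain of 582 species over an implied earlier total of 15,523 works out to about 3.7%, which agrees with the growth figure quoted above.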

We updated the database on the Microbial Nucleotide BLAST page as well as the basic nucleotide BLAST RefSeq Representative genomes database (fourth in the menu) to reflect these changes. Finally, remember that you can now run BLAST searches against the proteins annotated on representative genomes (second in the menu). See more info here.

Save the Date: NCBI at the Bioinformatics Open Science Conference (BOSC), July 2022

Come visit NCBI at the Bioinformatics Open Science Conference (BOSC), part of the Intelligent Systems for Molecular Biology Conference (ISMB), July 13-16, taking place both in person in Madison, Wisconsin and virtually! We’ll be presenting talks and posters on the latest updates to the NCBI Datasets, BLAST, and Protein resources. You can also join us at the Birds of a Feather (BoF) discussion and the BOSC CollaborationFest (CoFest) to explore these resources and discuss workflows with NCBI staff.

New ClusteredNR database: faster searches and more informative BLAST results

Reduced redundancy. Faster searches. More diverse proteins and organisms in your BLAST results. Check out our new ClusteredNR database – derived from the default BLAST protein nr database by clustering sequences at 90% identity / 90% length (details below).  Get quicker results and access to information about the distribution of your hits across a wider range of organisms and evolutionary distances.

Searching ClusteredNR

You can choose the ClusteredNR database in the Choose Search Set section of the BLAST submission form, where you normally pick the BLAST database. Simply select the Experimental databases radio button. You can also select the checkbox to search both ClusteredNR and the standard nr at the same time, allowing you to compare results (Figure 1).

Figure 1. The ‘Choose Search Set’ section of the BLAST submission form. Selecting the Experimental databases radio button chooses ClusteredNR. You can also perform simultaneous searches against the clustered and the standard nr by checking ‘Select to compare standard and experimental database.’