Tag: Nucleotide BLAST (blastn)

Updated prokaryotic representative genomes collection includes 685 new species!

We are happy to announce an updated bacterial and archaeal representative genomes collection. The current collection contains a total of 15,507 assemblies selected from 236,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has grown by five percent since August 2021. A total of 685 species are represented for the first time. In addition, 370 species are represented by a better assembly, and 84 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.

We updated the database on the Microbial Nucleotide BLAST page as well as the basic nucleotide BLAST RefSeq Representative genomes database (fourth in the menu) to reflect these changes. Finally, remember that you can now run BLAST searches against the proteins annotated on representative genomes (second in the menu). Find more information here.

Updated prokaryotic representative genome collection

The bacterial and archaeal representative genome collection has been updated! We selected a total of 14,912 of the 224,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has grown by 8% since April 2021 and now includes Candidatus and endosymbiont species (Figure 1), which account for 303 and 140, respectively, of the 1,077 newly added species. In addition, 719 species are represented by a better assembly, and 70 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.

Figure 1. Graphical view of a portion of the RefSeq Representative assembly for the bedbug endosymbiont Candidatus Wolbachia massiliensis isolate PL13.

Continue reading “Updated prokaryotic representative genome collection” →

BLAST+ 2.12.0 now available with more efficient multithreaded searches

BLAST+ 2.12.0 programs feature better multithreaded searches and support a different threading model, threading by query, that can be more efficient in some situations. The new release is also fully compatible with the increase in the numeric range for the GI identifier, which will take effect in the nucleotide database later this year. The list below shows details of the new features and bug fixes. You can download the new BLAST release from the FTP site.
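As a rough illustration of threading by query, the sketch below assembles a blastn command line in Python without running it. It assumes that threading by query is selected with `-mt_mode 1` alongside `-num_threads`, as we understand the 2.12.0 options; the meaning of the `-mt_mode` values may differ in other releases, so check `blastn -help` for your installed version. The file and database names are hypothetical placeholders.

```python
import shlex


def blastn_by_query_command(query_fasta, database, threads, out_path):
    """Build (but do not run) a blastn command that threads by query.

    Assumption: '-mt_mode 1' selects threading by query in BLAST+ 2.12.0.
    Verify against 'blastn -help' for the version you have installed.
    """
    return [
        "blastn",
        "-query", query_fasta,          # multi-FASTA file with many queries
        "-db", database,                # local nucleotide BLAST database
        "-num_threads", str(threads),   # number of worker threads
        "-mt_mode", "1",                # assumed: split work by query batch
        "-out", out_path,
    ]


# Hypothetical file and database names, for illustration only.
cmd = blastn_by_query_command("queries.fa", "my_nt_db", 8, "hits.out")
print(shlex.join(cmd))
```

Threading by query tends to pay off when there are many queries and a comparatively small database; for a few queries against a large database, the default mode is likely the better starting point.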

Continue reading “BLAST+ 2.12.0 now available with more efficient multithreaded searches” →

Prokaryotic representative genomes update–over 900 new species!

We are happy to announce an updated bacterial and archaeal representative genome collection! We have selected 13,835 among 214,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has increased by 6% since December 2020. About 950 species are represented for the first time, 476 species are represented by a better assembly, and 170 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.

Continue reading “Prokaryotic representative genomes update–over 900 new species!” →

April 7 Webinar: Recent and upcoming enhancements to NCBI BLAST and Primer-BLAST services!

Join us on April 7, 2021, at 12 PM Eastern Time to learn about new web BLAST and Primer-BLAST enhancements that improve your BLAST experience. You’ll also see a preview of some planned improvements to the databases that make it easier to find relevant matches.

Recent changes to web BLAST include added data columns on the descriptions table, so you can quickly find and sort your matches. Primer-BLAST now offers direct links from genome assembly pages, so you can easily select the specificity database. Primer-BLAST also now accepts multiple target templates, making it easy to design primers that can amplify several similar sequences, such as all splice variants of a gene or the same target (16S, COI) from different strains or species.

Date and time: Wed, April 7, 2021 12:00 PM – 12:45 PM EDT
Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Updated and improved collection of RefSeq representative genome assemblies now available

We have updated the collection of representative genome assemblies for Bacteria and Archaea. As announced in April, this set is now recalculated three times a year. We selected a total of 11,727 prokaryotic assemblies to represent their respective species among the 192,000 assemblies in RefSeq. Six hundred and thirty-five species were included in the collection for the first time, while 395 organisms from undefined species (such as Bacillus bacterium) were removed. We were able to choose a higher-quality representative than in the previous set for 18% of bacterial and archaeal species, thanks to improvements in the selection logic. Selection is now based on assembly length, the number of pseudo CDSs called in the PGAP annotation, the number of scaffolds, whether Gene IDs are available in the Gene database for the assembly that is currently representative, and type strain status. You can see the exact criteria in order of importance on the Prokaryotic RefSeq Genomes page. Now that the new selection process is in place, we expect future updates to have fewer changes. We will replace a representative only if the assembly has changed RefSeq status or if a substantially better assembly becomes available.

We have updated the database on the Microbial Nucleotide BLAST page as well as the basic nucleotide BLAST RefSeq Representative Genome Database, to reflect these changes.

You can download the reference and representative set from the Assembly resource. If you are interested in the annotation on these genomes, you can limit searches to proteins annotated on representative genomes by adding “refseq_select[filter]” to any query in the Protein database. For example, you can find all proteins annotated on representative genomes in the genus Klebsiella by using the query: “Klebsiella[organism] AND refseq_select[filter]”. A BLAST database of proteins annotated on representative genomes will be coming soon. Stay tuned!
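If you prefer to run the same Protein database query programmatically, one option is the E-utilities ESearch service. The sketch below only builds the request URL for the query "Klebsiella[organism] AND refseq_select[filter]" and does not contact NCBI. The helper names and the `retmax` value are illustrative assumptions, not part of any NCBI tool.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def representative_protein_query(organism):
    """Limit a Protein search to proteins annotated on representative genomes."""
    return f"{organism}[organism] AND refseq_select[filter]"


def esearch_url(term, retmax=20):
    """Build an ESearch URL against the Protein database (no request is sent)."""
    params = {"db": "protein", "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


term = representative_protein_query("Klebsiella")
print(term)
print(esearch_url(term))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return an XML list of matching protein identifiers; swap in another organism name to change the taxonomic scope.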

New ribosomal RNA BLAST databases available on the web BLAST service and for download

We have a curated set of ribosomal RNA (rRNA) reference sequences (Targeted Loci) with verifiable organism sources and current names. This set is critical for correctly identifying and classifying prokaryotic (bacteria and archaea) and fungal samples (Table 1). To provide easy access to these sequences, we recently added a separate rRNA/ITS databases section on the nucleotide BLAST page for these targeted sequences that makes it convenient to quickly identify source organisms (Figure 1).

Database	BioProjects	Sequences
16S ribosomal RNA (Bacteria and Archaea)	PRJNA33317, PRJNA33175	20,845
18S ribosomal RNA sequences (SSU) from Fungi type and reference material	PRJNA39195	2,337
28S ribosomal RNA sequences (LSU) from Fungi type and reference material	PRJNA51803	5,185
Internal transcribed spacer region (ITS) from Fungi and Oomycete type and reference material	PRJNA177353, PRJNA362621	10,874

Table 1. NCBI curated targeted rRNA sequences now available as BLAST databases.

Continue reading “New ribosomal RNA BLAST databases available on the web BLAST service and for download” →

The new BLAST results are now the default view

As you may know, we have been offering the new BLAST results (Figure 1) as a test page since April. In response to your positive reception and after incorporating many improvements that you suggested, we made the new results the default today, August 1, 2019.

You will still be able to access the traditional results for several months. This will give you additional time, if you need it, to adjust your workflows or teaching materials to the new display.

Continue reading “The new BLAST results are now the default view” →

Introducing the new Virus Sequence Search Interface

BLAST is a powerful search tool, but often a search is just the beginning of the journey. We put ourselves in the shoes of a researcher who has just sequenced a handful of samples from the latest viral outbreak and tried to understand what information would be most useful. We also reached out to researchers in the field and asked: a) what questions do they really want to answer? and b) how can NCBI best provide the answers? Based on insights from those questions and answers, we developed the new Virus Sequence Search Interface (Fig. 1). The Search Interface is an NCBI Labs project, which means it is an experimental project, and we may modify the resource based on your feedback and experiences.

Figure 1. The Virus Sequence Selection Interface. The Virus Sequence Selection Interface accepts as input nucleotide and protein accessions, as well as FASTA and plain-text formatted sequences. The user selects either “Nucleotide” or “Protein,” depending on the sequence type, and selects the virus type from the pull-down menu below the text entry field.

Continue reading “Introducing the new Virus Sequence Search Interface” →

NCBI Replacing Obsolete NCBI Genomes (chromosome) and Removing Human ALU repeat elements (alu_repeats) BLAST databases

NCBI will discontinue both the NCBI Genomes (chromosome) and the Human ALU repeat elements (alu_repeats) BLAST databases in October 2017.

Better alternatives to NCBI Genomes (chromosome)

The existing NCBI Genomes (chromosome) database does not offer complete and non-redundant coverage of genome data. The newly added NCBI RefSeq Genomes Database (refseq_genomes) and the RefSeq Representative Genomes Database (refseq_representative_genomes) are more useful alternatives to the chromosome database. You can select these databases from the database pull-down list on any general BLAST form that searches a nucleotide database (blastn, tblastn).
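For command-line users, the recommended databases can in principle be named the same way in a remote BLAST+ search. The sketch below builds such a command in Python without running it. It assumes that the database names given above (`refseq_genomes`, `refseq_representative_genomes`) are accepted by `blastn -remote`; the query file name is a hypothetical placeholder.

```python
import shlex

# Databases recommended above as replacements for the chromosome database.
RECOMMENDED_DATABASES = ("refseq_genomes", "refseq_representative_genomes")


def remote_blastn_command(query_fasta, database):
    """Build (but do not run) a remote blastn search against an NCBI database.

    Assumption: the database name is one that the NCBI servers accept for
    '-remote' searches; confirm availability before relying on it.
    """
    if database not in RECOMMENDED_DATABASES:
        raise ValueError(f"unexpected database: {database}")
    return [
        "blastn",
        "-query", query_fasta,   # hypothetical local FASTA file
        "-db", database,
        "-remote",               # send the search to NCBI servers
        "-outfmt", "6",          # tabular output
    ]


cmd = remote_blastn_command("query.fa", "refseq_representative_genomes")
print(shlex.join(cmd))
```

Restricting the helper to the two recommended names is a deliberate choice here, so that scripts do not keep pointing at the discontinued chromosome database.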

Figure 1. The nucleotide-nucleotide BLAST database menu with the recommended (RefSeq Genome and Representative genomes) and deprecated (NCBI genomes (chromosomes) and Human ALU repeats) databases highlighted.

Continue reading “NCBI Replacing Obsolete NCBI Genomes (chromosome) and Removing Human ALU repeat elements (alu_repeats) BLAST databases” →