The new Protein Family Model resource (Figure 1) provides a way for you to search across the evidence used by the NCBI annotation pipelines to name and classify proteins. You can find protein families by gene symbol, protein function, and many other terms. You have access to related proteins in the family and publications describing members. Protein Family Models includes protein profile hidden Markov models (HMMs) and BlastRules for prokaryotes, and conserved domain architectures for prokaryotes and eukaryotes. The HMMs in the collection include Pfam models, TIGRFAMs, and models developed at NCBI either de novo or from NCBI protein clusters. Each of the BlastRules (PMCID: 5753331) consists of one or more model proteins of known biological function with BLAST identity and coverage cutoffs. The conserved domain architectures are based on BLAST-compatible Position Specific Score Matrices (PSSMs) that constitute the NCBI Conserved Domain database.
Figure 1. Protein Family Model resource pages. Top panel: Home page. Middle panel: Selected results summaries from a fielded search for the DnaK gene product (DnaK[Gene Symbol]). Bottom panel: A portion of an HMM record for DnaK derived from NCBI Protein Clusters (NF009946). The record also includes PubMed citations and HMMER analyses showing the RefSeq proteins named by this method.
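To make the idea of a BlastRule more concrete, here is a minimal Python sketch of how identity and coverage cutoffs might be applied to a BLAST hit. The class name, field names, rule name, and cutoff values are all hypothetical and chosen only for illustration; a single coverage value is also a simplification. This is not the NCBI annotation pipeline's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastRule:
    """Hypothetical stand-in for a BlastRule: a name plus two cutoffs."""
    name: str
    min_identity: float  # percent identity cutoff (illustrative value)
    min_coverage: float  # percent coverage cutoff (illustrative value)


def hit_satisfies(rule: BlastRule, identity: float, coverage: float) -> bool:
    """Return True when a hit meets both cutoffs of the rule."""
    return identity >= rule.min_identity and coverage >= rule.min_coverage


# Made-up rule and cutoffs, for illustration only.
example_rule = BlastRule(name="example chaperone rule",
                         min_identity=90.0, min_coverage=95.0)
```

A hit would be named by the rule only when both conditions hold, for example `hit_satisfies(example_rule, 95.0, 98.0)`.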
Tag: Basic Local Alignment Search Tool (BLAST)
Primer-BLAST now has a “Primers common for a group of sequences” submission tab that allows you to design primers for a group of highly similar sequences. For example, you may want to test for expression of any transcript of a gene rather than a specific splice variant, so you want to design primers that cover all transcript variants. Or you may want to design primers that will amplify the same gene in closely related bacterial strains. To find primers for a group of related sequences, Primer-BLAST aligns the longest sequence to the rest to find common regions. It uses these to limit the locations of primers. The longest sequence is also used as the representative template sequence in the results. Figure 1 shows an example search for primers that will amplify all of the 15 splice variants for the human TP53 gene.
Figure 1. Primer-BLAST submission page and results for primers designed for the human TP53 transcripts. Top panel: The submission form with the “Primers common for a group of sequences” tab selected and the 15 RefSeq transcript accessions for TP53. Middle panel: The graphical results showing the longest sequence (NM_001126114.3) as the representative template, the locations of the primer pairs, and the alignment of the other template sequences. Bottom panel: An individual primer pair showing the locations on each of the template sequences.
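The idea of picking the longest sequence as the template and restricting primers to regions shared by every sequence can be sketched in a few lines of Python. This is only an illustration: it uses exact matching blocks from the standard library's `difflib.SequenceMatcher` as a stand-in for a real alignment, and the function names and the `min_len` parameter are our own assumptions, not part of Primer-BLAST.

```python
from difflib import SequenceMatcher


def pick_template(seqs):
    """Return the ID of the longest sequence (the representative template)."""
    return max(seqs, key=lambda seq_id: len(seqs[seq_id]))


def common_regions(seqs, min_len=5):
    """Return (template_id, regions) where regions are half-open (start, end)
    template intervals that match every other sequence exactly.

    Exact matching blocks stand in for alignment here (an assumption).
    """
    template_id = pick_template(seqs)
    template = seqs[template_id]
    shared = [True] * len(template)
    for seq_id, seq in seqs.items():
        if seq_id == template_id:
            continue
        matched = [False] * len(template)
        matcher = SequenceMatcher(None, template, seq, autojunk=False)
        for block in matcher.get_matching_blocks():
            if block.size >= min_len:
                for i in range(block.a, block.a + block.size):
                    matched[i] = True
        # Keep only positions shared with every sequence seen so far.
        shared = [s and m for s, m in zip(shared, matched)]
    # Collapse the per-position flags into intervals.
    regions, start = [], None
    for i, flag in enumerate(shared + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return template_id, regions


toy = {"variant_1": "AAAACCCCGGGGTTTT", "variant_2": "CCCCGGGG"}
template_id, regions = common_regions(toy)
```

With the toy input, the longer sequence becomes the template and the only candidate primer region is the stretch both sequences share.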
Please try out this new feature and let us know what you think!
To provide a more efficient BLAST experience for everyone, we’re changing some parameters and limits on the web BLAST service on September 8, 2020. The new settings, listed below, will improve overall performance and make search times more consistent.
- The Expect Value Threshold default setting will be reduced to 0.05.
- The maximum number of target sequences (Max target sequences) limit will be no more than 5,000.
- The maximum allowed query length for nucleotide queries (blastn, blastx, and tblastx) will be 1,000,000 and 100,000 for protein queries (blastp and tblastn).
These changes will help keep the BLAST service running smoothly as the already very large databases continue to grow rapidly. If you have any questions or concerns, please email us at email@example.com
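If you submit searches from scripts, you may want to check a request against the limits listed above before sending it. The following Python sketch encodes those limits; the function and constant names are hypothetical and the check is our own illustration, not part of any NCBI tool.

```python
# Limits taken from the announcement above.
MAX_QUERY_LENGTH = {
    "blastn": 1_000_000,
    "blastx": 1_000_000,
    "tblastx": 1_000_000,
    "blastp": 100_000,
    "tblastn": 100_000,
}
MAX_TARGET_SEQUENCES = 5_000
DEFAULT_EXPECT_THRESHOLD = 0.05


def check_web_blast_request(program, query_length, max_target_sequences):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    limit = MAX_QUERY_LENGTH.get(program)
    if limit is None:
        problems.append(f"unknown program: {program}")
    elif query_length > limit:
        problems.append(
            f"query length {query_length} exceeds the {program} limit of {limit}"
        )
    if max_target_sequences > MAX_TARGET_SEQUENCES:
        problems.append(
            f"max target sequences {max_target_sequences} exceeds "
            f"{MAX_TARGET_SEQUENCES}"
        )
    return problems
```

For example, `check_web_blast_request("blastp", 150_000, 10_000)` reports two problems, one for the query length and one for the number of target sequences.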
You can now download publication-quality graphic images of the alignment displayed in the NCBI Multiple Sequence Alignment Viewer (Figure 1). Load sequence alignments into the viewer from BLAST or COBALT results or upload alignment files directly. Once you have the alignment set in the viewer, choose the “Printer-friendly PDF/SVG” option in the Download menu on the toolbar to save the image. The PDF and SVG files contain vector graphics suitable for presentation and publication.
Figure 1. The image download options in the MSAV. You can adjust the desired coordinate range and choose to download a PDF or SVG image. You can also preview the PDF download. Choose simplified color shading to improve compatibility with some graphics programs.
The downloaded image will show the coordinate range you requested and will include all the rows in the alignment.
Please contact us through the Feedback link on the MSA Viewer or write to the NCBI Help Desk to provide feedback and let us know how we can make the NCBI Multiple Sequence Alignment Viewer work better for you.
We’ve released a new version (1.16.0) of IgBLAST, the popular NCBI package for classifying and analyzing immunoglobulin (IG) and T cell receptor (TCR) variable domain sequences. Version 1.16.0 has three new improvements.
- Added the ability to extend the J gene alignment at the 3’ end of the region (Figure 1). This allows you to view the unaligned bases that otherwise would not be included because of low sequence similarity.
Figure 1. The new “extend alignment at the 3′ end” option on the IgBLAST web form. The command line option is ‘-extend_align3end’.
We have updated the collection of representative and reference assemblies for Bacteria and Archaea to better reflect the taxonomic breadth of the prokaryotes in RefSeq. We chose the 11,478 representative assemblies in the new collection from the 180,000+ prokaryotic assemblies in RefSeq today. We have selected one representative or reference assembly for every species based on several criteria including contiguity, completeness, and whether the assembly is from type material. We have also updated the reference and representative microbial BLAST database to reflect these changes. This reference and representative set will be updated three times a year to reflect changes in RefSeq. In addition, as we announced on Feb 14, we have reduced the number of reference genome assemblies (the subset of representative assemblies with annotation provided by outside experts) to 15. See the list in our previous post. We have re-annotated the 104 assemblies that are no longer reference with our Prokaryotic Genome Annotation Pipeline (PGAP).
We have a curated set of ribosomal RNA (rRNA) reference sequences (Targeted Loci) with verifiable organism sources and current names. This set is critical for correctly identifying and classifying prokaryotic (bacteria and archaea) and fungal samples (Table 1). To provide easy access to these sequences, we recently added a separate rRNA/ITS databases section on the nucleotide BLAST page. This section makes it convenient to quickly identify source organisms (Figure 1).
|16S ribosomal RNA (Bacteria and Archaea)||PRJNA33317, PRJNA33175
|18S ribosomal RNA sequences (SSU) from Fungi type and reference material||PRJNA39195||2,337|
|28S ribosomal RNA sequences (LSU) from Fungi type and reference material||PRJNA51803||5,185|
|Internal transcribed spacer region (ITS) from Fungi and Oomycete type and reference material||PRJNA177353, PRJNA362621
Table 1. NCBI curated targeted rRNA sequences now available as BLAST databases.
As we announced, the new default database version for BLAST+ is dbV5. To complete the transition to the new version, we will modify the directory structure and naming conventions on the BLAST FTP database directory. We expect to make this change around February 4th, 2020.
Here is a list of what we will change:
- All databases at the base of the blastdb directory (/blast/db/) will be the dbV5 versions.
- The version 5 databases will no longer have “_v5” as part of the archive or database names.
- We will move the dbV4 databases to a v4 subdirectory (/blast/db/v4/).
- The now legacy dbV4 database archives will have “_v4” in their names (e.g., nr_v4.00.tar.gz); we will not rename the files within the archive.
- We will no longer update the dbV4 databases.
- We will freeze the cloud directory (/blast/db/cloud/) with no new entries after January 13, 2020.
- We will provide only nr, nt, swissprot, and pdbaa files in the FASTA directory (/blast/db/FASTA/).
Please adjust your scripts or procedures to accommodate the changes!
If you have any questions or concerns, please contact us.
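If your download scripts build archive names by hand, the renaming rules above can be captured in two small helpers. This Python sketch assumes that the old dbV5 archives carry “_v5” directly after the database name (for example, nr_v5.00.tar.gz) and that the old dbV4 archives have no version suffix (for example, nr.00.tar.gz). Both patterns and the function names are our assumptions for illustration, so check them against the actual directory listing.

```python
import re


def new_v5_archive_name(old_name: str) -> str:
    """Drop the '_v5' suffix from a dbV5 archive or database name.

    Assumes '_v5' sits right before the first dot or at the end of the name.
    """
    return re.sub(r"_v5(?=\.|$)", "", old_name, count=1)


def new_v4_archive_path(old_name: str) -> str:
    """Return the assumed legacy location of a dbV4 archive.

    Adds '_v4' after the database name and places it in /blast/db/v4/.
    """
    base, dot, rest = old_name.partition(".")
    return f"/blast/db/v4/{base}_v4{dot}{rest}"
```

For example, `new_v4_archive_path("nr.00.tar.gz")` gives `/blast/db/v4/nr_v4.00.tar.gz`, which matches the naming example in the list above.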
IgBLAST is a popular NCBI package for classifying and analyzing immunoglobulin (IG) and T cell receptor (TCR) variable domain sequences. We’ve released a new version (1.15.0) of IgBLAST with four new improvements / bug fixes:
- Support for the new framework region 4 (FWR4) annotation feature in the standard alignment formats and AIRR format.
- Renamed the previous “-penalty” parameter to -V_penalty to be consistent with other IgBLAST penalty options.
- Restored constant internal BLAST search parameters for domain annotation (i.e., FWR/CDR) so that this process is not influenced by user-provided parameters.
- Corrected FWR/CDR annotations for certain mouse VK and rat VH germline genes.
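If you have wrapper scripts that still pass the old “-penalty” option, a small helper like the Python sketch below can update the argument list before the command runs. The function name and the example file name are hypothetical; only the two option names come from the release notes above.

```python
def migrate_igblast_args(args):
    """Return a copy of an IgBLAST argument list with '-penalty' renamed.

    Only exact matches of the old option name are replaced, so values such
    as '-1' are left untouched.
    """
    return ["-V_penalty" if arg == "-penalty" else arg for arg in args]


# 'reads.fasta' is a placeholder query file name.
old_args = ["igblastn", "-query", "reads.fasta", "-penalty", "-1"]
new_args = migrate_igblast_args(old_args)
```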
The BLAST+ 2.10.0 release is now available from our FTP site. The new version offers the following improvements:
- updated composition-based statistics for protein-protein (including translated BLAST) comparisons to provide stable results when you request fewer than the default number of results
- an experimental Adaptive Composition Based Statistics option that increases the likelihood of finding novel results. To enable this option, set the environment variable ADAPTIVE_CBS to 1. We welcome your feedback on this new option.
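If you launch BLAST+ from Python, one way to set ADAPTIVE_CBS for a single search without changing your shell environment is sketched below. The helper names, file names, and database name are placeholders we chose for illustration, and the sketch assumes that blastp is installed and on your PATH.

```python
import os
import subprocess


def adaptive_cbs_env(base_env=None):
    """Return a copy of the environment with ADAPTIVE_CBS set to '1'."""
    env = dict(os.environ if base_env is None else base_env)
    env["ADAPTIVE_CBS"] = "1"
    return env


def run_blastp_with_adaptive_cbs(query, db, out):
    """Run blastp with the experimental option enabled for this call only."""
    cmd = ["blastp", "-query", query, "-db", db, "-out", out]
    return subprocess.run(cmd, env=adaptive_cbs_env(), check=True)
```

Calling `run_blastp_with_adaptive_cbs("query.faa", "swissprot", "hits.txt")` would run one search with the option on, while other searches in the same script keep the default behavior.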
See the release notes for details on more improvements and bug fixes with this release.
The new version fully supports the version 5 (v5) databases with built-in taxonomy and other improvements. For more information on v5 databases (download), see the previous NCBI Insights article and the recording of our webinar. If you are still using the older version 4 (v4) databases, we recommend you begin using the v5 version as soon as possible. We will discontinue updates to the older v4 databases in early 2020.