Do you work with data from organisms outside the traditional set of model organisms? Join us on March 10, 2021 to learn how to use NCBI resources, including Taxonomy and BLAST, to find information about your organism and closely related taxa. You will see an example that shows you how to retrieve and download gene sequences for a set of species, generate multiple sequence alignments, and design primers using Primer-BLAST.
Date and time: Wed, March 10, 2021 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
The new Protein Family Model resource (Figure 1) provides a way for you to search across the evidence used by the NCBI annotation pipelines to name and classify proteins. You can find protein families by gene symbol, protein function, and many other terms. You have access to related proteins in the family and publications describing members. Protein Family Models includes protein profile hidden Markov models (HMMs) and BlastRules for prokaryotes, and conserved domain architectures for prokaryotes and eukaryotes. The HMMs in the collection include Pfam models and TIGRFAMs, as well as models developed at NCBI either de novo or from NCBI protein clusters. Each of the BlastRules (PMCID: 5753331) consists of one or more model proteins of known biological function with BLAST identity and coverage cutoffs. The conserved domain architectures are based on BLAST-compatible Position Specific Score Matrices (PSSMs) that constitute the NCBI Conserved Domain database.
Figure 1. Protein Family Model resource pages. Top panel: Home page. Middle panel: Selected results summaries from a fielded search for the DnaK gene product (DnaK[Gene Symbol]). Bottom panel: A portion of an HMM record for DnaK derived from NCBI Protein Clusters (NF009946). The record also includes PubMed citations and HMMER analyses showing the RefSeq proteins named by this method.
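To make the idea of a BlastRule with identity and coverage cutoffs concrete, here is a minimal Python sketch. It is not NCBI's implementation: the class name, the cutoff values, and the way coverage is computed (aligned length divided by model protein length) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlastRule:
    """Hypothetical stand-in for a BlastRule: a name plus two cutoffs."""
    name: str
    min_identity: float  # percent identity cutoff (illustrative value)
    min_coverage: float  # percent coverage cutoff (illustrative value)


def hit_passes(rule, identity, aligned_length, model_length):
    """Return True if a BLAST hit meets both cutoffs of the rule.

    Coverage is assumed here to be the aligned length as a percentage
    of the model protein length; the real definition may differ.
    """
    coverage = 100.0 * aligned_length / model_length
    return identity >= rule.min_identity and coverage >= rule.min_coverage


# Made-up rule and hits, for illustration only.
rule = BlastRule(name="example_rule", min_identity=94.0, min_coverage=90.0)
good_hit = hit_passes(rule, identity=96.5, aligned_length=620, model_length=638)
short_hit = hit_passes(rule, identity=96.5, aligned_length=300, model_length=638)
print(good_hit, short_hit)
```

The point of the two cutoffs is that a hit must be both similar enough and long enough: the second hit above has the same identity as the first but covers less than half of the model protein.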
Primer-BLAST now has a “Primers common for a group of sequences” submission tab that allows you to design primers for a group of highly similar sequences. For example, you may want to test for expression of any transcript of a gene rather than a specific splice variant, so you want to design primers that cover all transcript variants. Or you may want to design primers that will amplify the same gene in closely related bacterial strains. To find primers for a group of related sequences, Primer-BLAST aligns the longest sequence to the rest to find common regions. It uses these to limit the locations of primers. The longest sequence is also used as the representative template sequence in the results. Figure 1 shows an example search for primers that will amplify all of the 15 splice variants for the human TP53 gene.
Figure 1. Primer-BLAST submission page and results for primers designed for the human TP53 transcripts. Top panel: The submission form with the “Primers common for a group of sequences” selected and the 15 RefSeq transcript accessions for TP53. Middle panel: The graphical results showing the longest sequence (NM_001126114.3) as the representative template, the locations of the primer pairs, and the alignment of the other template sequences. Bottom panel: An individual primer pair showing the locations on each of the template sequences.
Please try out this new feature and let us know what you think!
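If you want to predict which of your sequences will serve as the representative template before you submit, the rule described above (the longest sequence wins) is easy to sketch in Python. This is only an illustration: the function name and the toy sequences are made up, and the sketch does not attempt the alignment step that Primer-BLAST uses to find common regions.

```python
def pick_template(sequences):
    """Return (identifier, sequence) for the longest sequence in a dict.

    Mirrors the rule stated in the post: the longest sequence in the
    group is used as the representative template. If several sequences
    tie, this sketch returns the first one encountered; how the service
    breaks ties is not specified.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    return max(sequences.items(), key=lambda item: len(item[1]))


# Toy sequences with hypothetical identifiers, not real TP53 transcripts.
transcripts = {
    "variant_a": "ATGGAGGAGCCGCAGTCAGATCC",
    "variant_b": "ATGGAGGAGCCGCAG",
    "variant_c": "ATGGAGGAGCCGCAGTCAG",
}
template_id, template_seq = pick_template(transcripts)
print(template_id, len(template_seq))
```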
To provide a more efficient BLAST experience for everyone, we’re changing some parameters and limits on the web BLAST service on September 8, 2020. The new settings, listed below, will improve overall performance and make search times more consistent.
The Expect Value Threshold default setting will be reduced to 0.05.
The maximum number of target sequences (Max target sequences) limit will be no more than 5,000.
The maximum allowed query length will be 1,000,000 for nucleotide queries (blastn, blastx, and tblastx) and 100,000 for protein queries (blastp and tblastn).
These changes will help keep the BLAST service running smoothly as the already very large databases continue to grow rapidly. If you have any questions or concerns, please email us at email@example.com
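If you submit searches from scripts, you may want to check your requests against the new limits before sending them. The Python sketch below encodes the limits listed above. The function and constant names are our own for illustration and are not part of any NCBI API.

```python
# Limits taken from the list above; names are illustrative only.
NUCLEOTIDE_QUERY_PROGRAMS = {"blastn", "blastx", "tblastx"}
PROTEIN_QUERY_PROGRAMS = {"blastp", "tblastn"}
MAX_NUCLEOTIDE_QUERY_LENGTH = 1_000_000
MAX_PROTEIN_QUERY_LENGTH = 100_000
MAX_TARGET_SEQUENCES = 5_000
DEFAULT_EXPECT_THRESHOLD = 0.05


def check_web_blast_request(program, query_length, max_target_seqs):
    """Return a list of problems; an empty list means the request fits."""
    if program in NUCLEOTIDE_QUERY_PROGRAMS:
        length_limit = MAX_NUCLEOTIDE_QUERY_LENGTH
    elif program in PROTEIN_QUERY_PROGRAMS:
        length_limit = MAX_PROTEIN_QUERY_LENGTH
    else:
        raise ValueError(f"unknown BLAST program: {program}")

    problems = []
    if query_length > length_limit:
        problems.append(
            f"query length {query_length} exceeds {length_limit} for {program}"
        )
    if max_target_seqs > MAX_TARGET_SEQUENCES:
        problems.append(
            f"max target sequences {max_target_seqs} exceeds {MAX_TARGET_SEQUENCES}"
        )
    return problems


ok = check_web_blast_request("blastp", query_length=50_000, max_target_seqs=100)
too_big = check_web_blast_request("tblastn", query_length=200_000, max_target_seqs=6_000)
print(ok)
print(too_big)
```

Note that the query type, not the database type, decides which length limit applies: tblastn takes a protein query, so it falls under the 100,000 limit.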
You can now download publication-quality graphic images of the alignment displayed in the NCBI Multiple Sequence Alignment Viewer (Figure 1). Load sequence alignments into the viewer from BLAST or COBALT results, or upload alignment files directly. Once you have the alignment set in the viewer, choose the “Printer-friendly PDF/SVG” option in the Download menu on the toolbar to save the image. The PDF and SVG files contain vector graphics suitable for presentation and publication.
Figure 1. The image download options in the MSAV. You can adjust the desired coordinate range and choose to download a PDF or SVG image. You can also preview the PDF download. Choose simplified color shading to improve compatibility with some graphics programs.
The downloaded image will show the coordinate range you requested and will include all the rows in the alignment.
Please contact us through the Feedback link on the MSA Viewer or write to the NCBI Help Desk to let us know how we can make the NCBI Multiple Sequence Alignment Viewer work better for you.
We’ve released a new version (1.16.0) of IgBLAST, the popular NCBI package for classifying and analyzing immunoglobulin (IG) and T cell receptor (TCR) variable domain sequences. Version 1.16.0 has three new improvements.
Added the ability to extend the J gene alignment at the 3’ end of the region (Figure 1). This allows you to view the unaligned bases that otherwise would not be included because of low sequence similarity.
We have updated the collection of representative and reference assemblies for Bacteria and Archaea to better reflect the taxonomic breadth of the prokaryotes in RefSeq. We chose the 11,478 representative assemblies in the new collection from the 180,000+ prokaryotic assemblies in RefSeq today. We have selected one representative or reference assembly for every species based on several criteria, including contiguity, completeness, and whether the assembly is from type material. We have also updated the reference and representative microbial BLAST database to reflect these changes. This reference and representative set will be updated three times a year to reflect changes in RefSeq. In addition, as we announced on Feb 14, we have reduced the number of reference genome assemblies (the subset of representative assemblies with annotation provided by outside experts) to 15. See the list in our previous post. We have re-annotated the 104 assemblies that are no longer reference with our Prokaryotic Genome Annotation Pipeline (PGAP).
We have a curated set of ribosomal RNA (rRNA) reference sequences (Targeted Loci) with verifiable organism sources and current names. This set is critical for correctly identifying and classifying prokaryotic (bacteria and archaea) and fungal samples (Table 1). To provide easy access to these sequences, we recently added a separate rRNA/ITS databases section on the nucleotide BLAST page that makes it convenient to quickly identify source organisms (Figure 1).
As we announced, the new default database version for BLAST+ is dbV5. To complete the transition to the new version, we will modify the directory structure and naming conventions on the BLAST FTP database directory. We expect to make this change around February 4th, 2020.
Here is a list of what we will change:
All databases at the base of the blastdb directory (/blast/db/) will be the dbV5 versions.
The version 5 databases will no longer have “_v5” as part of the archive or database names.
We will move the dbV4 databases to a v4 subdirectory (/blast/db/v4/).
The now legacy dbV4 database archives will have “_v4” in their names (e.g., nr_v4.00.tar.gz); we will not rename the files within the archive.
We will no longer update the dbV4 databases.
We will freeze the cloud directory (/blast/db/cloud/) with no new entries after January 13, 2020.
We will provide only nr, nt, swissprot, and pdbaa files in the FASTA directory (/blast/db/FASTA/).
Please adjust your scripts or procedures to accommodate the changes!
If you have any questions or concerns, please contact us.
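If your scripts build archive names by hand, the renaming rules above can be captured in a couple of small helpers. The Python sketch below is one way to do it. The function names are our own, and the "before" names used in the example (nr_v5.00.tar.gz and nr.00.tar.gz) are assumptions about the old naming; only the new name nr_v4.00.tar.gz comes from the list above.

```python
def v5_archive_after_change(old_name):
    """Map an old dbV5 archive name to its new path at the base directory.

    Drops the "_v5" tag from the database part of the name, e.g.
    "nr_v5.00.tar.gz" -> "/blast/db/nr.00.tar.gz" (old name assumed).
    """
    db, dot, rest = old_name.partition(".")
    if db.endswith("_v5"):
        db = db[: -len("_v5")]
    return f"/blast/db/{db}{dot}{rest}"


def v4_archive_after_change(old_name):
    """Map an old dbV4 archive name to its new path in the v4 subdirectory.

    Adds the "_v4" tag after the database part of the name, e.g.
    "nr.00.tar.gz" -> "/blast/db/v4/nr_v4.00.tar.gz" (old name assumed).
    """
    db, dot, rest = old_name.partition(".")
    return f"/blast/db/v4/{db}_v4{dot}{rest}"


new_v5_path = v5_archive_after_change("nr_v5.00.tar.gz")
new_v4_path = v4_archive_after_change("nr.00.tar.gz")
print(new_v5_path)
print(new_v4_path)
```

Remember that only the archive names change for dbV4: the files inside the archives keep their original names.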
IgBLAST is a popular NCBI package for classifying and analyzing immunoglobulin (IG) and T cell receptor (TCR) variable domain sequences. We’ve released a new version (1.15.0) of IgBLAST with four new improvements / bug fixes:
Support for the new framework region 4 (FWR4) annotation feature in the standard alignment formats and AIRR format.
Renamed the previous “-penalty” parameter to “-V_penalty” to be consistent with other IgBLAST penalty options.
Restored constant internal BLAST search parameters for domain annotation (i.e., FWR/CDR) so that this process is not influenced by user-provided parameters.
Corrected FWR/CDR annotations for certain mouse VK and rat VH germline genes.
IgBLAST 1.15 is available for download from the BLAST FTP area. See the manual on GitHub for information about setting up and running IgBLAST.
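If you have existing scripts that pass the old “-penalty” option, you will need to switch them to -V_penalty. The Python sketch below shows one way to update a saved argument list. The helper name, the input file name, and the penalty value are made up for illustration.

```python
def update_igblast_args(args):
    """Return a copy of an argument list with -penalty renamed to -V_penalty.

    Only exact matches of the option name are replaced, so option values
    and other options are left untouched.
    """
    return ["-V_penalty" if arg == "-penalty" else arg for arg in args]


# Hypothetical command line written for an earlier IgBLAST version.
old_args = ["igblastn", "-query", "reads.fasta", "-penalty", "-1"]
new_args = update_igblast_args(old_args)
print(" ".join(new_args))
```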