Want to submit high-quality data quickly and easily to GenBank? Check out our Foreign Contamination Screen (FCS) tool, a quality assurance process that you can run yourself. FCS offers enhanced contaminant detection sensitivity to improve your genome assemblies and facilitate high-quality data submissions to GenBank. We recently made several improvements to make the tool even easier to use!
Now quicker and easier to run!
Decontaminate your genome with just one extra step.
Save the removed sequences in a separate file, if desired.
Find more contaminants with improved coverage of prokaryotes, protists, and more.
Effective June 2023, the HomoloGene records will redirect to the Datasets Gene Table
Do you use HomoloGene to view and download data? You can now access updated homology data from NCBI Datasets through the Datasets Gene Table with connections to NCBI Orthologs. Go directly from a HomoloGene record to the Datasets Gene Table that will give you access to up-to-date sequence data and metadata. NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
The Datasets Gene Table provides connections to the NCBI Ortholog interface (Figure 1) that provides the following data:
Orthology data based on an updated algorithm that identifies orthologs spanning > 500 vertebrate species
Legacy pages will be redirected effective June 2023
In June 2023, NCBI’s Assembly and Genome record pages will be redirected to new Datasets pages as part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.
We will update the following pages:
The NCBI Assembly pages will be redirected to the new Datasets Genome pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST.
The NCBI Genome pages will be redirected to the Datasets Taxonomy pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.
During this transition, you will have the option to return to the legacy Genome and Assembly pages.
In response to your feedback, we’ve made more whole genome cross-species alignments available in NCBI’s Comparative Genome Viewer (CGV). You can use these alignments to explore genome rearrangements between species. You can also zoom in to analyze regions of conserved gene synteny.
There are over 20 new cross-species alignments available, including human-mouse, mouse-rat, human-chimp, human-cattle, dog-cat, and others! These cross-species alignments provide additional opportunities to explore evolutionary relationships at the genomic and gene levels. We will add more cross-species alignments in the coming months.
The latest cross-species alignments added to CGV include imports from the UCSC Genomics Institute, as well as those generated at NCBI.
Check out two examples of cross-species whole-genome alignments in CGV below (Figure 1).
Figure 1. Whole-genome alignments between (A) mouse and human (GRCm39 vs. GRCh38.p14) and (B) cat and dog (F.catus_Fca126_mat1.0 vs. ROS_Cfam_1.0). Colored bands connect aligned regions; green indicates same orientation, blue indicates opposite orientation.
When you zoom in on an alignment (Figure 2), you can compare gene annotation on the two assemblies and see the extent of conservation of synteny. You can also see which genes are missing from one or the other assembly, indicating changes in sequence or differences in annotation.
Historically, RefSeq EGAP has used an integer to identify a particular annotation release, such as Homo sapiens Annotation Release 110. This method provides no information on the assembly used for the annotation. In the new RefSeq naming system, annotation releases are designated by a combination of the assembly identifier (e.g., GCF_000001405.40) and an annotation name (e.g., RS_2022_04). The annotation name consists of an RS prefix to indicate RefSeq annotation, and the year and month that it was generated, RS_YYYY_MM. You should always use the annotation name in combination with the corresponding assembly accession.version, for example, GCF_026419915.1-RS_2022_12 (as shown in Figure 1). This ensures that you’re always using the name that defines a specific annotation for a specific genome assembly. If you use only part of the name, it will be ambiguous.
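As a rough illustration of the naming scheme described above, the following Python sketch splits a combined name into its assembly accession.version and its annotation name. The function name `parse_annotation_release`, the `AnnotationRelease` type, and the regular expression are hypothetical helpers written for this example and are not part of any NCBI tool. The pattern assumes a RefSeq assembly accession of the form GCF_ followed by nine digits and a version number.

```python
import re
from typing import NamedTuple


class AnnotationRelease(NamedTuple):
    """Parts of a combined RefSeq annotation release name (illustrative type)."""
    assembly: str          # assembly accession.version, e.g. GCF_026419915.1
    annotation_name: str   # annotation name, e.g. RS_2022_12
    year: int
    month: int


# Assumed pattern: <GCF accession.version>-RS_<YYYY>_<MM>
_PATTERN = re.compile(r"(GCF_\d{9}\.\d+)-(RS_(\d{4})_(\d{2}))")


def parse_annotation_release(full_name: str) -> AnnotationRelease:
    """Split a combined name; reject partial names, which are ambiguous."""
    match = _PATTERN.fullmatch(full_name)
    if match is None:
        raise ValueError(
            f"{full_name!r} is not of the form <assembly accession.version>-RS_YYYY_MM"
        )
    assembly, annotation_name, year, month = match.groups()
    month_number = int(month)
    if not 1 <= month_number <= 12:
        raise ValueError(f"month {month!r} is out of range in {full_name!r}")
    return AnnotationRelease(assembly, annotation_name, int(year), month_number)


if __name__ == "__main__":
    release = parse_annotation_release("GCF_026419915.1-RS_2022_12")
    print(release.assembly, release.annotation_name)
```

Passing only part of the name, such as `RS_2022_04` on its own, raises a `ValueError` in this sketch, mirroring the point that a partial name does not identify a specific annotation.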
Do you currently add organism name(s) to focus your searches when using the BLAST standard nr database? You can now focus your searches by organism with the BLAST ClusteredNR database and get faster results with a better overview of protein homologs in a wider range of organisms. Your searches will be restricted to protein clusters that contain one or more sequences from the organism(s) you add.
RefSeq release 216 is now available online, from the FTP site, and through NCBI’s new resource, Datasets.
This full release incorporates genomic, transcript, and protein data available as of January 9, 2023, and contains 342,395,932 records, including 249,868,639 proteins, 49,869,497 RNAs, and sequences from 128,299 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.