Announcing RefSeq Release 206!

RefSeq Release 206 is now available. This release includes the following:

Updated human genome Annotation Release 109.20210514
Updated Annotation Release 109.20210514 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report is available here. The annotation products are available in the sequence databases and on the FTP site.

Other new eukaryotic genome annotations
This release includes new annotations generated by NCBI's eukaryotic genome annotation pipeline for 45 additional species, including:

New NCBI Datasets home and documentation pages provide easier access

NCBI Datasets, the new set of services for downloading genome assembly and annotation data (see our previous Datasets posts), has redesigned and reorganized its web pages to make it easier to find and access the services and documentation you need.

NCBI Datasets has a fresh new homepage (Figure 1) highlighting the types of data available through our tools. Available data include genome assemblies, genes, and SARS-CoV-2 genomic and protein data. You can access these from the new homepage or learn more in our new documentation pages.

Figure 1. Features of the new Datasets homepage with quick access to help documentation including the Quickstart and How-to guides as well as access to Genome, Gene, and Coronavirus Data, and the Datasets and Dataformat command-line tools.

RefSeq Release 205 is available!

RefSeq Release 205 is now available online, from the FTP site, and through NCBI’s Entrez Programming Utilities (E-utilities).

This full release incorporates genomic, transcript, and protein data available as of March 1, 2021, and contains 269,975,565 records, including 197,232,209 proteins, 36,514,168 RNAs, and sequences from 108,257 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
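E-utilities lets you retrieve individual RefSeq records in scripts as well as through the web interface. The sketch below builds a request URL for the EFetch endpoint, using the standard parameters db, id, rettype, and retmode. The accession NM_000240 and the helper name build_efetch_url are illustrative choices, not part of this announcement. The sketch only constructs the URL, and the download step is left commented out because it requires network access.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "nuccore",
                     rettype: str = "fasta", retmode: str = "text") -> str:
    """Return an EFetch URL that retrieves one record by accession.

    build_efetch_url is a hypothetical helper written for illustration.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    # urlencode keeps the dict's insertion order in the query string.
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_efetch_url("NM_000240")  # example RefSeq transcript accession
    print(url)
    # To actually download the record (requires network access):
    # from urllib.request import urlopen
    # print(urlopen(url).read().decode()[:200])
```

For protein records (NP_ accessions), you would pass db="protein" instead of the default "nuccore".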



New release of the Read assembly and Annotation Pipeline Tool (RAPT), now 2X faster!

There is a new release of the Read assembly and Annotation Pipeline Tool (RAPT) available from our GitHub site. RAPT is a one-step application for the genome assembly and gene annotation of archaeal and bacterial isolates that can run on your local computer or the Google Cloud Platform (GCP). With this new release, jobs will run twice as fast as with the December release. For example, we have assembled and annotated a Salmonella enterica genome in under an hour on a 16-CPU machine with the new release.
We have also added several new features based on your feedback including:

  1. The --stop-on-errors flag, which stops the process if the average nucleotide identity check finds evidence of a sample mix-up or contamination by other bacteria.
  2. The ability to accept the forward and reverse reads of paired-end runs in separate files, which can be gzip-compressed.

Finally, thanks to all who came to our webinar in December and provided comments! For those who couldn’t join us, the recording is now available on our YouTube channel.

Contact us at prokaryote-tools@ncbi.nlm.nih.gov with any questions, or to let us know if you would like to become a beta tester for RAPT.

Announcing the RefSeq annotation of rat mRatBN7.2!

NCBI RefSeq has finished its initial annotation of the new rat reference assembly, mRatBN7.2, recently released by the Darwin Tree of Life Project at the Wellcome Sanger Institute. This is the first coordinate-changing update to the rat reference since the 2014 release of Rnor_6.0 from the Rat Genome Sequencing Consortium and brings the rat assembly into the modern age with a nearly 300x increase in contig N50 and 9x increase in scaffold N50 lengths. It’s a major improvement!
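Contig N50 and scaffold N50 are the contiguity measures behind the comparison above. N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly length. The minimal sketch below computes it from a list of sequence lengths. The lengths in the example are made up for illustration and are not mRatBN7.2 or Rnor_6.0 data. The same function gives contig N50 or scaffold N50 depending on which set of lengths you pass in.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest and add them up until the
    running total reaches at least half of the total length. The length
    at which that happens is the N50.
    """
    values = sorted(lengths, reverse=True)
    if not values:
        raise ValueError("need at least one sequence length")
    total = sum(values)
    running = 0
    for length in values:
        running += length
        # Compare 2*running with total to avoid rounding when halving an odd total.
        if running * 2 >= total:
            return length
    return values[-1]  # not reached for positive lengths; kept as a safe fallback


if __name__ == "__main__":
    # Hypothetical contig lengths (total 300): 100 + 80 = 180 >= 150, so N50 = 80.
    print(n50([30, 100, 40, 80, 50]))
```

Because N50 depends on how the largest pieces cover the genome, merging fragmented contigs into longer ones raises it sharply. That is why it is the usual yardstick for comparing successive assemblies.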



View intron feature evidence in the Genome Data Viewer and Sequence Viewer

Are you a researcher who studies gene biology and is interested in alternative splice patterns in your gene or genes of interest? If so, be sure to explore the intron feature evidence available in graphical views of genome assemblies annotated by NCBI. You can view the NCBI evidence used to call splice variants for genes, add other intron feature evidence tracks, and use new display and filter options that make the data easier to interpret.

Figure 1. Graphical view of the monoamine oxidase genes (MAOA, MAOB) region on the human X chromosome showing intron feature tracks (‘RNA-seq intron features, aggregate’ and ‘Intropolis RNA-Seq intron features’). Mousing over an intron feature opens a tooltip with details such as the number of reads supporting the splice site, the location on the chromosome, the length of the intron, and the donor and acceptor bases at the splice site. The Intropolis track was added through the search feature of the Configure Tracks menu and configured (bottom menu) so that features are sorted by strand and filtered to show only those with more than 500 reads.



December 2 Webinar: Using the new Read assembly and Annotation Pipeline Tool (RAPT) to assemble and annotate microbial genomes

Join us December 2 to learn how to use the Read assembly and Annotation Pipeline Tool (RAPT). With RAPT, you can assemble and annotate a microbial genome right off the sequencing machine! Provide short genomic reads or an SRA run as input, and get back the assembled sequence annotated with a complete gene set. The assembly is built with SKESA and annotated with PGAP. RAPT also verifies the taxonomic assignment of the genome with the Average Nucleotide Identity tool. In this webinar, you will learn how to run RAPT on your own machine or on the Google Cloud Platform.

  • Date and time: Wed, December 2, 2020 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Human GRCh37 (hg19) RefSeq annotation update 

The NCBI RefSeq group has been in overdrive, making improvements to our human genome annotation and reference transcript and protein sets, with 8,000 new and 15,000 updated transcripts in the last year alone! That’s about 30% of our curated transcript dataset (the transcripts with NM_ and NR_ accessions), with a big focus on transcripts that are well-expressed, have conserved exons, or are transcribed from new promoters.

With all these improvements, we’ve been updating the RefSeq annotation of GRCh38.p13 every quarter. But what about GRCh37 (hg19), which many of you still use?



Recent enhancements in Genome Workbench version 3.4.1

New Features

Version 3.4.1 of Genome Workbench, NCBI’s sequence annotation and analysis platform, includes new features for the Multiple Sequence Alignment View, the Graphical Sequence View and the Sequence Editing and Submission Package as well as a number of other improvements and bug fixes.

In the Multiple Sequence Alignment View, you can now export publication-quality graphics (Save As PDF/SVG…, Figure 1). In the Graphical Sequence View, you can now search by locus tag and use improved search for genes by locus. The feature editing dialog also displays the selected location more clearly when you annotate a sequence.

Figure 1. A multiple alignment view in Genome Workbench highlighting the new ability to save presentation-quality image files (Save As PDF and SVG formats).

In the Sequence Editing and Submission Package, we rearranged the controls in the Table Reader dialog to fit on smaller screens and improved the import of feature tables that contain mature peptide (mat_peptide) features.

Bug Fixes and Improvements

We have made a number of other fixes and improvements. For macOS users, we fixed blurry text in some dialogs, fixed a problem with copying to the clipboard, and improved support for the latest version, Catalina. We also fixed a crash in the Active Object Inspector interface. You should also see improved loading of SNP data and better recovery after power outages or other events that corrupt local files.

In the Sequence Editing and Submission Package, we fixed a bug that occurred when applying miscellaneous descriptors and structured comment fields with the Table Reader, and fixed an issue with looking up a publication by PubMed ID.

Please see the extensive help documentation including FAQs, videos, and tutorials linked to the Genome Workbench homepage for more information and examples on how to use Genome Workbench in your research.