Improved annotation of Streptomyces RefSeq genomes


We’ve completed the RefSeq reannotation of over 1,000 Streptomyces genomes! The genomes were reannotated using the Prokaryotic Genome Annotation Pipeline (PGAP). Using a set of over 30 hidden Markov models (HMMs) built by RefSeq biocurators, PGAP detected nearly 100% of the genes from known families that encode ribosomally synthesized and post-translationally modified peptide (RiPP) natural products, despite their small size. Of the 354 lasso peptides now present in Streptomyces RefSeq genomes, 251 (over 70%) were annotated for the first time.

If you are aware of any class of RiPP precursor in Streptomyces that was not found in our recent reannotation, please contact us through the NCBI Help Desk, and we will add new HMMs to the rules we use to find and annotate RiPP precursor genes.

June 20 NCBI Minute: Getting the Genomic Context for BLAST Protein Matches


Do you ever want to see the flanking genes of a protein match from a BLAST search? On June 20th, we’ll show you how to see the genomic context of bacterial proteins using the identical protein report and the graphical sequence viewer. You will also learn how to use these reports in detail and how to retrieve genomic context in batch for a set of protein matches using the identical protein report and EDirect.

Date and time: Wed, June 20, 2018 12:00 PM – 12:30 PM EDT

Click to register.

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
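For readers who want a feel for the batch retrieval described above before the webinar, here is a minimal Python sketch. It is not the webinar's method: it assumes the identical protein report can be requested from the E-utilities `efetch` endpoint with `db=protein` and `rettype=ipg`, and that the response is tab-delimited text with a header row. The function names and the accessions in the usage note are hypothetical placeholders.

```python
import csv
import io
import urllib.parse
import urllib.request

# E-utilities efetch endpoint; the rettype/retmode values below are an
# assumption about how the identical protein report is requested.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_ipg_url(accessions):
    """Build an efetch URL asking for the identical protein report."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "ipg",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def parse_ipg_table(text):
    """Parse tab-delimited report text into one dict per row.

    Column names are taken from the header row, so nothing is assumed
    about which columns the report contains.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def fetch_ipg(accessions):
    """Download and parse the report (requires network access)."""
    with urllib.request.urlopen(build_ipg_url(accessions)) as response:
        return parse_ipg_table(response.read().decode("utf-8"))
```

Calling `fetch_ipg(["ACC_1", "ACC_2"])` with real protein accessions in place of the placeholders would return one dictionary per report row. Any nucleotide accession and coordinate columns in those rows can then be used to look up the flanking genes.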

5 NCBI articles in 2018 Nucleic Acids Research database issue


The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID number listed below.

Continue reading

RefSeq release 82 now public


RefSeq release 82 is accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of May 8, 2017 and contains 127,098,289 records, including 84,756,971 proteins, 18,901,573 RNAs, and sequences from 69,035 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Continue reading