New release of the Prokaryotic Genome Annotation Pipeline with updated tRNAscan and protein models


A new version of the Prokaryotic Genome Annotation Pipeline (PGAP) is now available on GitHub. This release uses a new and improved version of tRNAscan (tRNAscan-SE:2.0.4) and includes our most up-to-date Hidden Markov Model and BlastRule collections for naming proteins.

Remember that you can submit the results of PGAP to GenBank. Or, if you are still improving the assembly and your genome doesn’t pass the pre-annotation validation, you can use the --ignore-all-errors mode to get a preliminary annotation.

See our previous post and our documentation for details on how to set up and run PGAP yourself.

Try PGAP and let us know how you like it!

NCBI Will Retire the Probe Database in April 2020


NCBI released the Probe database in 2005 as a registry of nucleic acid reagents for biomedical research. At that time, array-based assays were prevalent, but their use has since declined with the advent of short-read sequencing. As a result, NCBI will retire the web interface for the Probe database in April 2020. You can continue accessing the content of the database on the NCBI FTP site, but it will no longer be updated. As of this announcement, Probe is no longer accepting new submissions.

If you have questions or concerns about this retirement, we’d love to hear from you. Please comment here or contact us at info@ncbi.nlm.nih.gov.

Request for proposals: Single Cell in the Cloud codeathon at NYGC in January


The New York Genome Center is hosting an NCBI Single Cell in the Cloud codeathon from January 15-17, 2020. Project proposals are due December 2nd.

Please submit your proposal and apply here.

What topics are in scope?

This codeathon will focus on single cell data, including RNA, DNA, and chromatin accessibility. We are particularly interested in proposals for pipelines and analysis of SRA data, data interoperability, and using machine learning techniques in clustering. We also welcome proposals for tutorial pipelines and educational tools. You will have access to computational resources in the Cloud to turn your idea into a working prototype. Visit our website for examples of previous codeathon projects.


RefSeq Release 97 is public


RefSeq release 97 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of November 4, 2019, and contains 219,407,891 records, including 157,639,958 proteins, 28,730,283 RNAs, and sequences from 97,407 organisms.

The release is provided in several directories, both as a complete dataset and divided into logical groupings.
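For readers who want to reach RefSeq records through E-utilities, here is a minimal sketch that only builds an ESearch request URL and does not contact NCBI. The helper name refseq_esearch_url, the example organism, and the retmax value are illustrative assumptions, not part of the release announcement. The srcdb_refseq[PROP] filter is the standard Entrez property used to restrict results to RefSeq.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez programming utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def refseq_esearch_url(organism, db="protein", retmax=5):
    """Build an ESearch URL limited to RefSeq records for one organism.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    # srcdb_refseq[PROP] restricts the search to RefSeq records.
    term = f"{organism}[Organism] AND srcdb_refseq[PROP]"
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = refseq_esearch_url("Mus musculus")
print(url)
```

You can open the printed URL in a browser or fetch it with any HTTP client. If you send many requests, follow NCBI's usage guidelines for E-utilities.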


August-October 2019 RefSeq annotations: mouse, firefly and more



Recently, the NCBI Eukaryotic Genome Annotation Pipeline has released new annotations in RefSeq for the following organisms:

  • Aedes albopictus (Asian tiger mosquito)
  • Aquila chrysaetos chrysaetos (golden eagle)
  • Archocentrus centrarchus (flier cichlid)
  • Calypte anna (Anna’s hummingbird)
  • Camarhynchus parvulus (bird)
  • Camelus dromedarius (Arabian camel)
  • Cannabis sativa (hemp)
  • Chanos chanos (milkfish)

Continue reading

CCDS Release 23 for Mouse Now in Entrez Gene


Are you interested in high-quality genomic annotations for human and mouse? Check out the Consensus Coding Sequence (CCDS) project! Release 23 of the CCDS project is now available in Entrez Gene. This release compares NCBI’s Mus musculus annotation release 108 to Ensembl’s annotation release 98. This update adds 1,570 new CCDS records and 175 genes to the mouse CCDS dataset. In total, release 23 includes 27,219 CCDS records that correspond to 20,486 genes.


November 13 NCBI Minute: Resources for next-gen sequence analysis


On Wednesday, November 13, 2019 at 12 PM, NCBI staff will present a webinar on NCBI resources for next-gen sequence analysis. You will learn about key resources that support multiple aspects of next-gen sequence analyses, including quality control, alignment, data visualization, and interpreting results. You will also see how to access and apply these resources for both SRA and your own RNASeq/DNASeq datasets. Whether you’re embarking on your first analysis or already have a background in bioinformatics, you’ll find tools that meet your needs!

  • Date and time: Wed, Nov 13, 2019 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.