You can now download new file types for species recently annotated by the NCBI Eukaryotic Genome Annotation Pipeline from the Assembly web pages and from the genomes/refseq FTP area. The new file types include alignments of annotated transcripts to the assembly in BAM format, all models predicted by Gnomon, and (for species that have been annotated multiple times) files characterizing the feature-by-feature differences between the current and the previous annotation.
Reflecting the National Library of Medicine’s (NLM) ongoing commitment to public access support at the National Institutes of Health (NIH) and beyond, we are pleased to announce that a new NIHMS system will be released in early 2020. This new system aims to streamline the submission process, ensure the continued quality of manuscripts made publicly accessible, and give authors and investigators more transparent options for avoiding processing delays.
Those familiar with the current NIHMS system will find the basic steps of submitting, reviewing, and approving manuscripts for inclusion in PMC unchanged in the new system. They will see an updated user interface that simplifies the login process for returning users; provides contextual help throughout; and offers user-friendly options for importing article metadata, requesting corrections, and taking over the Reviewer role for stalled submissions. Details of these updates and more are available in the accompanying video.
On Wednesday, December 11, 2019 at 12 PM, NCBI staff will present a webinar that will show you how to use NCBI’s PGAP (https://github.com/ncbi/pgap) on your own data to predict genes on bacterial and archaeal genomes using the same inputs and applications used inside NCBI. You can run PGAP on your own machine, on a compute farm, or in the Cloud. Plus, you can now submit genome sequences annotated by your copy of PGAP to GenBank. Attend the webinar to learn more!
- Date and time: Wed, Dec 11, 2019 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
On Wednesday, December 4, 2019 at 12 PM, NCBI staff will present a webinar on the population variation datasets at NCBI, such as 1000 Genomes, ExAC, gnomAD, and TOPMed, that are currently included on dbSNP records. You will learn how to find the data and how you can use this information to interpret and prioritize variants for further study. You will also see a preview of a new initiative, the dbGaP Allele Frequency Aggregator (ALFA), that is based on more than 150,000 subjects in 60 dbGaP studies.
- Date and time: Wed, Dec 4, 2019 12:00 PM – 12:45 PM EST
A new version of the Prokaryotic Genome Annotation Pipeline (PGAP) is now available on GitHub. This release uses a new and improved version of tRNAscan (tRNAscan-SE:2.0.4) and includes our most up-to-date Hidden Markov Model and BlastRule collections for naming proteins.
Remember that you can submit the results of PGAP to GenBank. Or, if you are still improving the assembly and your genome doesn’t pass the pre-annotation validation, you can use the --ignore-all-errors mode to get a preliminary annotation.
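As a rough illustration of the preliminary-annotation mode mentioned above, here is a minimal Python sketch that assembles a PGAP command line without running it. Only the --ignore-all-errors flag comes from this post; the `./pgap.py` launcher path, the `-o` output option, the positional YAML input, and the file names are assumptions made for the example, so check the PGAP documentation on GitHub for the options your version accepts.

```python
import shlex


def build_pgap_command(input_yaml, output_dir, ignore_all_errors=False):
    """Assemble an argument list for a PGAP run (nothing is executed here).

    Assumptions for illustration: the launcher is ./pgap.py, -o names the
    output directory, and the input is a YAML file given positionally.
    """
    cmd = ["./pgap.py", "-o", output_dir]
    if ignore_all_errors:
        # Flag named in the announcement: continue past pre-annotation
        # validation failures to obtain a preliminary annotation.
        cmd.append("--ignore-all-errors")
    cmd.append(input_yaml)
    return cmd


# Hypothetical file and directory names.
cmd = build_pgap_command(
    "my_genome/input.yaml", "my_genome_results", ignore_all_errors=True
)
print(shlex.join(cmd))
```

Building the command as a list keeps the flag optional, so the same helper covers both a submission-ready run and a preliminary run on a draft assembly. The list can be passed to `subprocess.run` once the options have been confirmed.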
Try PGAP and let us know how you like it!
We will help run a scopeathon (January 16-17, 2020). This event focuses on planning and designing software to extract value from organismal and genus-level graph genomes by dynamically labeling them with metadata. We’re seeking people who are interested in describing community-level genomes as graphs, or in solving problems involving complex phenotypic interactions with specific genomes. If this describes you, please apply! We also encourage people who will be in San Diego for the International Plant & Animal Genome XXVIII conference to apply. The event is open to anyone selected and willing to travel to San Diego. We will work with data from the following organism groups:
- Microbes (Bacteria and Archaea)
- Plants (corn/wheat, others)
NCBI released the Probe database in 2005 as a registry of nucleic acid reagents for biomedical research. At that time, array-based assays were prevalent, but their use has since declined with the advent of short-read sequencing. As a result, NCBI will retire the web interface for the Probe database in April 2020. You can continue to access the content of the database on the NCBI FTP site, but it will no longer be updated. As of this announcement, Probe no longer accepts new submissions.
If you have questions or concerns about this retirement, we’d love to hear from you. Please comment here or contact us at email@example.com.
The New York Genome Center is hosting an NCBI Single Cell in the cloud codeathon from January 15-17, 2020. Submissions for project proposals are due December 2nd.
Please submit your proposal and apply here.
What topics are in scope?
This codeathon will focus on single cell data, including RNA, DNA, and chromatin accessibility. We are particularly interested in proposals for pipelines and analysis of SRA data, data interoperability, and using machine learning techniques in clustering. We also welcome proposals for tutorial pipelines and educational tools. You will have access to computational resources in the Cloud to turn your idea into a working prototype. Visit our website for examples of previous codeathon projects.
This full release incorporates genomic, transcript, and protein data available as of November 4, 2019, and contains 219,407,891 records, including 157,639,958 proteins, 28,730,283 RNAs, and sequences from 97,407 organisms.
The release is provided in several directories, both as a complete dataset and divided into logical groupings.
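To put the release counts above in proportion, the following short Python sketch computes how many records are neither proteins nor RNAs and what share of the total each category represents. The three totals are taken from this announcement; treating the remainder as a single "other" category is an assumption made for illustration, since the post does not break it down.

```python
# Record counts quoted in the release announcement.
total_records = 219_407_891
proteins = 157_639_958
rnas = 28_730_283

# Remainder: records that are neither proteins nor RNAs.
# Grouping them as one "other" category is an assumption of this sketch.
other = total_records - proteins - rnas

counts = {"proteins": proteins, "RNAs": rnas, "other": other}
shares = {name: count / total_records for name, count in counts.items()}

for name, count in counts.items():
    print(f"{name:>8}: {count:>11,} records ({shares[name]:.1%} of total)")
```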
- Aedes albopictus (Asian tiger mosquito)
- Aquila chrysaetos chrysaetos (golden eagle)
- Archocentrus centrarchus (flier cichlid)
- Calypte anna (Anna’s hummingbird)
- Camarhynchus parvulus (bird)
- Camelus dromedarius (Arabian camel)
- Cannabis sativa (hemp)
- Chanos chanos (milkfish)