We are happy to announce that you can now submit your genome sequences annotated by your own local copy of the standalone Prokaryotic Genome Annotation Pipeline (PGAP) to GenBank.
How does it work? Download PGAP from GitHub, provide some basic information and the FASTA sequences for your genome sequence, and run the pipeline on your own machine, compute farm or the cloud. PGAP will produce annotation consistent with NCBI’s internal PGAP. Submit the resulting annotated genome to GenBank through the genome submission portal, and get an accession back.
As with any other submitted assembly, PGAP-annotated genomes will be screened for foreign contaminants and vector sequences at submission. Any annotated assemblies that don’t pass may need to be modified. We are developing an automated process to handle these edits!
We are also working on other improvements to stand-alone PGAP such as a module for calculating Average Nucleotide Identity (ANI) to confirm the assembly’s taxonomic assignment. Stay tuned for new developments!
If you are a consumer or producer of AGP (A Golden Path) files for genome assemblies, please read on. We’d like your feedback on the proposed changes described here.
As you know, AGP files are used to describe the structure of certain genome assemblies. The AGP file format has not kept up with changes in sequencing technology or International Sequence Database Collaboration (INSDC) feature usage. NCBI is therefore proposing to extend the current AGP v2.0 specification to add new linkage evidence types and a gap type of “contamination” as detailed below and described in the AGP v2.1 proposed specification.
Do you have Norovirus sequence data to submit to GenBank? Try out the newly-released improvements in our submission service for Norovirus data! The new service offers the following advantages:
- Faster processing and shorter time to accession numbers
- Improved user interface
- Automatic Feature annotation
Figure 1. The submission portal page showing the new option for submitting Norovirus data.
Begin a new Norovirus submission or see how to get started submitting other data to GenBank.
GenBank accepts a wide range of data to support scientific discovery and analysis on sequences from all branches of life.
The ClinVar Team is happy to announce a new online form in the ClinVar Submission Portal, the Single SCV Update, which makes it easier for you to update a single record.
The new ClinVar Single SCV Update form showing the sections for editing the evaluation date, clinical significance, condition, and citations.
Update: NCBI is now in the process of merging EST and GSS records into the Nucleotide database, and we expect to complete this process in early 2019. Accession.version and GI identifiers will not change during this process.
As of December 1, 2018, all records from the databases for Expressed Sequence Tags (EST) and Genome Survey Sequences (GSS) will reside in NCBI’s Nucleotide database. This change will provide a single point of access for all GenBank sequence data with a common look and feel.
Read more to learn about how this change affects these resources:
- Websites (Entrez)
- APIs (E-utilities)
- FTP sites
- Submission procedures
- TSA (have a look if you’re not familiar!)
NCBI now offers a flu sequence submission wizard that makes submissions easier and will provide you with accession numbers sooner. To get started, sign in to NCBI, go to the Submission Portal and choose the link for “Ribosomal RNA (rRNA), rRNA-ITS or Influenza sequences” from the GenBank section.
dbSNP’s Human Build 150 includes a large number of new submissions from the Human Longevity, Inc. (HLI) and TopMed, increasing the total number of Human RefSNPs in the database from 154 to 324 million. TopMed has also provided new allele frequency data for 163 million RefSNPs.
This article is intended for GenBank data submitters with a basic knowledge of BLAST who submit sequence data from protein-coding genes.
One of the most common problems when submitting DNA or RNA sequence data from protein-coding genes to GenBank is failing to add information about the coding region (often abbreviated as CDS) or incorrectly defining the CDS. Incomplete or incorrect CDS information will prevent you from having accession numbers assigned to your submission data set, but there is a procedure that will help you troubleshoot any problems with the CDS feature annotation: doing a BLAST analysis with your sequences before you submit your data.
Here’s how to use nucleotide BLAST (blastn) and the formatting options menu to analyze, interpret and troubleshoot your submissions:
1. To start the BLAST analysis, go to the BLAST homepage and select “nucleotide blast”.
Figure 1. Select “nucleotide blast”.
Submitting sequences to GenBank can seem complicated at first, but starting with a solid foundation in the form of a properly formatted file will make the process go smoothly.
Before submitting sequence data to GenBank, the data must be formatted correctly, the most common file format being FASTA. This post will show you how to create a FASTA file for submitting single- and multiple-nucleotide sequences.
Submitters can upload FASTA-formatted sequence files using NCBI’s stand-alone software Sequin, command line tbl2asn or our web-based submission tool BankIt.
The image below depicts a single sequence in FASTA format. For multiple sequences, such as those of population or phylogenetic studies, environmental samples, and batch sequences of the same gene, create the file using the steps below and put the set of sequences together in a single FASTA file.
Here is how to create the FASTA file: