Troubleshooting GenBank Submissions: Annotating the Coding Region (CDS)

This article is intended for GenBank data submitters with a basic knowledge of BLAST who submit sequence data from protein-coding genes.

One of the most common problems when submitting DNA or RNA sequence data from protein-coding genes to GenBank is either leaving out information about the coding region (often abbreviated as CDS) or defining the CDS incorrectly. Missing or incorrect CDS information will prevent accession numbers from being assigned to your submission. You can catch most of these problems with the CDS feature annotation by running a BLAST analysis on your sequences before you submit your data.
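Many CDS annotation errors come down to a few checkable properties of the coding region. The Python sketch below is a rough illustration only. It is not NCBI's validator and not part of the BLAST procedure described here. It checks a proposed CDS span under the standard genetic code: the length is a multiple of three, the span starts with ATG, it ends with a stop codon, and it contains no internal stop codons. It assumes a complete CDS. A partial CDS, or a CDS from an organism that uses an alternative genetic code or start codon, can legitimately fail these checks. The function name `check_cds` and its 1-based, inclusive coordinates are assumptions chosen for this illustration.

```python
# Stop codons of the standard genetic code (NCBI translation table 1).
# Assumption: the sequence uses the standard code and the CDS is complete.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str, start: int, end: int):
    """Hypothetical helper: return a list of problems found in the CDS at
    1-based, inclusive positions start..end of seq (empty list = none found)."""
    problems = []
    # Normalize case and treat RNA (U) as DNA (T).
    cds = seq.upper().replace("U", "T")[start - 1:end]
    if len(cds) % 3 != 0:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    # Any stop codon before the last codon suggests a wrong frame or span.
    for n, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {n}")
    return problems


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGGG"
    print(check_cds(seq, 3, 14))  # correct span: no problems
    print(check_cds(seq, 4, 15))  # span shifted by one base: wrong frame
```

Shifting the span by a single base, as in the second call, produces the internal stop codon and missing start and stop codons that a frame error typically causes.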

Here’s how to use nucleotide BLAST (blastn) and the formatting options menu to analyze, interpret and troubleshoot your submissions:

1. To start the BLAST analysis, go to the BLAST homepage and select “nucleotide blast”.


Figure 1. Select “nucleotide blast”.


How To Format Sequence Data For GenBank Submissions

Submitting sequences to GenBank can seem complicated at first, but starting with a solid foundation in the form of a properly formatted file will make the process go smoothly.

Before you submit sequence data to GenBank, the data must be formatted correctly; the most common file format is FASTA. This post shows you how to create a FASTA file for submitting either a single nucleotide sequence or multiple nucleotide sequences.
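The following is a minimal sketch of the general FASTA layout, written in Python around a hypothetical helper named `to_fasta`. It is not an NCBI tool. Each sequence gets a definition line that begins with ">" and a unique sequence ID, and the sequence itself follows on the next lines. Multiple sequences are simply placed one after another in the same file. The sketch makes two assumptions of its own: sequence IDs must contain no spaces, and lines wrap at 60 characters. The 60-character width is a common convention and is not stated in this post, so check NCBI's current FASTA guidelines for the details your submission tool expects.

```python
def to_fasta(records, width=60):
    """Hypothetical helper: format (seq_id, description, sequence) tuples as
    FASTA text. Each record becomes a '>' definition line followed by the
    sequence wrapped at `width` characters per line."""
    lines = []
    seen = set()
    for seq_id, description, sequence in records:
        # Assumption: IDs must be non-empty, contain no spaces, and be unique.
        if not seq_id or " " in seq_id:
            raise ValueError(f"invalid sequence ID: {seq_id!r}")
        if seq_id in seen:
            raise ValueError(f"duplicate sequence ID: {seq_id!r}")
        seen.add(seq_id)
        # Definition line; rstrip drops the trailing space if no description.
        lines.append(f">{seq_id} {description}".rstrip())
        # Remove whitespace/newlines from the sequence and uppercase it.
        seq = "".join(sequence.split()).upper()
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative records only. The bracketed [organism=...] text is a
    # source modifier; verify the exact syntax against NCBI's guidelines.
    records = [
        ("Seq1", "[organism=Homo sapiens] example sequence 1", "atgaaattttag"),
        ("Seq2", "[organism=Homo sapiens] example sequence 2", "atgcccgggtaa"),
    ]
    print(to_fasta(records), end="")
```

Rejecting duplicate IDs up front reflects the fact that each sequence in a multi-sequence file needs its own identifier.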

Submitters can upload FASTA-formatted sequence files using NCBI's stand-alone software Sequin, the command-line program tbl2asn, or our web-based submission tool BankIt.

The image below depicts a single sequence in FASTA format. For multiple sequences, such as those of population or phylogenetic studies, environmental samples, and batch sequences of the same gene, create the file using the steps below and put the set of sequences together in a single FASTA file.


Here is how to create the FASTA file:
