GenBank submitters, you can now submit mitochondrial COX1 (cytochrome oxidase subunit I; COI) sequence data from multicellular animals (metazoa) using a new workflow (Figure 1) with an improved interface, enhanced validation, and automatic COX1 CDS feature annotation. Once you have submitted mitochondrial COX1 data using this tool, you’ll have a single page where you can look up your submission information: accession number(s), COX1 submission status, relevant files, and more. You can also fix any errors from this page.
Figure 1. Submission Portal page with the mitochondrial COX1 submission option selected (boxed in red). The service has options for other targeted submissions including ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.
Do you need a quick way to annotate features on a similar set of sequences for your GenBank submission? You can now submit sequences from the same region or gene in an alignment format in BankIt and use the new ‘Feature propagation option’ (Figure 1) to apply features from a single sequence to other aligned sequences. You simply annotate one sequence and then copy that annotation across all the sequences in your submission.
Here’s how you can propagate features in three easy steps:
Validation issues can delay the processing of your submissions to GenBank. To avoid one type of delay, use the new “expected genome size” API to check the length of your genome assembly before submission.
The API compares the size of submitted genome assemblies to the expected genome size range for the species to identify outliers that can result from errors such as:
incorrect organism assignment
metagenome submitted as an organism genome
targeted sub-genome assembly not flagged as partial genome representation
gross contamination with other sequences
You can check in advance for these possible problems using the API. The API takes two inputs: the Taxonomy ID (taxid) for the species (see our Taxonomy quick start guide for how to find the taxid for a given species) and the length of your assembly, excluding gaps and runs of Ns. It returns XML with the expected length, the acceptable range, and a status that tells you whether your assembly is too large, too small, or within the acceptable range. Look for <length_status>within_range</length_status>, which confirms that your assembly passes the test!
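As a rough illustration of the two pieces described above, here is a minimal Python sketch that counts the ungapped length of an assembly and reads the status out of a response. It makes no network call. The function names, the sample response, its wrapper element, and the `expected_ungapped_length` element are hypothetical placeholders; only the `<length_status>` element and the `within_range` value come from the description above, so check a real response before relying on anything else.

```python
import xml.etree.ElementTree as ET


def ungapped_length(fasta_text: str) -> int:
    """Count bases in FASTA text, skipping header lines, gap characters ('-'), and Ns."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += sum(1 for base in line.upper() if base not in "N-")
    return total


def length_status(response_xml: str) -> str:
    """Return the text of the first <length_status> element found in the response."""
    root = ET.fromstring(response_xml)
    node = root.find(".//length_status")
    if node is None or node.text is None:
        raise ValueError("no <length_status> element in response")
    return node.text.strip()


# Hypothetical response: only <length_status> and "within_range" are taken
# from the announcement; the other element names are made up for illustration.
SAMPLE_RESPONSE = """<genome_size_response>
  <expected_ungapped_length>5000000</expected_ungapped_length>
  <length_status>within_range</length_status>
</genome_size_response>"""

if __name__ == "__main__":
    fasta = ">contig1\nACGTNNNNACGT\n>contig2\nacgt--nn\n"
    print(ungapped_length(fasta))
    print(length_status(SAMPLE_RESPONSE) == "within_range")
```

The ungapped count is the number you would pass to the API as the assembly length, alongside the taxid for the species.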
We have released a new version of the Prokaryotic Genome Annotation Pipeline (PGAP), available on GitHub. The new release includes the ability to ignore pre-annotation validation errors (--ignore-all-errors). This new feature allows you to produce a preliminary annotation for a draft version of the genome, even one that contains vector and adapter sequences or that is outside of the size range for the species. This draft annotation should be helpful with your ongoing work on the genome assembly. Please keep in mind that these pre-annotations and assemblies with contaminants or other errors are not suitable for submission to GenBank.
Another new feature allows you to provide the name of the consortium that generated the assembly and annotation so that this information appears in the final GenBank records. For more details, consult our guidelines on input files.
See our previous post and our documentation for details on how to obtain and run PGAP yourself.
Next on our to-do list is a module for calculating Average Nucleotide Identity (ANI) to confirm the assembly’s taxonomic assignment. Stay tuned!
Genome Workbench version 3.0 (release notes) is now available. An important new feature is the submission preparation wizard that allows you to prepare prokaryotic and eukaryotic genome sequences for submission to GenBank. This wizard is the first step toward offering a better alternative to the Sequin submission tool.
You simply load your sequences into Genome Workbench and use the submission wizard to enter information about your submission through a set of dialog boxes and then save a submission-ready data file. The package also includes tools for editing your sequences, annotation, and metadata.
See the tutorial video on our YouTube channel or the Genome Workbench documentation for more details on how to enable the wizard and prepare a submission.
Have you ever needed to correct or improve SRA metadata after submitting, change the release date for your data, or share your data with reviewers? You can now perform these tasks yourself using the SRA data management features in the Submission Portal!
If you have an SRA submission and associated BioProject and BioSample, you can log into the Submission Portal, go to the Manage data tab, click into that BioProject and easily perform the following common tasks (Figure 1).
How does it work? Download PGAP from GitHub, provide some basic information and the FASTA sequences for your genome, and run the pipeline on your own machine, compute farm, or the cloud. PGAP will produce annotation consistent with NCBI’s internal PGAP. Submit the resulting annotated genome to GenBank through the genome submission portal, and get an accession back.
As with any other submitted assembly, PGAP-annotated genomes will be screened for foreign contaminants and vector sequences at submission. Any annotated assemblies that don’t pass may need to be modified. We are developing an automated process to handle these edits!
We are also working on other improvements to stand-alone PGAP such as a module for calculating Average Nucleotide Identity (ANI) to confirm the assembly’s taxonomic assignment. Stay tuned for new developments!
If you are a consumer or producer of AGP (A Golden Path) files for genome assemblies, please read on. We’d like your feedback on the proposed changes described here.
As you know, AGP files are used to describe the structure of certain genome assemblies. The AGP file format has not kept up with changes in sequencing technology or International Sequence Database Collaboration (INSDC) feature usage. NCBI is therefore proposing to extend the current AGP v2.0 specification to add new linkage evidence types and a gap type of “contamination” as detailed below and described in the AGP v2.1 proposed specification.
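If you consume AGP files in your own scripts, the proposed gap type is the kind of change a parser may need to tolerate. The sketch below is one way to read gap lines and flag their gap type. It assumes the nine tab-separated columns of AGP v2.0 and assumes the new gap type appears as the literal string `contamination`. The list of v2.0 gap types is written from memory of the specification and should be checked against it, and the names `Gap`, `parse_gap`, and `classify` are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Gap types as recalled from the AGP v2.0 specification (verify against the spec).
AGP_2_0_GAP_TYPES = {
    "scaffold", "contig", "centromere", "short_arm",
    "heterochromatin", "telomere", "repeat",
}
# Gap type named in the v2.1 proposal; the literal spelling is an assumption.
PROPOSED_GAP_TYPES = {"contamination"}


class Gap(NamedTuple):
    obj: str
    start: int
    end: int
    length: int
    gap_type: str
    linkage: str
    evidence: List[str]


def parse_gap(line: str) -> Optional[Gap]:
    """Return a Gap for gap lines (component type N or U); None for other lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None  # blank line or comment
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    if cols[4] not in ("N", "U"):
        return None  # sequence component line, not a gap
    return Gap(
        obj=cols[0],
        start=int(cols[1]),
        end=int(cols[2]),
        length=int(cols[5]),
        gap_type=cols[6],
        linkage=cols[7],
        evidence=cols[8].split(";"),  # multiple evidence values are semicolon-separated
    )


def classify(gap: Gap) -> str:
    """Label a gap by which version of the format defines its gap type."""
    if gap.gap_type in AGP_2_0_GAP_TYPES:
        return "v2.0"
    if gap.gap_type in PROPOSED_GAP_TYPES:
        return "proposed v2.1"
    return "unknown"


if __name__ == "__main__":
    example = "chr1\t1001\t1100\t2\tN\t100\tcontamination\tno\tna"
    gap = parse_gap(example)
    print(gap, classify(gap))
```

Reporting unrecognized gap types as "unknown" instead of raising an error lets existing pipelines keep running on files that use values from the proposal.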