The password you set at NCBI to log in to My NCBI, SciENcv, or My Bibliography, or to submit data to NCBI, will be going away. You will soon have to link a third-party login (e.g. eRA Commons, Google, Microsoft, or a university or institutional login) to access your account. Join us on July 28, 2021 at 12 PM Eastern Time to learn what you need to do to link a third-party login using our Wizards, get an updated timeline for the transition to third-party logins, and get answers to your questions.
Date and time: Wed, July 28, 2021 12:00 PM – 12:45 PM EDT
During the COVID-19 pandemic, it is critical to collect descriptive information about the provenance and attributes of SARS-CoV-2 genomic samples so that the course of the virus may be tracked and analyzed. The NCBI Submission Portal now includes a dedicated BioSample submission package to help further improve the quality and richness of submitted SARS-CoV-2 sample metadata. The SARS-CoV-2 clinical or host-associated package presents a framework and standardized fields for submitters to provide attributes considered useful for the rapid analysis and surveillance of SARS-CoV-2 clinical and host-associated cases. For example, mandatory attributes include collection date and geographic location, while suggested but optional attributes include date of SARS-CoV-2 vaccination, vaccine received, and host disease outcome.
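As a rough illustration of the mandatory-versus-optional split described above, a pre-submission check might look like the sketch below. The field names (collection_date, geo_loc_name) and the helper missing_required are placeholders chosen for this example; take the actual attribute names from the package template in the Submission Portal.

```python
# Placeholder names for the two mandatory attributes mentioned in the text
# (collection date and geographic location). These are assumptions; check
# the package template for the real attribute names.
REQUIRED_ATTRIBUTES = ("collection_date", "geo_loc_name")


def missing_required(sample: dict) -> list:
    """Return the mandatory attribute names that are absent or blank."""
    missing = []
    for name in REQUIRED_ATTRIBUTES:
        value = sample.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative record only; the values are made up.
    record = {"collection_date": "2021-02-01", "geo_loc_name": ""}
    print(missing_required(record))
```

Optional attributes such as vaccination date or host disease outcome are deliberately not checked here, since the package does not require them.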
Join us on March 3, 2021 to learn about changes to NCBI account logins that will affect those of you who sign in directly to your NCBI account. After June 1, 2021 you will need to log in using your institution, social media, Google, Microsoft or login.gov account username and password. In this webinar, you will learn how to register for a free login.gov account and how to link this to an existing NCBI account. You’ll also see where to find the most up-to-date information and FAQs on this topic.
We will answer a few questions from our mail bag on these changes. If you would like to submit a question in advance, please send an email to firstname.lastname@example.org with the subject line “Changes to my NCBI Log In” by February 24th.
Date and time: Wed, March 3, 2021 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.
If you use Sequin to submit prokaryotic or eukaryotic genome sequences to GenBank, you need to be aware that Sequin will be retired in January 2021. Genome Workbench’s Submission Wizard, which is already available for submitting annotated genomes, will be the submission tool to use for annotated genomes going forward.
Genome Workbench is desktop software that offers a rich set of integrated tools for studying and analyzing genetic data. You can explore and compare data from multiple sources, including the NCBI databases or your own private data. The Submission Wizard, available since 2019, allows you to prepare submissions of single genomes where all sequences come from the same organism. This interface (Figure 1) is particularly valuable for:
Eukaryotic genomes with annotations, for example those prepared with tbl2asn
Prokaryotic genomes annotated by non-NCBI tools including Prokka and RAST.
Please register to attend our webinar on November 18 to see how to use Genome Workbench to prepare a submission.
(Note: You should continue to submit organelle and viral genomes using BankIt. Please visit the Submission Portal page for information on other submission options.)
The National Library of Medicine and its partners in the International Nucleotide Sequence Database Collaboration (INSDC) have joined together to issue a statement encouraging the scientific community to submit their SARS-CoV-2 sequences to INSDC databases. The databases offer broad open access and integrated data, literature and tools – features that we believe are critical as the research community works together to understand and combat COVID-19. Read the full statement below.
The databases of the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) capture, organize, preserve and present nucleotide sequence data as part of the open scientific record. INSDC member institutions – the EMBL European Bioinformatics Institute (EMBL-EBI), the NIG DNA Data Bank of Japan (NIG-DDBJ) and the National Library of Medicine’s National Center for Biotechnology Information at NIH (NCBI) – are committed to the continued delivery of this critical element of scientific infrastructure.
The global COVID-19 crisis has brought an urgent need for the rapid open sharing of data relating to the outbreak. Most importantly, access to sequence data from the SARS-CoV-2 viral genome is essential for our understanding of the biology and spread of COVID-19. To aid in that effort, all three INSDC members have prioritized processing of SARS-CoV-2 sequence data and have streamlined the submission process.
Availability of data through INSDC databases provides:
Rapid open access – INSDC quickly makes submitted data freely available to everyone, without restrictions on reuse
Linkage of raw sequence read data to genome assemblies, providing researchers with the ability to validate the integrity of assemblies and investigate asserted mutations and changes in genome sequences
Integration of SARS-CoV-2 sequences with the entirety of INSDC data, including related coronavirus genome sequences, enabling comparison across species
Linkage of sequences to the published literature
Tools – INSDC partners provide integrated data analysis tools, such as BLAST, enhancing the discovery process
In support of the global response to the COVID-19 crisis, the INSDC calls upon the research community to:
Submit raw SARS-CoV-2 data to the databases of the INSDC
Submit consensus/assembled SARS-CoV-2 data to the databases of the INSDC
Provide information relating to the sequenced isolate or sample as part of the sequence submission; minimally the time and place of isolation/sampling and an isolate/sample identifier should be provided to maximize the value of the sequences.
In cases where scientists have already established submissions to other databases, these submissions should continue in parallel to the INSDC submission
The integration of INSDC databases with the global bioinformatics data infrastructure, including tools, secondary databases, compute capacity and curation processes, assures the rapid dissemination of data and drives its maximal impact.
In addition to these fundamental roles of INSDC member institutions in the sharing of viral sequence data, each institution has rapidly established COVID-19-specific programs and resources: the European COVID-19 Data Platform from EMBL-EBI, the DDBJ’s Research Data Resources on New Coronavirus and the NCBI SARS-CoV-2 Resources. These resources both demonstrate the connectedness of INSDC databases to broader bioinformatics initiatives and serve to add immediate value to COVID-19 research.
Guy Cochrane (EMBL-EBI), Ilene Karsch-Mizrachi (NCBI-NLM-NIH), & Masanori Arita (DDBJ) on behalf of the International Nucleotide Sequence Database Collaboration
When there is an outbreak of dengue fever in the world, it’s critical that viral genomic sequence data be submitted by researchers and made available to analyze as soon as possible. You can now submit Dengue virus sequences to GenBank using a new workflow (Figure 1) in the Submission Portal designed to help make these data available as soon as possible. The streamlined process, similar to the one described in a previous post for animal mitochondrial COX1 sequences, has an improved interface, enhanced validation, and automatic annotation that saves you time and effort.
Figure 1. The Submission Portal pages for targeted sequence submission workflows. Top panel. The new submission page for entering the workflow. Bottom panel. Submission Portal page with the Dengue virus submission option selected (boxed in red). The service has options for other targeted submissions including mitochondrial COX1 from multicellular animals (metazoa), ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.
This update is part of a larger and ongoing effort to consolidate GenBank submissions in a central location. In addition to Dengue virus data, you can also submit Influenza A, B, and C and Norovirus sequences, as well as other targeted sequences including mitochondrial COX1 genes from multicellular animals (metazoa), ribosomal RNA (rRNA), and rRNA-ITS, through the options on the Submission Portal. You should submit other types of sequence data, including other virus sequences, to GenBank using BankIt or tbl2asn.
You can use the search feature on the Submission Portal to find the appropriate submission tool for your data.
GenBank submitters, you can now submit mitochondrial COX1 (cytochrome oxidase subunit I; COI) sequence data from multicellular animals (metazoa) using a new workflow (Figure 1) with an improved interface, enhanced validation, and automatic COX1 CDS feature annotation. Once you have submitted mitochondrial COX1 data using this tool, you’ll have a single, helpful page to reference your submission information: accession number(s), COX1 submission status, relevant files and more. Plus, you can also fix any errors from this page.
Figure 1. Submission Portal page with the mitochondrial COX1 submission option selected (boxed in red). The service has options for other targeted submissions including ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.
Do you need a quick way to annotate features on a similar set of sequences for your GenBank submission? You can now submit sequences from the same region or gene in an alignment format in BankIt and use the new ‘Feature propagation option’ (Figure 1) to apply features from a single sequence to other aligned sequences. You simply annotate one sequence and then copy that annotation across all the sequences in your submission.
Here’s how you can propagate features in three easy steps:
Validation issues can delay the processing of your submissions to GenBank. To avoid one type of delay, use the new “expected genome size” API to check the length of your genome assembly before submission.
The API compares the size of submitted genome assemblies to the expected genome size range for the species to identify outliers that can result from errors such as:
incorrect organism assignment
metagenome submitted as an organism genome
targeted sub-genome assembly not flagged as partial genome representation
gross contamination with other sequences
You can check in advance for these possible problems using the API. The API accepts the taxid for the species (taxid = Taxonomy ID – see our Taxonomy quick start guide on how to find the taxid for a given species) and the length of your assembly (excluding gaps and runs of Ns) as input and returns XML with the expected length, the acceptable range, and a status that tells you whether your assembly is too large, too small, or within the acceptable range. Look for <length_status>within_range</length_status> which confirms that your sequence passes the test!
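The two ends of that check can be sketched in Python: computing an assembly length to send, and reading the status from the XML that comes back. This is a minimal sketch under stated assumptions, not NCBI's implementation. The request itself is omitted because the endpoint URL is not given here. Only the length_status element comes from the description above; the wrapper element in the sample response and the helper names are invented for illustration. The length count simply skips every N and gap character, which is a simplification of "excluding gaps and runs of Ns".

```python
import xml.etree.ElementTree as ET


def ungapped_length(fasta_text: str) -> int:
    """Count bases in FASTA text, skipping headers, gap characters, and Ns.

    Simplification: every N is skipped, not only long runs of Ns.
    """
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += sum(1 for base in line.upper() if base not in "N-")
    return total


def length_status(response_xml: str) -> str:
    """Return the text of the first <length_status> element in the response."""
    root = ET.fromstring(response_xml)
    node = root if root.tag == "length_status" else root.find(".//length_status")
    if node is None or node.text is None:
        raise ValueError("no <length_status> element found in response")
    return node.text.strip()


# Hypothetical response body: only <length_status> is taken from the text;
# the <response> wrapper is an assumption made for this example.
SAMPLE_RESPONSE = """<response>
  <length_status>within_range</length_status>
</response>"""

# Toy assembly with an N run and gap characters.
SAMPLE_FASTA = """>contig_1
ACGTNNNNACGT
>contig_2
acgt--ACGT
"""

if __name__ == "__main__":
    print(ungapped_length(SAMPLE_FASTA))
    print(length_status(SAMPLE_RESPONSE))
```

If the real response uses XML namespaces, the element lookup would need to be adjusted accordingly.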
We have released a new version of the Prokaryotic Genome Annotation Pipeline (PGAP), available on GitHub. The new release includes the ability to ignore pre-annotation validation errors (--ignore-all-errors). This new feature allows you to produce a preliminary annotation for a draft version of the genome, even one that contains vector and adapter sequences or that is outside of the size range for the species. This draft annotation should be helpful with your ongoing work on the genome assembly. Please keep in mind that these pre-annotations and assemblies with contaminants or other errors are not suitable for submission to GenBank.
Another new feature allows you to provide the name of the consortium that generated the assembly and annotation so that this information appears in the final GenBank records. For more details, consult our guidelines on input files.
See our previous post and our documentation for details on how to obtain and run PGAP yourself.
Next on our to-do list is a module for calculating Average Nucleotide Identity (ANI) to confirm the assembly’s taxonomic assignment. Stay tuned!