Attention dbGaP submitters! Join us on November 3, 2021 at 12PM US eastern time to learn about data submission and processing improvements to dbGaP, NIH’s database of Genotypes and Phenotypes, which contains individual-level data associated with human research studies. You will see how we have made submission easier through the Submission Portal using automated preliminary validation and how you can run GaPTools, a standalone data validation tool, on your own submission to expedite the submission process. Join us to discover how dbGaP ensures the integrity and high quality of the genomic data that scientists can access to further their research.
We have recently added several exciting improvements to the SARS-CoV-2 GenBank submission process based on community feedback. To save you time, NCBI completes feature annotation for you, which means SARS-CoV-2 GenBank submission only requires a FASTA file and source metadata. Here are other new features to ease and simplify your submission workflow.
Automatically remove failed sequences from a submission: On the web, a single click lets you opt in to automatic removal of failed sequences (Figure 1) so that the rest of your sequences can be swiftly accessioned! A report provided after the submission lists your failed sequences and points out potential sequence problems so that you can take a closer look after your error-free sequences are released. This option is also available for submission via FTP.
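If you want to screen a FASTA file yourself before submitting, the idea of setting failed sequences aside so the rest can proceed can be sketched in a few lines of Python. This is only an illustration: the checks below (empty sequences, non-IUPAC characters, a high fraction of N) and the 0.5 threshold are assumptions made for the example, not NCBI's actual validation rules, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# IUPAC nucleotide codes; anything else is flagged by this sketch.
VALID_BASES = set("ACGTURYSWKMBDHVN")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the sequence ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else f"unnamed_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def check_sequence(seq: str, max_n_fraction: float = 0.5) -> List[str]:
    """Return a list of problems found (hypothetical checks, not NCBI's)."""
    if not seq:
        return ["empty sequence"]
    problems: List[str] = []
    bad = sorted(set(seq) - VALID_BASES)
    if bad:
        problems.append("invalid characters: " + "".join(bad))
    if seq.count("N") / len(seq) > max_n_fraction:
        problems.append("too many ambiguous bases (N)")
    return problems


def split_submission(
    records: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Separate passing sequences from failed ones, keeping a report of why."""
    passed: Dict[str, str] = {}
    failed: Dict[str, List[str]] = {}
    for seq_id, seq in records.items():
        problems = check_sequence(seq)
        if problems:
            failed[seq_id] = problems
        else:
            passed[seq_id] = seq
    return passed, failed
```

Calling `split_submission(parse_fasta(text))` returns the sequences that passed along with a per-sequence list of problems for the ones that did not, which mirrors the shape of the report described above.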
Every so often, we gather our most recent videos in one post on the blog, for your convenience. Scroll down – and don’t forget to subscribe to our channel!
Introducing GaPTools for dbGaP Submitters
This video introduces new standalone software called GaPTools, which you can use to check your data before submitting to dbGaP. GaPTools uses the same preliminary validation checks as the dbGaP submission portal.
The password you set at NCBI to log in to My NCBI, SciENcv, or My Bibliography, or to submit data to NCBI, will be going away. You will soon have to link a third-party login (e.g. eRA Commons, Google, Microsoft, or a university or institutional login) to access your account. Join us on July 28, 2021 at 12PM eastern time to learn what you need to do to link a third-party login using our Wizards, get an updated timeline for the transition to third-party logins, and get answers to your questions.
Date and time: Wed, July 28, 2021 12:00 PM – 12:45 PM EDT
During the COVID-19 pandemic, it is critical to collect descriptive information about the provenance and attributes of SARS-CoV-2 genomic samples so that the course of the virus may be tracked and analyzed. The NCBI Submission Portal now includes a dedicated BioSample submission package to help further improve the quality and richness of submitted SARS-CoV-2 sample metadata. The SARS-CoV-2 clinical or host-associated package presents a framework and standardized fields for submitters to provide attributes considered useful for the rapid analysis and surveillance of SARS-CoV-2 clinical and host-associated cases. For example, mandatory attributes include collection date and geographic location, while suggested but optional attributes include date of SARS-CoV-2 vaccination, vaccine received, and host disease outcome.
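Before uploading a batch of samples, you could check each record for the mandatory attributes with a short script. The sketch below is hypothetical: the attribute keys (`collection_date`, `geo_loc_name`, and the optional ones) are assumed spellings chosen for illustration, so confirm the exact field names against the package definition in the Submission Portal.

```python
from typing import Dict, List

# Assumed attribute keys for illustration; the text above names the concepts
# (collection date, geographic location), not these exact spellings.
MANDATORY = ("collection_date", "geo_loc_name")
OPTIONAL = (
    "date_of_sars_cov_2_vaccination",
    "vaccine_received",
    "host_disease_outcome",
)


def missing_mandatory(sample: Dict[str, str]) -> List[str]:
    """Return the mandatory attributes that are absent or blank in a sample."""
    return [name for name in MANDATORY if not sample.get(name, "").strip()]


def missing_optional(sample: Dict[str, str]) -> List[str]:
    """Return the suggested attributes that could still be filled in."""
    return [name for name in OPTIONAL if not sample.get(name, "").strip()]
```

A sample with an empty result from `missing_mandatory` has both required fields filled in; `missing_optional` simply points out suggested attributes you may still want to provide.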
Join us on March 3, 2021 to learn about changes to NCBI account logins that will affect those of you who sign in directly to your NCBI account. After June 1, 2021 you will need to log in using your institution, social media, Google, Microsoft or login.gov account username and password. In this webinar, you will learn how to register for a free login.gov account and how to link this to an existing NCBI account. You’ll also see where to find the most up-to-date information and FAQs on this topic.
We will answer a few questions from our mail bag on these changes. If you would like to submit a question in advance, please send an email to firstname.lastname@example.org with the subject line “Changes to my NCBI Log In” by February 24th.
Date and time: Wed, March 3, 2021 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.
If you use Sequin to submit prokaryotic or eukaryotic genome sequences to GenBank, you need to be aware that Sequin will be retired in January 2021. Genome Workbench’s Submission Wizard, which is already available for submitting annotated genomes, will be the submission tool to use for annotated genomes going forward.
Genome Workbench is desktop software that offers a rich set of integrated tools for studying and analyzing genetic data. You can explore and compare data from multiple sources, including the NCBI databases or your own private data. The Submission Wizard, available since 2019, allows you to prepare submissions of single genomes where all sequences come from the same organism. This interface (Figure 1) is particularly valuable for:
Eukaryotic genomes with annotations, for example, those prepared with tbl2asn.
Prokaryotic genomes annotated by non-NCBI tools including Prokka and RAST.
Please register to attend our webinar on November 18 to see how to use Genome Workbench to prepare a submission.
(Note: You should continue to submit organelle and viral genomes using BankIt. Please visit the Submission Portal page for information on other submission options.)
The National Library of Medicine and its partners in the International Nucleotide Sequence Database Collaboration (INSDC) have joined together to issue a statement encouraging the scientific community to submit their SARS-CoV-2 sequences to INSDC databases. The databases offer broad open access and integrated data, literature and tools – features that we believe are critical as the research community works together to understand and combat COVID-19. Read the full statement below.
The databases of the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) capture, organize, preserve and present nucleotide sequence data as part of the open scientific record. INSDC member institutions – the EMBL European Bioinformatics Institute (EMBL-EBI), the NIG DNA Data Bank of Japan (NIG-DDBJ) and the National Library of Medicine’s National Center for Biotechnology Information at NIH (NCBI) – are committed to the continued delivery of this critical element of scientific infrastructure.
The global COVID-19 crisis has brought an urgent need for the rapid open sharing of data relating to the outbreak. Most importantly, access to sequence data from the SARS-CoV-2 viral genome is essential for our understanding of the biology and spread of COVID-19. To aid in that effort, all three INSDC members have prioritized processing of SARS-CoV-2 sequence data and have streamlined the submission process.
Availability of data through INSDC databases provides:
Rapid open access – INSDC quickly makes submitted data freely available to everyone, without restrictions on reuse
Linkage of raw sequence read data to genome assemblies, providing researchers with the ability to validate the integrity of assemblies and investigate asserted mutations and changes in genome sequences
Integration of SARS-CoV-2 sequences with the entirety of INSDC data, including related coronavirus genome sequences, enabling comparison across species
Linkage of sequences to the published literature
Tools – INSDC partners provide integrated data analysis tools, such as BLAST, enhancing the discovery process
In support of the global response to the COVID-19 crisis, the INSDC calls upon the research community to:
Submit raw SARS-CoV-2 data to the databases of the INSDC
Submit consensus/assembled SARS-CoV-2 data to the databases of the INSDC
Provide information relating to the sequenced isolate or sample as part of the sequence submission; minimally the time and place of isolation/sampling and an isolate/sample identifier should be provided to maximize the value of the sequences.
In cases where scientists have already established submissions to other databases, these submissions should continue in parallel to the INSDC submission
The integration of INSDC databases with the global bioinformatics data infrastructure, including tools, secondary databases, compute capacity and curation processes, assures the rapid dissemination of data and drives its maximal impact.
In addition to these fundamental roles of INSDC member institutions in the sharing of viral sequence data, each institution has rapidly established COVID-19-specific programs and resources: the European COVID-19 Data Platform from EMBL-EBI, the DDBJ’s Research Data Resources on New Coronavirus and the NCBI SARS-CoV-2 Resources. These resources both demonstrate the connectedness of INSDC databases to broader bioinformatics initiatives and serve to add immediate value to COVID-19 research.
Guy Cochrane (EMBL-EBI), Ilene Karsch-Mizrachi (NCBI-NLM-NIH), & Masanori Arita (DDBJ) on behalf of the International Nucleotide Sequence Database Collaboration
When there is an outbreak of dengue fever in the world, it’s critical that viral genomic sequence data be submitted by researchers and made available to analyze as soon as possible. You can now submit Dengue virus sequences to GenBank using a new workflow (Figure 1) in the Submission Portal designed to help make these data available as soon as possible. The streamlined process, similar to the one described in a previous post for animal mitochondrial COX1 sequences, has an improved interface, enhanced validation, and automatic annotation that saves you time and effort.
Figure 1. The Submission Portal pages for targeted sequence submission workflows. Top panel. The new submission page for entering the workflow. Bottom panel. Submission Portal page with the Dengue virus submission option selected (boxed in red). The service has options for other targeted submissions including mitochondrial COX1 from multicellular animals (metazoa), ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.
This update is part of a larger and ongoing effort to consolidate GenBank submissions in a central location. In addition to Dengue virus data, you can also submit Influenza A, B, C and Norovirus sequences as well as other targeted sequences including mitochondrial COX1 genes from multicellular animals (metazoa), ribosomal RNA (rRNA), and rRNA-ITS through the options on the Submission Portal. You should submit other types of sequence data, including other virus sequences, to GenBank using BankIt or tbl2asn.
You can use the search feature on the Submission Portal to find the appropriate submission tool for your data.
GenBank submitters, you can now submit mitochondrial COX1 (cytochrome oxidase subunit I; COI) sequence data from multicellular animals (metazoa) using a new workflow (Figure 1) with an improved interface, enhanced validation, and automatic COX1 CDS feature annotation. Once you have submitted mitochondrial COX1 data using this tool, you’ll have a single, helpful page to reference your submission information: accession number(s), COX1 submission status, relevant files and more. Plus, you can also fix any errors from this page.
Figure 1. Submission Portal page with the mitochondrial COX1 submission option selected (boxed in red). The service has options for other targeted submissions including ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.