Tag: GenBank

New Improvements! Try out our Foreign Contamination Screen (FCS) Tool

Want to submit high-quality data to GenBank quickly and easily? Check out our Foreign Contamination Screen (FCS) tool, a quality assurance process that you can run yourself. FCS offers enhanced sensitivity for detecting contaminants, helping you improve your genome assemblies and submit high-quality data to GenBank. We recently made several improvements to make the tool even easier to use!

What’s New?
  • Now quicker and easier to run!  
  • Decontaminate your genome with just one extra step. 
    • Save the removed sequences in a separate file, if desired.  
  • More accurate!  
  • Find more contaminants with improved coverage of prokaryotes, protists, and more. 
  • Screen your genome on the cloud in minutes. 
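The new decontamination step takes the screen's report and splits an assembly into kept and removed sequences. The Python sketch below illustrates that idea under stated assumptions: the `Call` record, its `EXCLUDE`/`TRIM` actions, and the `clean` and `to_fasta` helpers are hypothetical stand-ins, not the actual FCS report format or command-line interface. See the FCS Quickstart on GitHub for the real workflow.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One flagged region (hypothetical schema, not the real FCS report)."""
    seq_id: str
    action: str  # "EXCLUDE" drops the whole sequence; "TRIM" cuts start..end
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def clean(seqs: dict[str, str], calls: list[Call]) -> tuple[dict[str, str], dict[str, str]]:
    """Split sequences into (kept, removed), assuming TRIM regions do not overlap."""
    kept = dict(seqs)
    removed: dict[str, str] = {}
    trims: dict[str, list[Call]] = {}
    for call in calls:
        if call.action == "EXCLUDE" and call.seq_id in kept:
            removed[call.seq_id] = kept.pop(call.seq_id)
        elif call.action == "TRIM":
            trims.setdefault(call.seq_id, []).append(call)
    for seq_id, regions in trims.items():
        if seq_id not in kept:
            continue  # the whole sequence was already excluded
        seq = kept[seq_id]
        # Cut from right to left so earlier coordinates stay valid.
        for call in sorted(regions, key=lambda c: c.start, reverse=True):
            removed[f"{seq_id}:{call.start}-{call.end}"] = seq[call.start - 1 : call.end]
            seq = seq[: call.start - 1] + seq[call.end :]
        kept[seq_id] = seq
    return kept, removed


def to_fasta(seqs: dict[str, str], width: int = 60) -> str:
    """Format sequences as FASTA text, e.g. to save removed sequences to their own file."""
    lines = []
    for name, seq in seqs.items():
        lines.append(f">{name}")
        lines.extend(seq[i : i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Toy assembly and hypothetical calls.
genome = {"chr1": "ACGTACGTAA", "ctg2": "TTTT"}
calls = [Call("ctg2", "EXCLUDE", 1, 4), Call("chr1", "TRIM", 1, 4)]
kept, removed = clean(genome, calls)
print(to_fasta(kept), end="")
print(to_fasta(removed), end="")
```

Keeping the removed pieces in their own dictionary mirrors the option to save removed sequences separately; writing each FASTA string to a file is left to the caller.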

Read About NCBI Resources in 2023 Nucleic Acids Research Database Issue

The 2023 Nucleic Acids Research Database Issue features papers from NCBI staff on GenBank, the Conserved Domain Database, and more. The citations are available in PubMed, with full text available in PubMed Central (PMC). To read an article, click on the PMCID number listed below.

3+ Ways NCBI is Enhancing the SRA Database

Do you submit or access Sequence Read Archive (SRA) data? In an ongoing effort to enhance your experience, NCBI is making several improvements to our widely used SRA database. SRA is the largest publicly available repository of high-throughput sequencing data. The archive accepts data from all organisms as well as from metagenomic and environmental surveys. SRA stores raw sequencing data and alignment information to enable reproducibility and facilitate new discoveries through data analysis.

What improvements is NCBI making?

  • More transparent: We recently launched the GenBank and SRA Data Processing webpage to help you better understand how sequence data are submitted, processed, and made publicly available.
  • More efficient: Faster data transfers, downloads, and analyses! We are incrementally streamlining how you access SRA data as SRA Lite becomes the standard SRA file format. This simplified format reduces the average file size, enabling more efficient analysis and storage of large datasets.
  • More reliable: SRA is a trusted source, and we are continuously improving our processes to ensure system reliability.
  • And more!  

GenBank Release 254.0 is Available!

GenBank release 254.0 (2/19/2023) is now available on the NCBI FTP site. This release has 22.52 trillion bases and 3.37 billion records. The current release has 241,830,635 traditional records containing 1,731,302,248,418 base pairs of sequence data. There are also 2,337,838,461 WGS records containing 20,116,642,176,263 base pairs of sequence data, 672,261,981 bulk-oriented TSA records containing 630,615,054,587 base pairs of sequence data, and 121,067,644 bulk-oriented TLS records containing 46,465,508,548 base pairs of sequence data.

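As a quick consistency check, the per-division counts for release 254.0 can be summed to reproduce the release totals. This small Python sketch does only that arithmetic on the figures quoted in the post; the exact sums are 3,372,998,721 records and 22,525,024,987,816 base pairs, in line with the rounded 3.37 billion records and roughly 22.5 trillion bases.

```python
# (records, base pairs) per division for GenBank release 254.0, as quoted in the post.
RELEASE_254 = {
    "traditional": (241_830_635, 1_731_302_248_418),
    "WGS": (2_337_838_461, 20_116_642_176_263),
    "bulk TSA": (672_261_981, 630_615_054_587),
    "bulk TLS": (121_067_644, 46_465_508_548),
}

# Add up each column across the four divisions.
total_records = sum(records for records, _ in RELEASE_254.values())
total_bases = sum(bases for _, bases in RELEASE_254.values())

print(f"records: {total_records:,} (~{total_records / 1e9:.2f} billion)")
print(f"bases:   {total_bases:,} (~{total_bases / 1e12:.3f} trillion)")
```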
New wizard for submitting mRNA sequences to GenBank

Do you submit eukaryotic nuclear mRNA sequences to GenBank? A new mRNA submission wizard is available! Built on the modern Submission Portal framework, the new wizard offers an enhanced experience, including:

    • A guided submission experience specific to mRNA sequences
    • Automated vector trimming and removal of short sequences
    • Easier input of source metadata
    • New feature annotation web forms for coding regions (CDS) and untranslated regions (5’ UTR, 3’ UTR)
    • Extensive feature previews (Figure 1)
    • Faster sequence processing and accession assignment
    • Access to an error-fixing workflow before accession assignment
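The automated vector trimming and short-sequence removal can be pictured with the simplified Python sketch below. It is not the wizard's actual method: it only trims exact overlaps with known vector flanks at the sequence ends, whereas real vector screens usually rely on similarity searches. The flank sequences, the `min_overlap` value, and the 200-nt `MIN_LENGTH` cutoff are hypothetical values chosen for illustration.

```python
MIN_LENGTH = 200  # hypothetical cutoff, for illustration only


def trim_flanks(seq: str, left: str, right: str, min_overlap: int = 10) -> str:
    """Trim exact overlaps with vector flanks at either end of seq.

    `left` is vector sequence expected upstream of the insert, `right` downstream.
    """
    # 5' end: the longest suffix of `left` that the sequence starts with.
    for k in range(min(len(left), len(seq)), min_overlap - 1, -1):
        if seq.startswith(left[len(left) - k :]):
            seq = seq[k:]
            break
    # 3' end: the longest prefix of `right` that the sequence ends with.
    for k in range(min(len(right), len(seq)), min_overlap - 1, -1):
        if seq.endswith(right[:k]):
            seq = seq[: len(seq) - k]
            break
    return seq


def prepare(records: dict[str, str], left: str, right: str) -> tuple[dict[str, str], list[str]]:
    """Trim each record, then drop any that fall below MIN_LENGTH."""
    kept: dict[str, str] = {}
    dropped: list[str] = []
    for name, seq in records.items():
        trimmed = trim_flanks(seq, left, right)
        if len(trimmed) >= MIN_LENGTH:
            kept[name] = trimmed
        else:
            dropped.append(name)
    return kept, dropped


# Hypothetical vector flanks and toy records.
LEFT = "GATCCTCTAGAGTCGACCTG"
RIGHT = "CAGGCATGCAAGCTTGGCAC"
records = {
    "mrna1": LEFT[-12:] + "ATG" + "C" * 247 + RIGHT[:15],
    "short": LEFT[-12:] + "ATGCCC" + RIGHT[:15],
}
kept, dropped = prepare(records, LEFT, RIGHT)
print({name: len(seq) for name, seq in kept.items()}, dropped)
```

Trimming runs before the length filter so that a record made up mostly of vector is judged by the length of what remains.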

Watch a short video (4 min) to see how to annotate CDS features in this new wizard!

Announcing the GenBank and SRA Data Processing Webpage

Interested in understanding how sequence data are submitted, processed, and made publicly available in GenBank and the Sequence Read Archive (SRA)? Announcing the GenBank and SRA Data Processing webpage!

Here you can learn about the procedures that the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), uses to process submitted data and make them publicly available, as well as key definitions of data status.

Join NCBI at PAG 30

San Diego, January 13-18, 2023 

NCBI is looking forward to seeing you in person at the International Plant and Animal Genome Conference (PAG 30), January 13-18, 2023 in San Diego, California.  

We’re especially excited to share our recent efforts on the NIH Comparative Genomics Resource (CGR), a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources on biomedical research.  

We also want to hear from you! If you’d like to share feedback about your needs and experiences with comparative genomics tools to help inform CGR, consider joining our Feedback Session.

Check out NCBI’s schedule of activities and events:  

Announcing GenBank release 252.0

Now over 3 billion records!

GenBank release 252.0 (10/17/2022) is now available on the NCBI FTP site. This release has 20.35 trillion bases and 3.10 billion records. The current release has 240,539,282 traditional records containing 1,562,963,366,851 base pairs of sequence data. There are also 2,167,900,306 WGS records containing 18,231,960,808,828 base pairs of sequence data, 574,020,080 bulk-oriented TSA records containing 511,476,787,957 base pairs of sequence data, and 115,123,306 bulk-oriented TLS records containing 43,860,512,749 base pairs of sequence data. 

Announcing GenBank Release 251.0

GenBank release 251.0 (8/15/2022) is now available on the NCBI FTP site. This release has 19.55 trillion bases and 2.94 billion records. The current release has 239,915,786 traditional records containing 1,492,800,704,497 base pairs of sequence data. There are also 2,024,099,677 WGS records containing 17,511,809,676,629 base pairs of sequence data, 560,196,830 bulk-oriented TSA records containing 497,501,380,386 base pairs of sequence data, and 115,103,527 bulk-oriented TLS records containing 43,852,280,645 base pairs of sequence data. 
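Between release 251.0 (8/15/2022) and release 252.0 (10/17/2022), GenBank passed 3 billion records. The Python sketch below computes that change from the division-level numbers quoted in the two release posts. It is plain arithmetic on those figures; the `totals` and `growth` helpers are illustrative names, not part of any NCBI tool.

```python
# (records, base pairs) per division, copied from the release 251.0 and 252.0 posts.
RELEASES = {
    "251.0": {
        "traditional": (239_915_786, 1_492_800_704_497),
        "WGS": (2_024_099_677, 17_511_809_676_629),
        "bulk TSA": (560_196_830, 497_501_380_386),
        "bulk TLS": (115_103_527, 43_852_280_645),
    },
    "252.0": {
        "traditional": (240_539_282, 1_562_963_366_851),
        "WGS": (2_167_900_306, 18_231_960_808_828),
        "bulk TSA": (574_020_080, 511_476_787_957),
        "bulk TLS": (115_123_306, 43_860_512_749),
    },
}


def totals(release: str) -> tuple[int, int]:
    """Sum records and base pairs across all divisions of one release."""
    divisions = RELEASES[release].values()
    return sum(r for r, _ in divisions), sum(b for _, b in divisions)


def growth(old: str, new: str) -> dict[str, tuple[int, float]]:
    """Added base pairs and percent base-pair growth per division between two releases."""
    result = {}
    for division, (_, old_bp) in RELEASES[old].items():
        _, new_bp = RELEASES[new][division]
        result[division] = (new_bp - old_bp, 100 * (new_bp - old_bp) / old_bp)
    return result


old_records, old_bp = totals("251.0")
new_records, new_bp = totals("252.0")
print(f"records added: {new_records - old_records:,}")
print(f"base pairs added: {new_bp - old_bp:,}")
for division, (added, pct) in growth("251.0", "252.0").items():
    print(f"{division}: +{added:,} bp ({pct:.1f}%)")
```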

Foreign Contamination Screen (FCS) tool for GenBank submissions

We are excited to introduce a Foreign Contamination Screen (FCS) tool that you can now run yourself, with enhanced contaminant detection sensitivity to improve your genome assemblies and facilitate high-quality data submissions to GenBank. If you submit genome assembly data to GenBank, the FCS tool is for you!

What is the FCS tool?

FCS, a quality assurance process used to make data suitable for submission, consists of two parts: FCS-adaptor and FCS-GX. FCS-adaptor searches for short sequences that are used during lab preparation and sometimes end up in the final assembly by mistake. FCS-GX searches for sequences from a wide range of organisms, including bacteria, fungi, protists, and viruses, among others, to identify sequences that do not appear to come from the intended organism. In each case, you receive a report of the coordinates and identities of potential contaminants to review and remove (see Figure 1 for a sample FCS-GX summary output). Both tools can screen eukaryote and prokaryote genomes.
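To make the adaptor-screening idea concrete, here is a deliberately naive Python sketch that reports exact matches to adaptor sequences on both strands, with 1-based coordinates like those in a contamination report. This is not how FCS-adaptor works internally; it is plain substring search. The adaptor dictionary is an assumption: `AGATCGGAAGAGC` is the widely used Illumina adapter prefix, included only as an example.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def find_adaptor_hits(contigs: dict[str, str], adaptors: dict[str, str]) -> list[tuple[str, str, str, int, int]]:
    """Return (contig, adaptor, strand, start, end) for every exact match, 1-based inclusive."""
    hits = []
    for contig_id, seq in contigs.items():
        for name, adaptor in adaptors.items():
            # Check the adaptor as given and its reverse complement.
            for strand, query in (("+", adaptor), ("-", revcomp(adaptor))):
                pos = seq.find(query)
                while pos != -1:
                    hits.append((contig_id, name, strand, pos + 1, pos + len(query)))
                    pos = seq.find(query, pos + 1)
    return hits


# Example adaptor (illustrative choice) and toy contigs.
ADAPTORS = {"illumina_prefix": "AGATCGGAAGAGC"}
contigs = {
    "ctg1": "ACGT" * 5 + "AGATCGGAAGAGC" + "TTTT",
    "ctg2": "GG" + revcomp("AGATCGGAAGAGC"),
}
for hit in find_adaptor_hits(contigs, ADAPTORS):
    print(hit)
```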

Figure 1. FCS-GX report showing the summary of contamination identified in a tomato genome. The output indicates there are 83 sequences, adding up to 381 kb total length, to be removed from a mix of insect, fungal, and bacterial sources.
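A summary like the one in Figure 1 comes down to a count and a total length of flagged sequences, broken down by source. Here is a minimal Python sketch of that tally, assuming a hypothetical list of (sequence ID, length, source) rows with one row per flagged sequence. The toy rows are invented and are not the tomato results.

```python
from collections import defaultdict


def summarize(rows: list[tuple[str, int, str]]) -> tuple[int, int, dict[str, int]]:
    """Return (sequence count, total bp, bp per source) for rows of (seq_id, length_bp, source).

    Assumes one row per flagged sequence.
    """
    per_source: dict[str, int] = defaultdict(int)
    for _seq_id, length_bp, source in rows:
        per_source[source] += length_bp
    return len(rows), sum(per_source.values()), dict(per_source)


# Invented toy rows, not the tomato results shown in Figure 1.
rows = [
    ("ctg_a", 120_000, "insect"),
    ("ctg_b", 45_500, "fungi"),
    ("ctg_c", 8_200, "bacteria"),
    ("ctg_d", 2_300, "bacteria"),
]
count, total_bp, by_source = summarize(rows)
print(f"{count} sequences, {total_bp / 1000:.1f} kb to remove")
for source, bp in sorted(by_source.items(), key=lambda item: -item[1]):
    print(f"  {source}: {bp / 1000:.1f} kb")
```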

How do I use FCS?

FCS is available from GitHub. Simply download the two programs (FCS-adaptor and FCS-GX), and follow a few steps as outlined in the Quickstart. Both tools are also easy and inexpensive to run on commercial clouds such as Amazon Web Services (AWS) or Google Cloud Platform (GCP), and can screen genomes in a fraction of the time of other approaches. 

Why is FCS important?

High-quality data are essential for reaching accurate research conclusions. By rapidly detecting contaminants from foreign organisms in assembled genomes, FCS helps ensure that the data submitted are of high value and suitable for reuse. We’ve already used FCS-GX to remove over one hundred megabases of contaminants and thousands of erroneous genes and proteins from previously submitted eukaryote genomes, making the data more useful for all.

We want to hear from you!

We will update the FCS tool based on your feedback, so try it out and let us know what you think. Please contact us with comments and suggestions.

FCS is part of the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms.

Join our mailing list to keep up to date with FCS and other CGR news.