Do you have Norovirus sequence data to submit to GenBank? Try out the newly-released improvements in our submission service for Norovirus data! The new service offers the following advantages:
- Faster processing and shorter time to accession numbers
- Improved user interface
- Automatic feature annotation
Figure 1. The submission portal page showing the new option for submitting Norovirus data.
Begin a new Norovirus submission or see how to get started submitting other data to GenBank.
GenBank accepts a wide range of data to support scientific discovery and analysis on sequences from all branches of life.
As we announced in October, a new My Bibliography is coming soon! We encourage you to preview the new service and let us know what you think. You can safely try out this new version without affecting anything in your existing My Bibliography.
With an all-new look and feel, it will be easier than ever to manage and share your work. The clean new interface will make managing your collection a breeze, and the new page layout will make it easier to manage very large bibliographies. You’ll also be able to search within your bibliography for keywords, author names, and grant numbers to quickly filter your view to the citations most relevant to you.
NCBI announces Annotation Release 100 of the Pacific white shrimp (Penaeus vannamei) genome in RefSeq, based on the assembly (GCF_003789085.1) submitted by the Institute of Oceanology, Chinese Academy of Sciences. The Pacific white shrimp is one of the most important shrimp species in fisheries and aquaculture and is the first decapod to have its genome annotated by NCBI. We predicted 24,987 protein-coding genes with evidence from alignment of six billion RNA-Seq reads and homology with invertebrate proteins. This annotation will enable genomic research in this commercially important species.
You can download the annotated assembly or browse and search it in the Genome Data Viewer.
Please visit our Eukaryotic RefSeq Genome Annotation Status page to see more annotations in progress.
GenBank release 229.0 (12/15/2018) has 211,281,415 traditional records (including non-bulk-oriented TSA) containing 285,688,542,186 base pairs of sequence data. There are also 773,773,190 WGS records containing 3,656,719,423,096 base pairs of sequence data, 274,845,473 bulk-oriented TSA records containing 248,592,892,188 base pairs of sequence data, and 20,924,588 bulk-oriented TLS records containing 8,511,829,281 base pairs of sequence data.
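For readers who want the combined totals, the release figures above can be added up with a short script. This is only an illustrative sketch: the dictionary name `RELEASE_229` and the helper `summarize` are made up for this example, and the numbers are copied directly from the release summary rather than fetched from any NCBI service.

```python
# Record and base-pair counts copied from the GenBank release 229.0 summary.
# Each entry is (number of records, number of base pairs).
RELEASE_229 = {
    "traditional": (211_281_415, 285_688_542_186),
    "WGS": (773_773_190, 3_656_719_423_096),
    "bulk-oriented TSA": (274_845_473, 248_592_892_188),
    "bulk-oriented TLS": (20_924_588, 8_511_829_281),
}


def summarize(divisions):
    """Return (total records, total base pairs) across all divisions."""
    total_records = sum(records for records, _ in divisions.values())
    total_bp = sum(bp for _, bp in divisions.values())
    return total_records, total_bp


total_records, total_bp = summarize(RELEASE_229)
print(f"Total records:    {total_records:,}")
print(f"Total base pairs: {total_bp:,}")

# Share of all base pairs and mean record length for each division.
for name, (records, bp) in RELEASE_229.items():
    share = bp / total_bp
    mean_length = bp / records
    print(f"{name}: {share:.1%} of base pairs, mean length {mean_length:,.0f} bp")
```

The per-division breakdown makes it easy to see that WGS records account for most of the sequence data in this release.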
As the final part of an announced FTP restructuring effort, we will remove the old PMC Bulk Open Access files on March 18, 2019 – but please note that the data is still available.
In August 2016, the PMC Open Access dataset was split into two groups.
This May, the NCBI will host a women’s collaborative biodata science hackathon on the NIH Campus in Bethesda, Maryland!
We are now collecting project proposals focusing on building tools and pipelines for advanced analysis of biomedical datasets including text, images, next generation sequencing data, proteomics, and metadata. Proposals for tutorial pipelines and educational tools for advanced analysis are also welcome. Submit your project proposal by March 4, 2019.
NCBI has been asked to take over the ownership and maintenance of the TIGRFAM collection of Hidden Markov Models (HMMs), which is widely used for the annotation of prokaryotic genomes. The TIGRFAMs are a collection of curated protein families started in 1998 at The Institute for Genomic Research (TIGR), precursor to the J. Craig Venter Institute (JCVI). This collection is publicly available under a Creative Commons license and downloadable from NCBI. We have already made hundreds of improvements to TIGRFAM names and descriptions and we will continue to make regular updates.
We’ve recently improved the tooltips for gene features in NCBI’s graphical sequence displays in Genome Data Viewer (GDV) and on many resource pages, such as Gene and dbSNP. These enhancements include quick details and helpful links about the feature and gene.
Figure 1. Merged transcript and CDS pair tooltip.
We are pleased to announce the first ever pangenomics, graphs and haplotypes hackathon.
From March 25-27, 2019, the NCBI will help run a bioinformatics hackathon in Santa Cruz, California, hosted by the University of California, Santa Cruz (UCSC). Potential topics include:
- Building large scale graphs from pangenomes using several assembly methods
- Simplification of mapping
- Resolving haplotypes
- Identification of population-specific structural variants
- Defining haplotype-specific expression, visualization, and coordination with the GRC
To ensure that taxonomic information on genome assemblies is as accurate as possible, NCBI will use average nucleotide identity (ANI) analysis to correct existing public records in GenBank.
We will contact submitters of records found to be misidentified and provide reports with ANI information based on comparison to type strains. If there is no objection, the taxonomic change will be made, and a structured comment will be added to the record.
In cases where a genome assembly was not submitted with a binomial name (e.g., Bacillus sp. 123) but was found to match a known species with high confidence, the strain will be merged with the binomial in the taxonomy database. This will occur as part of the normal maintenance of merged taxonomic names. The submitter will not be contacted, but a structured comment indicating the change will be added to the record.
A paper in the International Journal of Systematic and Evolutionary Microbiology presents the method NCBI scientists used to review all prokaryotic genome assemblies in GenBank, as well as the current status of GenBank verifications and recent developments in confirming species assignments in new genome submissions.