Month: December 2019

GenBank release 235

GenBank release 235.0 (12/11/2019) is now available on the NCBI FTP site. This release has 7 trillion bases and 1.74 billion records.

The current release has 215,333,020 traditional records containing 388,417,258,009 base pairs of sequence data. There are also 1,127,023,870 WGS records containing 6,277,551,200,690 base pairs of sequence data, 367,193,844 bulk-oriented TSA records containing 325,433,016,129 base pairs of sequence data, and 28,227,180 bulk-oriented TLS records containing 11,280,596,614 base pairs of sequence data.
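If you want to check how the headline figures follow from the per-division counts, the short sketch below simply adds up the numbers quoted above. The dictionary layout and variable names are our own choice for illustration, not part of any GenBank tool.

```python
# Record and base-pair counts quoted above for GenBank release 235.
# Each entry is (records, base pairs).
divisions = {
    "traditional": (215_333_020, 388_417_258_009),
    "WGS": (1_127_023_870, 6_277_551_200_690),
    "TSA": (367_193_844, 325_433_016_129),
    "TLS": (28_227_180, 11_280_596_614),
}

# Sum the two columns across all four divisions.
total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records:,} records (~{total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases (~{total_bases / 1e12:.2f} trillion)")
```

The totals come to roughly 1.74 billion records and 7 trillion bases, matching the summary in the announcement.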

Mitochondrial COX1 submission improvements now live in submission portal!

GenBank submitters, you can now submit mitochondrial COX1 (cytochrome oxidase subunit I; COI) sequence data from multicellular animals (metazoa) using a new workflow (Figure 1) with an improved interface, enhanced validation, and automatic COX1 CDS feature annotation. Once you have submitted mitochondrial COX1 data using this tool, you’ll have a single, helpful page to reference your submission information: accession number(s), COX1 submission status, relevant files, and more. You can also fix any errors from this page.

Figure 1. Submission Portal page with the mitochondrial COX1 submission option selected (boxed in red). The service has options for other targeted submissions including ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.

ClinVar Celebrates 1 Million Submissions

Image text: 1 million submitted records in ClinVar represent more than 568,000 unique variants.

ClinVar is proud to announce the submission of the one millionth record to its database.

The millionth submission was published on Friday, December 20, 2019, a milestone in providing the clinical genetics and research communities with open access to human variant data with asserted consequence.

ClinVar extends its thanks to the many laboratories, partners, and members of the community whose efforts and adoption of the practice of data-sharing paved the way for this achievement. All organizations that contributed to ClinVar’s genetics resources share in this accomplishment, with special recognition reserved for ClinGen and several of their members, including EGL Genetic Diagnostics/Eurofins Clinical Diagnostics, GeneDx, Invitae, and Laboratory for Molecular Medicine/Partners HealthCare Personalized Medicine, whose early submissions helped jump-start ClinVar’s database.

BLAST+ 2.10.0 now available with improved composition-based statistics

The BLAST+ 2.10.0 release is now available from our FTP site. The new version offers the following improvements:

  • updated composition-based statistics for protein-protein (including translated BLAST) comparisons to provide stable results when you request fewer than the default number of results
  • an experimental Adaptive Composition Based Statistics option that increases the likelihood of finding novel results. To enable this option, set the environment variable ADAPTIVE_CBS to 1. We welcome your feedback on this new option.
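If you launch BLAST+ from a script, one way to try the experimental option is to set ADAPTIVE_CBS only for the BLAST process instead of for your whole session. The sketch below is a minimal example under some assumptions: `blastp` is on your PATH, and the helper function names plus the query, database, and output names you pass in are placeholders of our own, not part of BLAST+.

```python
import os
import subprocess


def adaptive_cbs_env(base_env=None):
    """Return a copy of the environment with ADAPTIVE_CBS set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["ADAPTIVE_CBS"] = "1"
    return env


def blastp_command(query, db, out):
    """Build a blastp command line from placeholder file and database names."""
    return ["blastp", "-query", query, "-db", db, "-out", out]


def run_blastp_with_adaptive_cbs(query, db, out):
    """Run blastp with ADAPTIVE_CBS=1 set for this one process only."""
    # check=True raises CalledProcessError if blastp exits with an error.
    return subprocess.run(
        blastp_command(query, db, out),
        env=adaptive_cbs_env(),
        check=True,
    )
```

For example, `run_blastp_with_adaptive_cbs("query.fasta", "my_protein_db", "results.txt")` would run a search with the option enabled while leaving the rest of your shell environment untouched, which makes it easy to compare results with and without the setting.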

See the release notes for details on more improvements and bug fixes with this release.

The new version fully supports the version 5 (v5) databases with built-in taxonomy and other improvements. For more information on v5 databases (download), see the previous NCBI Insights article and the recording of our webinar. If you are still using the older version 4 (v4) databases, we recommend you begin using the v5 version as soon as possible. We will discontinue updates to the older v4 databases in early 2020.

Genome Workbench is now in the cloud!

If you’re interested in visualizing and analyzing genomic data, then you’ll want to check out a new way to run Genome Workbench: in the cloud! Genome Workbench is a desktop application (for both Windows and Mac) that lets you analyze genomic data in one place. You can run tools such as BLAST, create views such as multiple sequence alignments, and much more. You can run Genome Workbench in a cloud environment from your local desktop computer. This manual will show you how.

There are many advantages to using Genome Workbench in the cloud:

  • You can easily compare your data to the complete GenBank and RefSeq datasets without needing to download them
  • You can run BLAST searches against standard databases or any custom databases you’ve assembled in the cloud
  • All of the data (e.g. FASTA, BAM, GFF files) remain in the cloud with no need for local copies
  • You won’t pay egress fees for downloading data

Give it a try and let us know how it goes!

Feature propagation in BankIt: easily annotate many sequences at once for GenBank submission

Do you need a quick way to annotate features on a similar set of sequences for your GenBank submission? You can now submit sequences from the same region or gene in an alignment format in BankIt and use the new ‘Feature propagation option’ (Figure 1) to apply features from a single sequence to other aligned sequences. You simply annotate one sequence and then copy that annotation across all the sequences in your submission.

Here’s how you can propagate features in three easy steps:

  1. Provide nucleotide sequences in an alignment format.
  2. Select a sequence and annotate it.
  3. Propagate the features and edit results.
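To give a feel for what propagating a feature through an alignment involves, here is a minimal sketch of the general idea. It is an illustration only, not BankIt’s actual implementation: the function names are hypothetical, sequences are assumed to be gapped strings of equal length that use `-` for gaps, and feature coordinates are assumed to be 1-based and inclusive.

```python
def ungapped_to_column(aligned, pos):
    """Return the 0-based alignment column of the pos-th (1-based) residue."""
    seen = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            seen += 1
            if seen == pos:
                return col
    raise ValueError("position beyond end of sequence")


def propagate_feature(source, target, start, end):
    """Map a 1-based inclusive feature interval from source onto target.

    Returns (start, end) in target coordinates, or None when the target
    has only gaps across the feature's alignment columns.
    """
    if len(source) != len(target):
        raise ValueError("aligned sequences must have the same length")
    first_col = ungapped_to_column(source, start)
    last_col = ungapped_to_column(source, end)
    # Count target residues before the feature and within its columns.
    before = sum(ch != "-" for ch in target[:first_col])
    within = sum(ch != "-" for ch in target[first_col:last_col + 1])
    if within == 0:
        return None
    return before + 1, before + within


# Toy alignment: annotate the first sequence, then propagate to the second.
annotated = "ATGCC-GTA"
other = "A-GCCTGTA"
print(propagate_feature(annotated, other, 1, 5))
```

Gaps shift coordinates between sequences, which is why step 3 includes editing the results: a propagated interval may come out shorter, longer, or partial in a sequence that has insertions or deletions relative to the one you annotated.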

New download files and FTP directories for genome assemblies

You can now download new file types for species recently annotated by the NCBI Eukaryotic Genome Annotation Pipeline from the Assembly web pages and from the genomes/refseq FTP area. The new file types include alignments of annotated transcripts to the assembly in BAM format, all models predicted by Gnomon, and, for species that have been annotated multiple times, files characterizing the feature-by-feature differences between the current and the previous annotation.

Coming Soon: A New NIH Manuscript Submission (NIHMS) System!

Reflecting the National Library of Medicine’s (NLM) ongoing commitment to public access support at the National Institutes of Health (NIH) and beyond, we are pleased to announce that a new NIHMS system will be released in early 2020. This new system aims to streamline the submission process, ensure the continued quality of manuscripts made publicly accessible, and give authors and investigators more transparent options for avoiding processing delays.

Those familiar with the current NIHMS system will find the basic steps of submitting, reviewing, and approving manuscripts for inclusion in PMC unchanged in the new system. They will see an updated user interface that simplifies the login process for returning users; provides contextual help throughout; and offers user-friendly options for importing article metadata, requesting corrections, and taking over the Reviewer role for stalled submissions. Details of these updates and more are available in this video:
