Month: April 2020

Canonical SPDI notation now in ClinVar

Did you know that you can see canonical SPDI notation – SPDI notation expressed on the GRCh38 chromosomal sequence – in ClinVar?

Figure 1. The canonical SPDI is provided within the “Variant details” tab. This is just one of many ways to see the notation.

This allows you to easily make connections between output from NCBI’s Variation Services and ClinVar data.
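To make that connection in a script, it can help to split a SPDI string into its parts. SPDI stands for Sequence:Position:Deletion:Insertion, with a 0-based position. The sketch below is a minimal illustration and not an NCBI tool: the `Spdi` type and the `parse_spdi` helper are hypothetical names, and the example variant is made up (only the accession `NC_000001.11`, GRCh38 chromosome 1, is real).

```python
from typing import NamedTuple


class Spdi(NamedTuple):
    """The four colon-separated fields of a SPDI expression (hypothetical helper type)."""
    sequence: str   # sequence accession, e.g. a GRCh38 chromosome accession
    position: int   # 0-based position where the deletion starts
    deletion: str   # deleted sequence, kept as text exactly as given
    insertion: str  # inserted sequence


def parse_spdi(text: str) -> Spdi:
    """Split 'sequence:position:deletion:insertion' into typed fields."""
    parts = text.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}: {text!r}")
    sequence, position, deletion, insertion = parts
    if not sequence:
        raise ValueError("sequence accession is empty")
    return Spdi(sequence, int(position), deletion, insertion)


# Made-up example variant on GRCh38 chromosome 1.
variant = parse_spdi("NC_000001.11:99:A:G")
# SPDI positions are 0-based, so the first affected base in 1-based terms is position + 1.
one_based_start = variant.position + 1
print(variant.sequence, one_based_start, variant.deletion, variant.insertion)
```

Keeping the fields in a small named tuple makes it straightforward to compare a SPDI returned by one service with the one displayed on a ClinVar record.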

New feature added to Primer-BLAST to better design primers for expression assays

We’ve added a new feature (Max 3′ match), shown in Figure 1, to Primer-BLAST that limits the length of 3′ exon matches when designing exon-exon spanning primers. This makes it less likely that primers specifically designed to amplify transcripts will also amplify genomic DNA contamination in expression assays.


Figure 1. The new “Max 3′ match” option that limits the size of the 3′ match for exon-exon junction primers. This option helps avoid primers that may also produce product from genomic DNA.
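The idea behind the limit can be sketched in a few lines. This is an illustration of the concept only, not Primer-BLAST's implementation: the function names are hypothetical, coordinates are 0-based positions on the spliced transcript, and only a forward primer is considered.

```python
from typing import Optional


def three_prime_match(primer_start: int, primer_length: int, junction: int) -> Optional[int]:
    """Return how many primer bases lie on the downstream (3') exon.

    `junction` is the transcript index of the first base of the downstream exon.
    Returns None when the primer does not span the junction.
    """
    primer_end = primer_start + primer_length  # exclusive end
    if not (primer_start < junction < primer_end):
        return None
    return primer_end - junction


def passes_max_3prime_match(primer_start: int, primer_length: int,
                            junction: int, max_3prime_match: int) -> bool:
    """True if the primer spans the junction and its 3' exon match is short enough."""
    match = three_prime_match(primer_start, primer_length, junction)
    return match is not None and match <= max_3prime_match


# A 20-base primer starting at 90 with a junction at 104 has 6 bases on the 3' exon.
print(three_prime_match(90, 20, 104))
print(passes_max_3prime_match(90, 20, 104, max_3prime_match=8))
```

A short 3′ match means the 3′ end of the primer cannot anneal stably to the downstream exon alone, which is why a primer like this is less likely to prime from genomic DNA, where an intron separates the two exons.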

Flies Are A-buzzing in RefSeq!

Are you interested in comparative genomics or other studies using Drosophila genomics?

Then don’t miss our online poster #568A at TAGC 2020 Online (no meeting registration required). Also, tune in to the online Q&A session on Monday, April 27 at 12:00 – 12:30 pm EDT.

What’s happening? In coordination with FlyBase, we are transitioning almost all of the RefSeq Drosophila assemblies to annotation produced primarily by NCBI’s eukaryotic genome annotation pipeline. We’ll continue to use the FlyBase annotation for Drosophila melanogaster (soon to be updated to Release 6.32), but we’ll annotate the other species using available RNA-seq datasets and our latest software. This will allow us to provide consistent, high-quality annotations across the full spectrum of Drosophila species, and to provide annotations rapidly as new high-quality assemblies become available. Another benefit is that these annotations will be available in the full suite of NCBI resources, including Nucleotide, Protein, BLAST, Gene, Genome Data Viewer, Genomes, Assembly, and more. You can download these annotation data from the NCBI genomes FTP site, or you can try the new NCBI Datasets tool. By special request, we’re making orthology data relative to D. melanogaster available on the Gene FTP site, and we plan to expose those data in our public pages in the future.

Recalculation of prokaryotic reference and representative genome assemblies

We have updated the collection of representative and reference assemblies for Bacteria and Archaea to better reflect the taxonomic breadth of the prokaryotes in RefSeq. We chose the 11,478 representative assemblies in the new collection from the 180,000+ prokaryotic assemblies in RefSeq today. We have selected one representative or reference assembly for every species based on several criteria, including contiguity, completeness, and whether the assembly is from type material. We have also updated the reference and representative microbial BLAST database to reflect these changes. This reference and representative set will be updated three times a year to reflect changes in RefSeq. In addition, as we announced on Feb 14, we have reduced the number of reference genome assemblies (the subset of representative assemblies with annotation provided by outside experts) to 15. See the list in our previous post. We have re-annotated the 104 assemblies that are no longer reference with our Prokaryotic Genome Annotation Pipeline (PGAP).

Recent RefSeq annotations: barn owl, monarch butterfly and more

In February and March, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Amblyraja radiata (thorny skate)
  • Catharus ustulatus (Swainson’s thrush)
  • Chelonoidis abingdonii (Abingdon island giant tortoise)
  • Chiroxiphia lanceolata (lance-tailed manakin)
  • Danaus plexippus plexippus (monarch butterfly)
  • Daphnia magna (crustacean)
  • Drosophila grimshawi (fly)
  • Drosophila mojavensis (fly)
  • Drosophila sechellia (fly)
  • Homo sapiens (human)
  • Hylobates moloch (silvery gibbon)
  • Lontra canadensis (North American river otter)
  • Lynx canadensis (Canada lynx)
  • Nasonia vitripennis (jewel wasp)
  • Odontomachus brunneus (ant)
  • Petromyzon marinus (sea lamprey)
  • Phocoena sinus (vaquita)
  • Rattus rattus (black rat)
  • Rhinolophus ferrumequinum (greater horseshoe bat)
  • Strigops habroptila (kakapo)
  • Taeniopygia guttata (zebra finch)
  • Tyto alba (barn owl)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Streamlined submission of SARS-CoV-2 data with rapid turnaround

Figure 1. The SARS-CoV-2 submission landing page, where you can submit to GenBank or SRA. You can also view other resources related to SARS-CoV-2.

Quickly and easily add your SARS-CoV-2 sequence data to the growing public archive with new, special features and support from NCBI. Our new SARS-CoV-2 sequence submission landing page will help you get started. GenBank submissions are accessioned and released in approximately 1-2 working days, and Sequence Read Archive (SRA) submissions are typically processed and released within hours. Submission is simple!

April 22 Webinar on NCBI’s ALFA: allele frequency data for variant analysis and interpretation

On Wednesday, April 22, 2020 at 12 PM, join NCBI staff to learn how results from the Allele Frequency Aggregator (ALFA) project will help you interpret the biological impact of common and rare sequence variants. ALFA’s initial release includes analysis of genotype data from ~100K unrestricted dbGaP subjects and provides high-quality allele frequency data now displayed on relevant dbSNP records. In this webinar, you will learn about the data in the recent ALFA release, see how to access the data from the web and by FTP, and see how to retrieve data programmatically by positions, genes, and other attributes using E-utilities and the Variation Services API in Python.

  • Date and time: Wed, Apr 22, 2020 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The next RefSeq FTP release number will skip to 200

NCBI’s Reference Sequence (RefSeq) FTP release numbers will increment to 200 for the next release, skipping over the numbers 100-199. The current release, from March 2020, is release 99. The next bi-monthly release, in May 2020, will be release 200. This change avoids overlap with the release numbers of the completely independent RefSeq annotation releases for the eukaryotic genomes we annotate, which are currently in the range 100-109 (for example, Mus musculus Annotation Release 108).