Tag: RefSeq

Variation feature changes in NCBI Reference Sequences coming in 2018

Starting in March 2018, SNP variation features will no longer be in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records. Because the number of variants is already enormous and still growing, removing SNP features from these large genomic records will significantly reduce the size of RefSeq FTP files and make downloading and processing easier. We will continue to include SNPs on NG_-prefixed genomic records, and transcript (NM_, NR_, XM_, XR_) and protein (NP_, XP_, YP_) sequences.
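
For scripts that process RefSeq records, the prefix rule above can be sketched as a small lookup. This is a minimal illustration, not an NCBI tool: the function name `snp_feature_status` and the constant names are hypothetical, and only the accession prefixes listed in this announcement are covered.

```python
# Accession prefixes taken from the announcement above.
# Genome assembly records (chromosomes and contigs): SNP features removed.
SNP_REMOVED_PREFIXES = ("NC_", "NT_", "NW_", "AC_")
# NG_ genomic records, transcripts, and proteins: SNP features retained.
SNP_RETAINED_PREFIXES = ("NG_", "NM_", "NR_", "XM_", "XR_", "NP_", "XP_", "YP_")


def snp_feature_status(accession: str) -> str:
    """Return 'removed', 'retained', or 'unknown' for a RefSeq accession.

    Hypothetical helper for illustration; it inspects only the prefix.
    """
    acc = accession.strip().upper()
    if acc.startswith(SNP_REMOVED_PREFIXES):
        return "removed"
    if acc.startswith(SNP_RETAINED_PREFIXES):
        return "retained"
    # Prefixes not mentioned in the announcement are left undecided.
    return "unknown"


if __name__ == "__main__":
    for acc in ("NC_000001.11", "NM_000546.6", "WP_000000001.1"):
        print(acc, snp_feature_status(acc))
```

Prefixes the announcement does not mention are reported as `unknown` rather than guessed.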

Reminder: As of September 2017, NCBI has stopped accepting submissions for non-human SNPs in dbSNP and dbVar. RefSeq flatfiles will stop presenting non-human variant data in November 2017.

Subscribe to the refseq-announce listserv for regular updates on RefSeq.

November 1 webinar: Introducing the Genome Data Viewer (GDV)

On Wednesday, November 1, 2017, we will present a webinar on GDV, NCBI’s full-featured genome browser. In this webinar, you’ll learn how to explore and analyze sequences and annotations for eukaryotic RefSeq genome assemblies. We’ll show you how to:

  • Search across the entire assembly for genes, products, and other markers, or jump to a specific position or range.
  • Display any of seven preselected track sets highlighting various aspects of the assembly, or create and load your own custom track sets from your NCBI account.
  • Load and display submitted alignment data from NCBI’s GEO or SRA.
  • Upload your own annotation and variant data.
  • Display BLAST or Primer-BLAST results on the assembly in the browser.

Date and time: Wednesday, November 1, 2017, 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Updated HIV-1 interaction datasets in Gene

We recently updated the HIV-1 interaction datasets in Gene with data provided by the Southern Research Institute (SRI).

The protein interactions dataset now has:

  • 8,005 interactions,
  • 16,215 interaction descriptions,
  • 3,859 proteins encoded by 3,757 human genes,
  • and 6,822 publications.

The replication interactions dataset now has:

  • 1,595 interactions,
  • 1,854 interaction descriptions,
  • 1,583 proteins encoded by 1,583 human genes,
  • and 229 publications.

Data are also available at the RefSeq HIV-1 website and the GeneRIF FTP site.

October 11 NCBI Minute: Introducing the New RefSeq Functional Elements Project

On October 11, 2017, NCBI will present a webinar on RefSeq Functional Elements. This NCBI Minute will introduce you to this project and its scope, describe how functional elements are curated and displayed, demonstrate how to access the data, and provide information on the current progress of the project.

Date and time: Wed, Oct 11, 2017 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The new RefSeq Functional Elements project is an expansion of the NCBI RefSeq project to include non-genic functional genomic regions in human and mouse that have been experimentally validated and described in the scientific literature.

RefSeq release 84 available

RefSeq release 84 is now accessible online, via FTP and through NCBI’s programming utilities.

This full release incorporates genomic, transcript, and protein data available as of September 11, 2017, and contains 140,627,690 records, including 95,563,598 proteins, 20,356,598 RNAs, and sequences from 72,965 organisms.

The release is provided in several directories as a complete dataset and as divided by logical groupings. See the RefSeq release notes for more information.

Phasing out support for non-human organisms

As of September 1, 2017, the dbSNP and dbVar databases have stopped accepting submissions for non-human organisms. Submissions for non-human variation will now be accepted by the European Variation Archive, one of our partners in the International Nucleotide Sequence Database Collaboration (INSDC).

Yellow fever mosquito, 6 other organisms in July RefSeq genome annotations

In July, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Papio anubis (olive baboon)
  • Prunus avium (sweet cherry)
  • Aedes aegypti (yellow fever mosquito)
  • Chenopodium quinoa (quinoa)
  • Hevea brasiliensis (rubber tree)
  • Manihot esculenta (cassava)
  • Carlito syrichta (Philippine tarsier)

Image: Papio anubis (olive or anubis baboon). Source: United States Fish and Wildlife Service: Digital Library System.

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

NCBI Replacing Obsolete NCBI Genomes (chromosome) and Removing Human ALU repeat elements (alu_repeats) BLAST databases

NCBI will discontinue both the NCBI Genomes (chromosome) and the Human ALU repeat elements (alu_repeats) BLAST databases in October 2017.

Better alternatives to NCBI Genomes (chromosome)

The existing NCBI Genomes (chromosome) database does not offer complete and non-redundant coverage of genome data. The newly added NCBI RefSeq Genomes Database (refseq_genomes) and the RefSeq Representative Genomes Database (refseq_representative_genomes) are more useful alternatives to the chromosome database. You can select these databases from the database pull-down list on any general BLAST form that searches a nucleotide database (blastn, tblastn).

Figure 1. The nucleotide-nucleotide BLAST database menu with the recommended (RefSeq Genome and Representative genomes) and deprecated (NCBI genomes (chromosomes) and Human ALU repeats) databases highlighted.
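
For pipelines that still pass the old database names, the recommendation above can be sketched as a lookup table. This is an illustrative sketch only: the helper `suggest_replacements` and the dictionary name are hypothetical, and since the announcement names no replacement for alu_repeats, that entry is left empty.

```python
# Replacements named in the announcement above. No replacement is named
# there for alu_repeats, so it maps to an empty tuple.
DEPRECATED_BLAST_DBS = {
    "chromosome": ("refseq_genomes", "refseq_representative_genomes"),
    "alu_repeats": (),
}


def suggest_replacements(db_name: str):
    """Return replacement database names for a deprecated BLAST database.

    Returns None when db_name is not one of the databases being discontinued.
    Hypothetical helper for illustration only.
    """
    return DEPRECATED_BLAST_DBS.get(db_name.strip().lower())


if __name__ == "__main__":
    for name in ("chromosome", "alu_repeats", "nt"):
        print(name, "->", suggest_replacements(name))
```

A `None` result means the database is not affected by this change; an empty tuple means it is being removed without a named successor.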

Identical Protein Groups: Non-redundant access to protein records

Have you ever searched the NCBI Protein database and been overwhelmed with the number of sequences returned? Have you tried searching with a protein name, thinking that would greatly limit the results, only to still be presented with many sequences (all with the same name)? It’s a common problem in this time of greatly expanding sequence databases powered by large-scale genomic sequencing of similar organisms. Redundancy in the sequence databases is high and only getting worse.

To address this, in 2013 NCBI released the WP records, which collect identical protein sequences annotated on bacterial genomes. In 2014, NCBI released the Identical Protein Reports on Protein records, which display information about all other proteins identical to that protein. Now, we are releasing a new resource: Identical Protein Groups (IPG). The full post, “Identical Protein Groups: Non-redundant access to protein records”, describes the features IPG offers.

RefSeq release 83 now public

RefSeq release 83 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of July 17, 2017, and contains 132,052,465 records, including 88,385,530 proteins, 19,634,664 RNAs, and sequences from 71,356 organisms. The release is provided in several directories as a complete dataset and as divided by logical groupings. More information about RefSeq release 83 is available in the release notes.
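
As a quick worked comparison, the counts quoted on this page for releases 83 and 84 can be differenced to see how much each category grew. The sketch below uses only those published figures; the dictionary keys and the `growth` helper are names chosen for illustration.

```python
# Counts as quoted in the release 83 and release 84 announcements.
RELEASE_83 = {
    "records": 132_052_465,
    "proteins": 88_385_530,
    "rnas": 19_634_664,
    "organisms": 71_356,
}
RELEASE_84 = {
    "records": 140_627_690,
    "proteins": 95_563_598,
    "rnas": 20_356_598,
    "organisms": 72_965,
}


def growth(old: dict, new: dict) -> dict:
    """Map each category to (absolute increase, percent increase)."""
    return {
        key: (new[key] - old[key], 100.0 * (new[key] - old[key]) / old[key])
        for key in old
    }


if __name__ == "__main__":
    for key, (delta, pct) in growth(RELEASE_83, RELEASE_84).items():
        print(f"{key}: +{delta:,} ({pct:.1f}%)")
```

Between the two releases the total grew by 8,575,225 records, of which 7,178,068 are proteins and 721,934 are RNAs, with 1,609 more organisms represented.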

Future changes

NCBI will phase out support for non-human organisms in the dbSNP and dbVar databases. These databases will stop accepting submissions for non-human SNPs in September 2017. The interactive websites for these databases and related NCBI services, including RefSeq flatfiles, will stop presenting non-human variant data in November 2017.

Zebrafish (Danio rerio), 11 other organisms in June RefSeq genome annotations

In June, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for 12 organisms, including Danio rerio (zebrafish). The full post lists all of them.