We recently updated the HIV-1 interaction datasets in Gene with data provided by the Southern Research Institute (SRI).
The protein interactions dataset now has:
- 8,005 interactions,
- 16,215 interaction descriptions,
- 3,859 proteins encoded by 3,757 human genes,
- and 6,822 publications.
The replication interactions dataset now has:
- 1,595 interactions,
- 1,854 interaction descriptions,
- 1,583 proteins encoded by 1,583 human genes,
- and 229 publications.
Data are also available at the RefSeq HIV-1 website and the GeneRIF FTP site.
On October 11, 2017, NCBI will present a webinar on RefSeq Functional Elements. This NCBI Minute will introduce you to this project and its scope, describe how functional elements are curated and displayed, demonstrate how to access the data, and provide information on the current progress of the project.
Date and time: Wed, Oct 11, 2017 12:00 PM – 12:30 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
The new RefSeq Functional Elements project is an expansion of the NCBI RefSeq project to include non-genic functional genomic regions in human and mouse that have been experimentally validated and described in the scientific literature.
RefSeq release 84 is now accessible online, via FTP and through NCBI’s programming utilities.
This full release incorporates genomic, transcript, and protein data available as of September 11, 2017, and contains 140,627,690 records, including 95,563,598 proteins, 20,356,598 RNAs, and sequences from 72,965 organisms.
The release is provided in several directories as a complete dataset and as divided by logical groupings. See the RefSeq release notes for more information.
Phasing out support for non-human organisms
As of September 1, 2017, the dbSNP and dbVar databases have stopped accepting submissions for non-human organisms. Submissions for non-human variation will now be accepted by the European Variation Archive, one of our partners in the International Nucleotide Sequence Database Collaboration (INSDC).
In July, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:
- Papio anubis (olive baboon)
- Prunus avium (sweet cherry)
- Aedes aegypti (yellow fever mosquito)
- Chenopodium quinoa (quinoa)
- Hevea brasiliensis (a eudicot)
- Manihot esculenta (cassava)
- Carlito syrichta (Philippine tarsier)
Papio anubis (olive or anubis baboon). Source: United States Fish and Wildlife Service, Digital Library System.
See more details on the Eukaryotic RefSeq Genome Annotation Status page.
NCBI will discontinue both the NCBI Genomes (chromosome) and the Human ALU repeat elements (alu_repeats) BLAST databases in October 2017.
Better alternatives to NCBI Genomes (chromosome)
The existing NCBI Genomes (chromosome) database does not offer complete and non-redundant coverage of genome data. The newly added NCBI RefSeq Genomes Database (refseq_genomes) and the RefSeq Representative Genomes Database (refseq_representative_genomes) are more useful alternatives to the chromosome database. You can select these databases from the database pull-down list on any general BLAST form that searches a nucleotide database (blastn, tblastn).
Figure 1. The nucleotide-nucleotide BLAST database menu with the recommended (RefSeq Genomes and RefSeq Representative Genomes) and deprecated (NCBI Genomes (chromosome) and Human ALU repeat elements) databases highlighted.
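For scripted searches, the same database switch can be sketched as follows. This is a minimal illustration, not an official client: it only builds a submission URL in the style of the BLAST URL API (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`) and sends nothing. The helper name `build_put_request` and the check against discontinued databases are hypothetical, and whether the URL API accepts `refseq_genomes` and `refseq_representative_genomes` for remote searches is an assumption to confirm against the current BLAST documentation.

```python
from urllib.parse import urlencode

# Endpoint used by the BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

# Database names taken from the announcement above.
RECOMMENDED_DATABASES = ("refseq_genomes", "refseq_representative_genomes")
DISCONTINUED_DATABASES = ("chromosome", "alu_repeats")


def build_put_request(query, database, program="blastn"):
    """Return a search-submission URL (hypothetical helper; no request is sent)."""
    if database in DISCONTINUED_DATABASES:
        # Point callers of the retired databases at the recommended ones.
        raise ValueError(
            f"{database!r} is being discontinued; use one of {RECOMMENDED_DATABASES}"
        )
    if program not in ("blastn", "tblastn"):
        # The announcement names these two programs for nucleotide databases.
        raise ValueError(f"unsupported program for this sketch: {program!r}")
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,        # a sequence or an accession
    }
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder query sequence, for illustration only.
    print(build_put_request("ACGTACGTACGT", "refseq_representative_genomes"))
```

Rejecting the retired names up front makes an outdated script fail with a clear message rather than relying on whatever the server returns once the databases are gone.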
Have you ever searched the NCBI Protein database and been overwhelmed with the number of sequences returned? Have you tried searching with a protein name, thinking that would greatly limit the results, only to still be presented with many sequences (all with the same name)? It’s a common problem in this time of greatly expanding sequence databases powered by large-scale genomic sequencing of similar organisms. Redundancy in the sequence databases is high and only getting worse.
To address this, in 2013 NCBI released the WP records, which collect identical protein sequences annotated on bacterial genomes. In 2014, NCBI released the Identical Protein Reports on Protein records, which display information about all other proteins identical to a given protein. Now, we are releasing a new resource: Identical Protein Groups (IPG). IPG offers several features:
RefSeq release 83 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of July 17, 2017, and contains 132,052,465 records, including 88,385,530 proteins, 19,634,664 RNAs, and sequences from 71,356 organisms. The release is provided in several directories as a complete dataset and as divided by logical groupings. More information about RefSeq release 83 is available in the release notes.
NCBI will phase out support for non-human organisms in the dbSNP and dbVar databases. These databases will stop accepting submissions for non-human SNPs in September 2017. The interactive websites for these databases and related NCBI services, including RefSeq flatfiles, will stop presenting non-human variant data in November 2017.
In June, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms, including Danio rerio (zebrafish):
NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. Data can be accessed via Gene, Nucleotide, BLAST, BioProject, Graphical Displays and FTP.
Figure 1. Phascolarctos cinereus (koala)
In May, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms: