Author: NCBI Staff

GenBank release 225: Over 1 billion sequence records stored!

GenBank release 225.0 (4/14/2018) has 208,452,303 traditional records (including non-bulk-oriented TSA) containing 260,189,141,631 base pairs of sequence data. In addition, there are 621,379,029 WGS records containing 2,784,740,996,536 base pairs of sequence data, 227,364,990 TSA records containing 205,232,396,043 base pairs of sequence data, and 14,782,654 TLS records containing 5,612,769,448 base pairs of sequence data.

During the 60 days between the close dates for GenBank releases 224.0 and 225.0, the traditional portion of GenBank grew by 6,558,433,533 base pairs and by 1,411,748 sequence records. During that same period, 86,960 records were updated – an average of 24,978 records added or updated per day.
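
The headline and the daily average can be cross-checked with a little arithmetic. The Python sketch below uses only the figures quoted in this announcement. The variable names are illustrative, and rounding to the nearest whole record is an assumption about how the daily average was derived.

```python
# Record counts quoted in the release 225.0 announcement.
records = {
    "traditional": 208_452_303,
    "WGS": 621_379_029,
    "TSA": 227_364_990,
    "TLS": 14_782_654,
}

# Sum across all four divisions to check the "over 1 billion" headline.
total_records = sum(records.values())

# Growth between releases 224.0 and 225.0 (traditional portion only).
days_between_releases = 60
records_added = 1_411_748
records_updated = 86_960

# Assumption: the announcement rounds the average to a whole record.
per_day = round((records_added + records_updated) / days_between_releases)

print(f"Total records: {total_records:,}")
print(f"Added or updated per day: {per_day:,}")
```

The total comes to 1,071,978,976 records, consistent with the headline, and the daily figure matches the 24,978 reported above.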

May 16 webinar: Improved Standalone BLAST database and programs: now with taxonomic information

Next Wednesday, May 16, 2018, we’ll show you how to download and use the latest version of the standalone BLAST databases, BLASTdbv5. You’ll learn how to use BLASTdbv5 with the new BLAST programs to limit searches to taxonomic groups and to retrieve sequences from the database by taxonomy.

Date and time: Wednesday, May 16, 2018 12:00 – 12:30 PM EDT

Register here: https://bit.ly/2qW7LLy

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

May 9 NCBI Minute: Integrating PubChem into Your Chemistry Teaching

Next Wednesday, May 9, 2018, NCBI staff will show you how to use PubChem as a cheminformatics education resource. In addition to learning about tools and services for chemical information search, analysis, and download, you will also see examples of how instructors incorporate PubChem in Cheminformatics OLCC (On-Line Chemistry Courses), an intercollegiate hybrid course.

Date and time: Wednesday, May 9, 2018 12:00 – 12:30 PM EDT

Register here: https://bit.ly/2q5wtsF

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

March & April annotations in RefSeq: chimpanzee, human & more

The NCBI Eukaryotic Genome Annotation Pipeline has recently released new annotations in RefSeq for the following organisms:

  • Bombus impatiens (common eastern bumble bee)
  • Brachypodium distachyon (stiff brome)
  • Cimex lectularius (bed bug)
  • Desmodus rotundus (common vampire bat)
  • Halyomorpha halys (brown marmorated stink bug)
  • Homo sapiens (human, more information can be found here)
  • Lingula anatina (brachiopod)
  • Neophocaena asiaeorientalis asiaeorientalis (Yangtze finless porpoise)
  • Oncorhynchus tshawytscha (Chinook salmon)
  • Oryzias melastigma (Indian medaka)
  • Pan troglodytes (chimpanzee)
  • Physcomitrella patens (moss)
  • Populus trichocarpa (black cottonwood)
  • Rosa chinensis (China rose)
  • Selaginella moellendorffii (club-moss)
  • Terrapene mexicana triunguis (three-toed box turtle)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

The NCBI BioCollections database links specimen vouchers and sequence records to home institutions

A paper in the January 2018 issue of Database describes the NCBI BioCollections database, a curated dataset of metadata for culture collections, museums, herbaria and other natural history collections connected to sequence records in GenBank. The BioCollections database was established to allow the association of specimen vouchers and related sequence records to their home institutions. This process also allows back-linking from the home institution for quick identification of all records originating from each collection.

The rapidly growing set of GenBank submissions frequently includes records that are derived from specimen vouchers. Correct identification of the specimens studied, along with a method to associate each sample with its institution, is critical to the outcome of related studies and analyses.

New repository records are added to the database if they are submitted to the International Nucleotide Sequence Database Collaboration (INSDC) along with sequence data. Each record now identifies the institution that houses the collection and gives its standard institution code, mailing address, and associated webpage, if available.

The BioCollections database is maintained and curated by the Taxonomy group at NCBI.

Researchers: Now it’s easier to find the data you want in BioProject

We’ve improved BioProject to give you a better way to find all data from a specific project. We think you’ll love the new interface that lets you quickly choose the right BioProject with links to the data you want in other NCBI databases.

The updated BioProject browser makes it easier than ever to filter the data by a variety of attributes so you can quickly pick BioProjects that interest you.

Figure 1. The BioProject home page showing links to the BioProject browser. To use the new browser, click the ‘Browse by Project Attributes’ link below the search bar on any BioProject page or the ‘By Project attributes’ link on the BioProject home page.

Human annotation release 109 for GRCh38.p12 is available in RefSeq

You can now download human annotation release 109 on FTP or explore it in the Genome Data Viewer, in the Gene database, and with BLAST.

Highlights in release 109:

  • A total of 20,203 protein-coding genes and 17,871 non-coding genes were annotated.
  • The number of annotated curated transcripts increased by 17%, and the number of genes with two or more curated alternative variants increased by 8%.
  • The annotation includes 6,862 features and 2,075 GeneIDs for non-genic functional elements, such as regulatory regions and known structural elements. For example, see the opsin locus control region (OPSIN-LCR).

NCBI retires Map Viewer web interface

On October 24, 2017, we announced the replacement of NCBI’s Map Viewer with the Genome Data Viewer (GDV). As described in that announcement, the Map Viewer web interface will be removed in one week, on May 2, 2018. Map Viewer links will then redirect to the GDV home page. Map Viewer static data will remain on the NCBI FTP site. Please review details related to the FTP content in our February announcement.

Please contact us with any comments and concerns, or if you need more help with the transition from Map Viewer to GDV.