Author: NCBI Staff

New and improved SciENcv experience starting January 2023!

Science Experts Network Curriculum Vitae (SciENcv) is an electronic system that helps you assemble the professional information needed to apply for federal grants. Starting January 2023, we will be introducing a new and improved SciENcv experience!

SciENcv helps you gather and compile information on expertise, employment, education, and professional accomplishments. You can use SciENcv to create and maintain financial documents and biosketches that are submitted with grant applications.

Why should I use SciENcv?

  • Eliminates the need to repeatedly enter biosketch and financial document information
  • Reduces the administrative burden associated with federal grant submission and reporting requirements
  • Allows you to describe your scientific contributions in your own words

Submit your data to dbGaP in 3 easy steps!

Do you have human genetic data from a large-scale study? Submit your data to NCBI’s Database of Genotypes and Phenotypes (dbGaP) to contribute to meaningful discoveries about health. dbGaP contains data from more than 2.8 million study participants who have provided over 3.3 million molecular samples.

How do I submit data to dbGaP?

Step 1: Register your study

Step 2: Submit your data and get your study accession (phs#)

Step 3: Release your data

CCDS Release 24

An updated dataset of human protein-coding regions from the Consensus Coding Sequence (CCDS) collaboration

Are you interested in a set of high-quality human coding regions (CDS) that are annotated identically in NCBI’s RefSeq and in Ensembl from EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute)? Check out the new CCDS Release 24! This CCDS set was generated by comparing RefSeq Annotation Release 110 and Ensembl Release 108.

This update adds 2,746 new CCDS IDs and 237 new genes compared to the last human CCDS build (Release 22, 2018). CCDS Release 24 includes a total of 35,608 CCDS IDs that correspond to 19,107 genes, with 48,062 protein sequences from RefSeq and 47,762 from Ensembl.

The new CCDS release is available on FTP for bulk download and on the CCDS webpage for data on individual genes.

New annotations in RefSeq!

In August and September, the NCBI Eukaryotic Genome Annotation Pipeline released thirty-eight new annotations in RefSeq for the following organisms:

  • Adelges cooleyi (spruce gall adelgid)
  • Aethina tumida (small hive beetle)
  • Anopheles aquasalis (mosquito)
  • Anopheles maculipalpis (mosquito)
  • Anthonomus grandis grandis (boll weevil)
  • Aphis gossypii (cotton aphid)
  • Bactrocera neohumeralis (fly)
  • Bombus affinis (bee)
  • Bombus huntii (bee)
  • Cataglyphis hispanica (ant)
  • Cygnus atratus (black swan)

dbGaP: Data and analyses from millions of study participants, samples, and trillions of genotypes!

Are you familiar with the well-known Framingham Heart Study, a multi-generation study of residents of Framingham, Massachusetts, begun in 1948? Much of what is now known about the impact of genetics, lifestyle, and diet on cardiovascular health and disease has come from this research study. (See PMC4159698 for a historical perspective.) Did you know that data from this study and over 2,000 other studies that demonstrate the relationship between genetic and medical outcomes and other phenotypes are available from NCBI’s Database of Genotypes and Phenotypes (dbGaP)?

dbGaP was established in 2007 as a repository of human data from large-scale studies. You can access data from more than 2.8 million study participants who have provided over 3.3 million molecular samples. You can retrieve patient-level phenotypic (e.g., demographic, clinical, exposure) data, molecular (e.g., called genotypes, omics, sequence) data, and the results of association analyses from genome-scale case-control and longitudinal studies of heritable diseases.

What types of studies and data are available in dbGaP?

dbGaP contains a wide range of studies and types of data, all relating to human genetic and phenotypic measurements. Most dbGaP data are from NIH-funded research, but recently we have expanded to include non-NIH-funded studies. An easy way to find dbGaP Studies, Phenotype and Molecular Datasets, Variables, Analyses, and Documents is through the dbGaP Advanced Search (Figure 1). The interface lets you filter results by different characteristics depending on the tab you choose.

Figure 1. The dbGaP Advanced Search interface. Tabs that appear at the top of the web interface allow you to select the studies, datasets, analyses, etc. of interest. Filters (facets) appear on the left (see inset). Click on filters to select values. Links on the study summary pages provide direct access to data. Top panel: Studies tab and the corresponding filter categories. Bottom panel: Molecular data tab results with Study (Framingham SHARe) and Markerset Source (Affymetrix) filters applied.

Announcing GenBank release 252.0

Now over 3 billion records!

GenBank release 252.0 (10/17/2022) is now available on the NCBI FTP site. This release has 20.35 trillion bases and 3.10 billion records. The current release has 240,539,282 traditional records containing 1,562,963,366,851 base pairs of sequence data. There are also 2,167,900,306 WGS records containing 18,231,960,808,828 base pairs of sequence data, 574,020,080 bulk-oriented TSA records containing 511,476,787,957 base pairs of sequence data, and 115,123,306 bulk-oriented TLS records containing 43,860,512,749 base pairs of sequence data. 
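
The headline totals can be checked against the per-division figures quoted above. The following is a minimal sketch in Python; the dictionary layout and the `release_totals` helper are illustrative names chosen here, not part of any NCBI tool.

```python
# Record and base-pair counts as quoted in the release announcement above.
# The dictionary layout and helper name are illustrative assumptions.
divisions = {
    "traditional": (240_539_282, 1_562_963_366_851),
    "WGS": (2_167_900_306, 18_231_960_808_828),
    "TSA": (574_020_080, 511_476_787_957),
    "TLS": (115_123_306, 43_860_512_749),
}


def release_totals(counts):
    """Return (total records, total base pairs) summed over all divisions."""
    total_records = sum(records for records, _ in counts.values())
    total_bases = sum(bases for _, bases in counts.values())
    return total_records, total_bases


total_records, total_bases = release_totals(divisions)
print(f"{total_records / 1e9:.2f} billion records")  # 3.10 billion records
print(f"{total_bases / 1e12:.2f} trillion bases")    # 20.35 trillion bases
```

The four divisions sum to 3,097,582,974 records and 20,350,261,476,385 base pairs, which round to the 3.10 billion records and 20.35 trillion bases stated in the announcement.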

New version of PGAP now available!

We are happy to announce a new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP). This version helps you interpret your results by providing an estimate of the completeness and contamination of your PGAP-annotated genome assembly using CheckM.

CheckM uses the presence of a set of lineage-specific genes for the species provided or the species returned by the taxonomy check (--taxcheck, --auto-correct-tax). The higher the completeness and the lower the contamination, the better the assembly is! If contamination is a concern, please try FCS-GX, a highly sensitive tool for detecting foreign contaminants in prokaryotic and eukaryotic genome assemblies.

This new release also contains code changes that improve prediction of some long genes, especially in low complexity regions. And, as with every release, PGAP incorporates incremental improvements from expert curators of the Protein Family Model collection that increase the precision of PGAP’s structural and functional annotation.

Please try this new version and share your experience with us!

Now Available! Updated NCBI Datasets Command-Line Tools 

NLM’s NCBI Datasets announces the release of version 14 of our command-line interface (CLI) tools: datasets and dataformat. This release (CLI v14.0.0) contains many improvements that are inspired by your feedback. It’s now easier than ever to browse and format metadata, generate customized tables, and download data packages. We hope these updates will improve your experience!

NCBI Datasets CLI v14 includes changes to the command syntax, data package contents, and data report schemas that are not backwards-compatible. Commands written for CLI versions prior to version 14 may fail after the latest update. For more details, see our FAQs.
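
Because scripts written for earlier versions may break, it can help to check which major version a script is running against before relying on older command syntax. The following is a minimal sketch in Python; the helper names are hypothetical, and the sample version strings are assumed formats used only for illustration, so the parser simply looks for the first X.Y.Z pattern in whatever text the installed tool reports.

```python
import re


def major_version(version_text):
    """Extract the major version number from text containing X.Y.Z."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {version_text!r}")
    return int(match.group(1))


def needs_syntax_review(version_text, breaking_major=14):
    """True if the reported version is at or past the breaking release."""
    return major_version(version_text) >= breaking_major


# Sample strings are assumptions about the reported format, not captured output.
print(needs_syntax_review("datasets version: 14.0.0"))  # True
print(needs_syntax_review("datasets version: 13.43.2"))  # False
```

A script could pass the text reported by the installed tool to `needs_syntax_review` and stop early with a clear message instead of failing partway through a download.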

Now available: Updated prokaryote representative genomes collection

An updated bacterial and archaeal representative genomes collection is available! We selected a total of 16,665 of the 262,000 prokaryotic assemblies in RefSeq to represent their respective species. For the first time, more complete assemblies (as calculated by CheckM) were ranked higher than less complete assemblies. See the ranked list of criteria for selecting representative assemblies here.

New! NIH Genetic Testing Registry (GTR) API

Want to automate submitting genetic test-related information to the NIH Genetic Testing Registry? Now you can! In September 2022, GTR released a submission API that supports fully automated submission of test data to GTR. The new API is one more way, in addition to the Submission Portal wizard and bulk submission using a spreadsheet template, to submit test data to GTR.

Why an API?

An API will allow you to programmatically generate and deposit your latest information into GTR, which is especially useful for a large volume of genetic tests. Our customers rely on your up-to-date information to make accurate decisions for their patients. With the API, you set up once and reuse the same pathway for timely updates.

How to get started

To start the new submission process:

  1. If you haven’t already, register your lab with GTR
  2. Request an API service account from the GTR staff
  3. Once we’ve established your service account, create an API key