Nov 3 Webinar: dbGaP submission improvements and GaPTools

Attention dbGaP submitters! Join us on November 3, 2021, at 12 PM US Eastern time to learn about improvements to data submission and processing in dbGaP, NIH's database of Genotypes and Phenotypes, which contains individual-level data associated with human research studies. You will see how automated preliminary validation in the Submission Portal makes submission easier, and how you can run GaPTools, a stand-alone data validation tool, on your own submission to expedite the process. Join us to discover how dbGaP ensures the integrity and high quality of the genomic data that scientists access to further their research.

    • Date: Wed, November 3, 2021
    • Time: 12:00 PM – 12:45 PM EDT
    • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

View GEO, SRA, or dbGaP data tracks in NCBI’s Genome Data Viewer

Did you know that you can see epigenomic or other experimental data in NCBI’s Genome Data Viewer (GDV)?

You can easily add aligned study results from GEO, SRA, and dbGaP as data tracks to the GDV browser view. Just click the Tracks button on the toolbar and select the Configure Tracks menu option, then navigate to the 'Find Tracks' tab on the Configure panel that pops up (Figure 1).

Figure 1. Go to the 'Tracks' menu on the browser toolbar and select the 'Configure Tracks' option. This launches a panel where you can add, configure, remove, and search for data tracks. Go to the 'Find Tracks' tab to search for tracks to add to your browser view. Note: spaces act as AND operators in the search, and wildcards are accepted.

New NCBI Gene Ensembl Comparison Expansion

NCBI Gene has added Ensembl Rapid Releases to the calculation of matching annotations between NCBI RefSeq and Ensembl. This adds over 60 assemblies, for a total of 241 organisms represented in the set. Matches are based on transcript and CDS comparisons. For Ensembl annotations similar to the NCBI RefSeq annotations, the Ensembl gene, transcript, and protein identifiers are reported in NCBI Gene and in the gene2ensembl file on the Gene FTP site. The Ensembl annotation is also available in the graphical view and in NCBI's Genome Data Viewer, giving you a side-by-side view of how the annotations compare. Check out blue whale E2F1 for an example.

Figure 1. Balaenoptera musculus E2F transcription factor 1 in Genome Data Viewer
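If you want to use the RefSeq-to-Ensembl matches programmatically, the gene2ensembl file can be read as a tab-delimited table. The sketch below assumes the gzipped file is named gene2ensembl.gz, that it starts with a '#' header line, and that its columns are tax_id, GeneID, Ensembl_gene_identifier, RNA_nucleotide_accession.version, Ensembl_rna_identifier, protein_accession.version, Ensembl_protein_identifier; check the header of the file you download before relying on this layout. The function names are hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def load_gene2ensembl(lines, tax_id=None):
    """Map NCBI GeneID -> set of Ensembl gene IDs from gene2ensembl rows.

    Assumed column order (verify against the file header):
    tax_id, GeneID, Ensembl_gene_identifier, RNA_nucleotide_accession.version,
    Ensembl_rna_identifier, protein_accession.version, Ensembl_protein_identifier
    """
    mapping = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip the header line and blank lines
        if tax_id is not None and row[0] != str(tax_id):
            continue  # keep only rows for the requested organism
        gene_id, ensembl_gene = row[1], row[2]
        mapping[gene_id].add(ensembl_gene)
    return dict(mapping)


def load_gene2ensembl_file(path, tax_id=None):
    """Read a local copy of the gzipped file (assumed name: gene2ensembl.gz)."""
    with gzip.open(path, "rt") as handle:
        return load_gene2ensembl(handle, tax_id)
```

Filtering by tax_id while streaming keeps memory use small, since the full file covers all 241 organisms.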

Bulk Sequence-Cytogenetic Conversion Service to be retired in April 2022

The Bulk Sequence-Cytogenetic Conversion Service tool at NCBI will be retired in April 2022. This tool retrieves cytogenetic locations for a list of annotated genes, SNPs, or assembly coordinates from human, fruit fly, mouse, or rat genomes. It also retrieves sequence coordinates for cytogenetic locations in these genomes. This web service will be retired due to low usage and obsolescence.

The underlying CGI (bp2band) will be retained; it continues to drive the Ideogram service within the Genome Data Viewer (GDV) and the Genome Decoration Page. Researchers interested in where features are located relative to chromosome cytogenetic banding should check out the Genome Decoration Page, where you can provide a file of genome annotations and display them on an ideogram of your assembly of interest. You can also go directly to a cytogenetic location on a genome using the search box in the GDV genome browser.
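Conceptually, the conversion works in both directions: a sequence coordinate falls inside exactly one band, and a band name corresponds to a coordinate range. The sketch below shows that lookup with a binary search over band start positions. The band table is entirely hypothetical (made-up names and boundaries on a made-up chromosome); a real analysis would load band boundaries for your assembly of interest, and this is not how bp2band itself is implemented.

```python
from bisect import bisect_right

# Hypothetical band table for a single made-up chromosome: (start, band name),
# sorted by start, 0-based coordinates. Each band ends where the next begins.
HYPOTHETICAL_BANDS = [
    (0, "p12"),
    (1_000_000, "p11"),
    (2_500_000, "q11"),
    (4_000_000, "q12"),
]
HYPOTHETICAL_LENGTH = 5_000_000


def position_to_band(pos, bands=HYPOTHETICAL_BANDS, length=HYPOTHETICAL_LENGTH):
    """Return the band containing a 0-based sequence position."""
    if not 0 <= pos < length:
        raise ValueError(f"position {pos} is outside the chromosome")
    starts = [start for start, _ in bands]
    # bisect_right finds the first start greater than pos; the band before it holds pos.
    return bands[bisect_right(starts, pos) - 1][1]


def band_to_range(name, bands=HYPOTHETICAL_BANDS, length=HYPOTHETICAL_LENGTH):
    """Return the half-open (start, end) range covered by a band name."""
    for i, (start, band) in enumerate(bands):
        if band == name:
            end = bands[i + 1][0] if i + 1 < len(bands) else length
            return start, end
    raise KeyError(name)
```

For example, position_to_band(1_500_000) returns "p11" with this toy table.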

Feel free to contact us with any questions or concerns at info@ncbi.nlm.nih.gov.

The Sequence Read Archive slims down your data with SRA Lite

In response to your requests for more compact, faster-to-deliver data, NIH's Sequence Read Archive (SRA) now offers a new data format, SRA Lite (Figure 1). SRA Lite supports reliable and faster data transfer, downloads, and analysis using current tools. SRA Lite replaces the submitted base quality scores (BQS) with a simplified read quality score, reducing the average read size by ~60% for more efficient analysis and storage of large datasets. The format reflects improvements in next-generation sequencing, including increases in average read length and sequence coverage. Indeed, the data have improved enough that removing some quality score information increases genotype accuracy (PMCID: PMC4439189).

Figure 1. FASTQ dumped from SRA Lite format and the SRA configuration dialog. In the FASTQ, the quality score for each base is set to 30 ('?' in the ASCII encoding). To use SRA Lite, select "Prefer SRA Lite files with simplified base Quality scores" in the SRA configuration dialog.
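The '?' character follows from the standard Phred+33 FASTQ encoding, where each quality score is stored as the character with code (score + 33): 30 + 33 = 63, which is '?'. A minimal sketch, assuming your FASTQ uses Phred+33, for decoding scores and checking whether a read's quality line carries only the simplified score (the helper names are hypothetical):

```python
PHRED_OFFSET = 33  # Phred+33 (Sanger / Illumina 1.8+) FASTQ quality encoding


def phred_to_char(score):
    """Encode a Phred quality score as a FASTQ quality character."""
    return chr(score + PHRED_OFFSET)


def char_to_phred(char):
    """Decode a FASTQ quality character back to its Phred score."""
    return ord(char) - PHRED_OFFSET


def has_uniform_quality(quality_line, score=30):
    """True if every base in a non-empty quality line has the given score."""
    expected = phred_to_char(score)
    return len(quality_line) > 0 and all(c == expected for c in quality_line)
```

A quality line such as "????" passes the check, while one mixing in other characters (for example "??I?") does not.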

ClinVar annotations now available in NCBI Genome Browsers

Do you need to know which of the many NCBI dbSNP variants annotated near your region of interest are likely to be functionally or clinically significant? Figure it out with the track labeled 'ClinVar variants with precise endpoints', available in NCBI sequence display viewers, including the Genome Data Viewer (GDV) and Variation Viewer!

This track shows variants from the NCBI ClinVar database, including single nucleotide variants and other short variants (e.g., insertions and deletions), along with their pathogenicity and other metadata. The ClinVar track is displayed next to the default NCBI and Ensembl gene annotation tracks and other NCBI-provided dbSNP and RNA-seq expression tracks.

Figure 1. GDV showing the 'ClinVar variants with precise endpoints' track next to NCBI human gene annotation. Tracks are color coded for quick and easy interpretation, and a legend is provided.

Oct 20 Webinar: Introducing the updated PubMed E-utilities (API)

Join us on October 20, 2021, at 12 PM US Eastern time to learn about an updated version of the E-utilities API for PubMed that we will launch on April 4, 2022. With few exceptions, this update will not change the E-utility URLs you currently use, but it will bring search results in line with the web version of PubMed released in 2020 and improve reliability. Attend this webinar to learn how these changes will affect your API calls to PubMed and to get your questions answered.

  • Date and time: Wed, October 20, 2021 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Updated PubMed E-utilities coming in April 2022!

Do you develop an application that uses the PubMed API? Do you need to access PubMed data programmatically? Then you’ll be interested to know that we will be launching an updated version of the E-utilities API for PubMed on April 4, 2022. This updated version will align the functions of the E-utilities with the web version of PubMed released in 2020. For example, search results returned by the updated ESearch E-utility will now match those of web PubMed. To be clear, this update only affects E-utility calls with &db=pubmed. The behavior of all other Entrez databases will not change.

Why are we doing this?

This release will fully transfer all E-utility functions to the technology stack that supports web PubMed. What this means for you is not only consistent behavior for both web and API PubMed interfaces, but also more reliable performance.

Will URLs for PubMed E-utility calls be changing?

Fortunately, for the most part, no! With only a few exceptions, current E-utility URLs for PubMed (&db=pubmed) will continue to function after we release the update. Here are the exceptions:

  • ESearch will only be able to access the first 10,000 records retrieved by a search query (&retmax <= 10,000; &retstart + &retmax <= 10,000).
  • EPost will only be able to accept up to 10,000 PMIDs in a single URL request.
  • EFetch will no longer support returns in ASN.1 format.
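If your code pages through ESearch results or posts PMIDs with EPost, a sketch like the one below can keep each request within the announced 10,000-record limits. The page size of 500 and the helper names are hypothetical choices. Results beyond the first 10,000 cannot be reached by paging a single query; you would need to split the query itself, for example by date ranges.

```python
MAX_RECORDS = 10_000  # limit announced for the updated PubMed ESearch and EPost


def esearch_pages(total_count, page_size=500):
    """Yield (retstart, retmax) pairs with retstart + retmax <= 10,000."""
    if not 0 < page_size <= MAX_RECORDS:
        raise ValueError("page_size must be between 1 and 10,000")
    reachable = min(total_count, MAX_RECORDS)  # records past 10,000 are not retrievable
    for retstart in range(0, reachable, page_size):
        yield retstart, min(page_size, reachable - retstart)


def epost_batches(pmids, batch_size=MAX_RECORDS):
    """Split a PMID list into batches of at most 10,000 for separate EPost requests."""
    if not 0 < batch_size <= MAX_RECORDS:
        raise ValueError("batch_size must be between 1 and 10,000")
    for i in range(0, len(pmids), batch_size):
        yield pmids[i:i + batch_size]
```

For a search reporting 1,200 hits, esearch_pages(1200) yields (0, 500), (500, 500), and (1000, 200).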

Will the output of PubMed E-utility calls be changing?

Again, in almost all cases, no. Here are the exceptions:

  • ESearch will now return exactly the same PubMed IDs (PMIDs) that web PubMed currently returns.
  • EFetch will return XML rather than ASN.1 by default (that is, when &retmode is not set). In other words, the default value of &retmode will become "xml".
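One way to make EFetch calls independent of the default is to set &retmode explicitly. The sketch below builds such a URL with the standard E-utilities base address and the db, id, and retmode parameters; the helper name and the PMIDs in the example are placeholders.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_pubmed_url(pmids, retmode="xml"):
    """Build a PubMed EFetch URL with an explicit retmode (hypothetical helper)."""
    if retmode.lower() in ("asn.1", "asn1"):
        # ASN.1 output is dropped in the updated PubMed E-utilities.
        raise ValueError("ASN.1 is not supported by the updated PubMed EFetch")
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": retmode,  # stated explicitly instead of relying on the default
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)
```

Note that urlencode escapes the commas in the ID list as %2C, which is standard URL encoding.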

What should you do?

  • If you manage code that creates PubMed E-utility requests, review the above changes to ensure that your code will continue to function after the update.
  • Verify your code on a test server that we will make public later this fall. We'll post the details on this blog when they become available.
  • Attend our webinar about these changes on October 20 if you have questions or concerns.
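To review existing code, you could scan the E-utility URLs it generates for the changes listed above. The sketch below is a hypothetical audit helper: it ignores non-PubMed URLs (other databases are unaffected) and flags ESearch windows past 10,000 records, EFetch calls that rely on the default &retmode or request ASN.1, and EPost URLs with more than 10,000 PMIDs. It assumes ESearch's documented default retmax of 20 when &retmax is absent.

```python
from urllib.parse import urlparse, parse_qs

LIMIT = 10_000


def audit_eutils_url(url):
    """Return warnings for a PubMed E-utility URL under the announced changes."""
    parsed = urlparse(url)
    params = {key: values[-1] for key, values in parse_qs(parsed.query).items()}
    if params.get("db", "").lower() != "pubmed":
        return []  # only &db=pubmed calls are affected
    tool = parsed.path.rsplit("/", 1)[-1]
    warnings = []
    if tool == "esearch.fcgi":
        retstart = int(params.get("retstart", 0))
        retmax = int(params.get("retmax", 20))  # assumed ESearch default
        if retmax > LIMIT or retstart + retmax > LIMIT:
            warnings.append("ESearch window goes past the first 10,000 records")
    elif tool == "efetch.fcgi":
        retmode = params.get("retmode")
        if retmode is None:
            warnings.append("EFetch relies on the default retmode, which becomes xml")
        elif retmode.lower() in ("asn.1", "asn1"):
            warnings.append("EFetch requests ASN.1, which will no longer be supported")
    elif tool == "epost.fcgi":
        ids = [i for i in params.get("id", "").split(",") if i]
        if len(ids) > LIMIT:
            warnings.append("EPost sends more than 10,000 PMIDs in one request")
    return warnings
```

Running each URL your application builds through audit_eutils_url and reviewing any non-empty result is one quick way to find calls that need attention before the update.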

What will happen to the current version of the PubMed E-utilities after the release on April 4, 2022?

Once we release the updated PubMed E-utilities, the current version of the PubMed E-utilities will no longer be available. All PubMed requests will use the same technology stack.

Please write to us at info@ncbi.nlm.nih.gov if you have any questions or concerns.

A new service to evaluate the quality of your assembled genome!

Are you wondering about the quality of a human, mouse or rat genome that you have assembled?

We offer a new service for evaluating the completeness, correctness, and base accuracy of your human, mouse or rat genome assembly compared to a reference assembly. You simply provide NCBI with one or more assemblies in FASTA format and we will do an annotation-based evaluation of the genome(s) using the expert-curated, high-confidence RefSeq transcripts for the species.

Continue reading “A new service to evaluate the quality of your assembled genome!”