As part of an ongoing effort to modernize and improve your experience, NLM’s NCBI Datasets is introducing all-new genome pages. These pages make it easier for you to browse and download genome sequence and metadata, and navigate to tools such as the Genome Data Viewer (GDV) and BLAST.
To get started, search NCBI Datasets by assembly accession (e.g., GCF_016699485.2), assembly name (e.g., bGalGal1.mat.broiler.GRCg7b), WGS accession (e.g., JAENSK01), or species name + genome (e.g., chicken genome), and click on the title in the box. See the top red arrow in Figure 1 below where we search for ‘chicken genome’.
Figure 1: Finding the chicken reference assembly. A search for ‘chicken genome’ returns a box that provides a quick link to the new genome page (middle red arrow). From there, the download button (bottom red arrow) allows you to select the files you need (see ‘Download Package’ window on the left) along with a detailed metadata report that includes all the metadata on the web page. Continue reading “Introducing NLM’s new NCBI Datasets genome page!”→
Introducing the NIH Comparative Genomics Resource (CGR)
NCBI is looking forward to seeing you in person at the International Plant and Animal Genome Conference (PAG XXIX), January 8-12, 2022, in San Diego, California. We’re especially excited to introduce our newest endeavor, the NLM initiative known as the NIH Comparative Genomics Resource (CGR), a platform we are developing to support comparative analyses of sequenced eukaryotic research organisms. Understanding and supporting the needs of researchers is a fundamental element in the development of CGR and is critical to its future success in supporting a large and diverse collection.
Please join NCBI for the following events to learn more about CGR and how you can inform its development:
Missed a few videos on YouTube? Here’s the latest from our channel.
Customize the MSA Viewer to Make Your Analysis Easier
We’re constantly improving the Multiple Sequence Alignment (MSA) Viewer. This video demonstrates several new and popular features, including the ability to change data columns, hide selected rows, analyze polymorphisms, and more.
The National Center for Biotechnology Information (NCBI) has several speakers at the upcoming Biodiversity Genomics Conference from September 27 to October 1, 2021.
Valerie Schneider, head of NCBI’s SeqPlus Program and Deputy Director for Sequence Offerings, will present a poster discussing how NCBI’s new comparative genome research focus will enable researchers to explore all eukaryotic research organisms, find related organisms, and support additional organism-specific resources that a specific community may have or wish to develop.
Nuala O’Leary, Product Owner, NCBI Datasets, will present the latest developments for Datasets, a beta resource that supports intuitive and flexible access to genome data for a broad range of taxa via a redesigned website and command-line tools.
Adelaide Rhodes, Cloud Subject Matter Expert in Education, will present two case studies that emphasize the ease of navigating the new Datasets website as well as the use of command line tools to speed up data discovery for genes and genomes of interest.
Terence Murphy, Product Owner, NCBI RefSeq, will present a new tool for genome providers to identify contamination in newly assembled sequences with high sensitivity, specificity, and performance.
The Biodiversity Genomics Conference brings together a global audience to celebrate achievements in genome sequencing across the eukaryotic tree of life, explore current challenges and solutions, and develop strategies for sequencing and data sharing in the upcoming decade of biodiversity genomics. NCBI has several programs that support the needs of this scientific research group.
NCBI Datasets introduces species pages and a species browser! The species pages summarize taxon information and provide access to genomic data, including reference genomes. For example, see Figure 1, the Nothobranchius furzeri (turquoise killifish) species page.
Figure 1: Nothobranchius furzeri species page. The browse species button will take you to the species browser.
Join us on September 22, 2021 at 12 PM Eastern time to learn how to use the datasets command-line tools (datasets and dataformat) to access, filter, download, and format data and metadata for genomes. Through examples from eukaryotes and the SARS-CoV-2 coronavirus, you will see how to use metadata to filter for genome sequences with desired properties, such as genomes with high contig N50 values.
Date and time: Wed, September 22, 2021 12:00 PM – 12:45 PM EDT
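To give a feel for the kind of metadata filtering the webinar covers, here is a minimal Python sketch that keeps only assemblies whose contig N50 meets a threshold. It does not call the datasets or dataformat tools. The records, the field names (accession, contig_n50) and the function filter_by_contig_n50 are hypothetical and are not the actual output schema of either tool.

```python
# Hypothetical assembly metadata records. The accessions, field names and
# values are made up for illustration; they are not real NCBI records.
assemblies = [
    {"accession": "EXAMPLE_A", "contig_n50": 43_000_000},
    {"accession": "EXAMPLE_B", "contig_n50": 2_500_000},
    {"accession": "EXAMPLE_C", "contig_n50": 85_000},
]


def filter_by_contig_n50(records, min_n50):
    """Return the records whose contig N50 is at least min_n50 (in bases)."""
    return [record for record in records if record["contig_n50"] >= min_n50]


# Keep assemblies with a contig N50 of at least 1 Mb.
for record in filter_by_contig_n50(assemblies, 1_000_000):
    print(record["accession"], record["contig_n50"])
```

With the made-up records above, the 1 Mb threshold keeps EXAMPLE_A and EXAMPLE_B and drops EXAMPLE_C.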
The new reference assembly for sheep is now annotated! Assembly ARS-UI_Ramb_v2.0 is made of 142 scaffolds, a drop from 2,640 in the 2017 assembly Oar_rambouillet_v1.0. With a contig N50 of 43 Mb, ARS-UI_Ramb_v2.0 is 15 times more contiguous than the first assembly of the Rambouillet breed.
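For readers less familiar with the metric, contig N50 is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. The Python sketch below shows one common way to compute it. The contig lengths are made up for illustration and are not taken from either sheep assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, stopping once the running sum
    # covers at least half of the total assembled length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

Here the total length is 300, and the two longest contigs (80 + 70 = 150) already cover half of it, so the N50 is 70. Fewer, longer contigs push this value up, which is why a higher contig N50 indicates a more contiguous assembly.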
Annotation Release 104 (AR 104) of ARS-UI_Ramb_v2.0 reflects these improvements. Nearly 200 more coding genes have a 1:1 ortholog in the human genome than in the annotation of Oar_rambouillet_v1.0 (AR 103). The number of coding models annotated as partial is down 35% from 165 to 107, and the number of coding models labeled low quality due to suspected indels or base substitutions in the underlying genomic sequence decreased by 51% (1646 to 796). Based on BUSCO analysis, 99.1% of the models (cetartiodactyla_odb10) are complete in AR 104 versus 98.8% in AR 103. Details of this annotation, including statistics on the annotation products, the input data used in the pipeline and intermediate alignment results, can be found here. Continue reading “Announcing the RefSeq annotation of sheep ARS-UI_Ramb_v2.0!”→