Legacy pages now redirect
Effective July 10, 2023, NCBI’s Assembly and Genome record pages now redirect to new NCBI Datasets pages. As previously announced, these updates are part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.
The following pages have been updated:
- The NCBI Assembly record pages now redirect to the new NCBI Datasets Genome record pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST.
- The NCBI Genome record pages now redirect to the NCBI Datasets Taxonomy record pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.
During this transition, you will have the option to return to the legacy Genome and Assembly record pages. We will remove the legacy pages in early 2024.
Continue reading “New & Improved NCBI Datasets Genome and Assembly Pages”
As previously announced, NCBI’s Assembly and Genome record pages will be redirected to new NCBI Datasets pages in June 2023. The NCBI Datasets Command Line Interface (CLI) tools provide straightforward programmatic downloads of assembled genome sequence data. We invite you to check them out and let us know what you think!
Features & Benefits of NCBI Datasets
- Get assembled genome sequence, annotation, and metadata, including transcripts and proteins, in one easy step.
- Querying is easy and flexible! Retrieve data using organism name, assembly accession, or BioProject accession.
- Request data for multiple assemblies in one request – it is now simpler and faster to download large amounts of data.
- Metadata is derived from multiple databases and metadata schemas are documented.
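As a rough illustration of the querying options listed above, the Python sketch below assembles `datasets` command lines for a multi-assembly request and for an organism-name query. The helper name `build_download_command` and the output filename are placeholders invented for this example. The `download genome accession` and `download genome taxon` subcommands and the `--include` and `--filename` flags reflect the CLI as commonly documented, but they may differ between versions, so check `datasets download genome --help` before relying on them.

```python
import shlex

# Data types passed to --include (an assumed subset; see the CLI help).
DEFAULT_INCLUDE = ("genome", "gff3", "protein")


def build_download_command(mode, queries, include=DEFAULT_INCLUDE,
                           filename="ncbi_dataset.zip"):
    """Return an argv list for `datasets download genome` (hypothetical helper).

    mode    -- "accession" (assembly or BioProject accessions) or "taxon"
    queries -- one or more accessions, or a single organism name
    """
    if mode not in ("accession", "taxon"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if not queries:
        raise ValueError("at least one query is required")
    if mode == "taxon" and len(queries) != 1:
        raise ValueError("taxon mode takes exactly one organism name")
    return ["datasets", "download", "genome", mode, *queries,
            "--include", ",".join(include), "--filename", filename]


# One request covering two assemblies (chicken and human reference assemblies).
cmd = build_download_command("accession",
                             ["GCF_016699485.2", "GCF_000001405.40"])
print(shlex.join(cmd))

# Query by organism name instead of accession.
print(shlex.join(build_download_command("taxon", ["chicken"])))
```

The sketch only prints the commands. To run one, pass the list to `subprocess.run(cmd, check=True)` on a machine where the `datasets` CLI is installed.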
Continue reading “Download Assembled Genome Data Programmatically with NCBI Datasets”
Legacy pages will be redirected effective July 2023
In July 2023, NCBI’s Assembly and Genome record pages will be redirected to new Datasets pages as part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.
We will update the following pages:
- The NCBI Assembly pages will be redirected to the new Datasets Genome pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST.
- The NCBI Genome pages will be redirected to the Datasets Taxonomy pages that provide a taxonomy-focused portal to genes, genomes and additional NCBI resources.
During this transition, you will have the option to return to the legacy Genome and Assembly pages.
Continue reading “New & Improved NCBI Datasets Genome and Assembly Pages”
As part of an ongoing effort to modernize and improve your experience, NLM’s NCBI Datasets is introducing all-new genome pages. These pages make it easier for you to browse and download genome sequence and metadata, and navigate to tools such as the Genome Data Viewer (GDV) and BLAST.
To get started, search NCBI Datasets by assembly accession (e.g., GCF_016699485.2), assembly name (e.g., bGalGal1.mat.broiler.GRCg7b), WGS accession (e.g., JAENSK01), or species name + genome (e.g., chicken genome), and click on the title in the box. See the top red arrow in Figure 1 below where we search for ‘chicken genome’.
Figure 1: Finding the chicken reference assembly. A search for ‘chicken genome’ returns a box that provides a quick link to the new genome page (middle red arrow). From there, the download button (bottom red arrow) allows you to select the files you need (see ‘Download Package’ window on the left) along with a detailed metadata report that includes all the metadata on the web page.
Continue reading “Introducing NLM’s new NCBI Datasets genome page!”
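The search box accepts several kinds of identifiers, and it can help to recognize which kind you have before searching. The Python sketch below guesses the type of a query string. The regular expressions are assumptions inferred from the examples in this post (`GCF_016699485.2`, `JAENSK01`), not an official specification, and `classify_query` is a hypothetical helper.

```python
import re

# Assumed patterns, inferred from the examples in the post.
ASSEMBLY_ACCESSION = re.compile(r"^GC[AF]_\d{9}\.\d+$")        # e.g. GCF_016699485.2
WGS_PREFIX = re.compile(r"^(?:[A-Z]{4}|[A-Z]{6})\d{2}$")       # e.g. JAENSK01


def classify_query(query):
    """Guess which kind of identifier a search string is (hypothetical helper)."""
    q = query.strip()
    if ASSEMBLY_ACCESSION.match(q):
        return "assembly accession"
    if WGS_PREFIX.match(q):
        return "WGS accession"
    # Anything else is treated as an assembly name or a species search.
    return "free text"


for example in ["GCF_016699485.2", "JAENSK01",
                "bGalGal1.mat.broiler.GRCg7b", "chicken genome"]:
    print(f"{example!r}: {classify_query(example)}")
```

Assembly names and species searches are not distinguished here because neither follows a fixed pattern.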
Data curation plays a critical role in today’s biomedical research and ensures scientific data will be accessible for future research and reuse. In the time of pandemics, the need to get scientific information to researchers, medical personnel, and the public as quickly as possible is greater than ever before. In response to the need for increased curation speed, scale, and reliability, computer automation/assistance is becoming increasingly desirable.
The National Library of Medicine (NLM) is pleased to announce a rescheduled three-day Curation at Scale (Virtual) Workshop, to be held on March 28-30, 2022.
Poster abstract submission deadline: March 7, 2022
Registration deadline: March 21, 2022
The NLM workshop will feature invited speakers and bring together biocurators, developers of automated curation methods, and other stakeholders. It offers an opportunity to learn more about the current status of biomedical data curation, to share your research and your challenges, and to discuss the implementation of advanced computational techniques in scientific data curation. We invite participants from academia, government, publishers, and industry who are interested in the methods and tools used to curate biomedical data to register and attend this exciting workshop. Participants are encouraged to submit an abstract for consideration as a poster presentation.
If you’re curious about genome annotation beyond the genes, then read on! We previously blogged about our RefSeq Functional Elements resource, which provides annotation of experimentally validated, non-genic functional elements in human and mouse. Now, to kick off 2022, we’re delighted to announce a new publication in the January issue of Genome Research:
Farrell CM, Goldfarb T, Rangwala SH, Astashyn A, Ermolaeva OD, Hem V, Katz KS, Kodali VK, Ludwig F, Wallin CL, Pruitt KD, Murphy TD. RefSeq Functional Elements as experimentally assayed nongenic reference standards and functional interactions in human and mouse. Genome Res. 2022 Jan;32(1):175-188. doi: 10.1101/gr.275819.121. Epub 2021 Dec 7. PMID: 34876495.
Figure 1. Workflow for production of the RefSeq Functional Elements dataset. Full cylinders represent databases, the half-cylinder represents the indicated data source, and rectangles represent actions. Further details can be found in the publication.
Continue reading “Venturing beyond the genes: New RefSeq Functional Elements publication!”
Come see NCBI in person at the International Plant and Animal Genome (PAG) Conference (PAGXXIX), January 9-12 in San Diego, California. Learn about new ways that we are supporting the data management and analysis needs of scientists working across the tree of life. We’re excited to be back after a year of unprecedented circumstances!
As we described in our NLM Director’s featured blog articles, “A Journey to Spur Innovation and Discovery” and “Using Comparative Genomics to Advance Scientific Discoveries”, NCBI has recently embarked on the NIH-supported NLM initiative known as the NIH Comparative Genomics Resource (CGR). This initiative will modernize resources and infrastructure in order to promote comparative genomic analyses for all eukaryotic organisms. CGR will connect common data elements for genomic-related content with standard structures and mechanisms that will help you uncover previously unrecognized relationships. It will also provide tools that promote the quality of genomic-related data in sequence archives.
When you are at PAG, please check out our NCBI workshops and other sessions, swing by our booth, and visit our posters to learn more about ongoing CGR-related developments and additional NCBI resources related to your genomic research. We especially invite you to join our CGR Listening Session where you can offer valuable input on how NCBI can best provide a resource to support your analyses.
As PAG nears, stay tuned for more details and upcoming announcements from NCBI!
Are you wondering about the quality of a human, mouse or rat genome that you have assembled?
We offer a new service for evaluating the completeness, correctness, and base accuracy of your human, mouse or rat genome assembly compared to a reference assembly. You simply provide NCBI with one or more assemblies in FASTA format and we will do an annotation-based evaluation of the genome(s) using the expert-curated, high-confidence RefSeq transcripts for the species.
Continue reading “A new service to evaluate the quality of your assembled genome!”
The new reference assembly for sheep is now annotated! Assembly ARS-UI_Ramb_v2.0 is made of 142 scaffolds, a drop from 2,640 in the 2017 assembly Oar_rambouillet_v1.0. With a contig N50 of 43 Mb, ARS-UI_Ramb_v2.0 is 15 times more contiguous than the first assembly of the Rambouillet breed.
Annotation Release 104 (AR 104) of ARS-UI_Ramb_v2.0 reflects these improvements. Nearly 200 more coding genes have a 1:1 ortholog in the human genome than in the annotation of Oar_rambouillet_v1.0 (AR 103). The number of coding models annotated as partial is down 35% from 165 to 107, and the number of coding models labeled low quality due to suspected indels or base substitutions in the underlying genomic sequence decreased by 51% (1646 to 796). Based on BUSCO analysis, 99.1% of the models (cetartiodactyla_odb10) are complete in AR 104 versus 98.8% in AR 103. Details of this annotation, including statistics on the annotation products, the input data used in the pipeline and intermediate alignment results, can be found here.
Continue reading “Announcing the RefSeq annotation of sheep ARS-UI_Ramb_v2.0!”
The updated NCBI Datasets Genomes page now has genome data for all domains of life, including bacterial and viral genomes.
The genomes table (Figure 1) now offers filters for:
- Reference genomes — switch it on to only show reference or representative genomes
- Annotated — switch it on to only show annotated genomes
- Assembly level — use the assembly level slider to select higher-quality genomes
- Year released — use the slider to limit your results to recent genomes
In addition, the new Actions column connects you to NCBI’s Genome Data Viewer, BLAST, and Assembly. The Text filter box lets you search by the name of the assembly, species/infraspecies, or submitter.
Figure 1. The new Datasets Genomes page with primate assemblies showing the STATUS switches (reference genomes, annotated); expanded filters section with ASSEMBLY LEVEL and YEAR RELEASED sliding selectors; and the Actions column menu with access to Assembly details, BLAST, the Genome Data Viewer, and Download options.
Continue reading “Introducing the new NCBI Datasets Genomes page”
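To make the combined effect of the four filters concrete, here is a small Python sketch that applies the same kinds of criteria to an in-memory list. It is not how the web page is implemented. The `GenomeRecord` class, the `filter_genomes` function, and the three sample records are invented for illustration, and the ordering of assembly levels (Contig, Scaffold, Chromosome, Complete Genome) is assumed to match what the slider uses.

```python
from dataclasses import dataclass

# Assembly levels from least to most complete (assumed slider order).
LEVELS = ["Contig", "Scaffold", "Chromosome", "Complete Genome"]


@dataclass
class GenomeRecord:
    """Hypothetical stand-in for one row of the genomes table."""
    name: str
    is_reference: bool
    is_annotated: bool
    level: str
    year: int


def filter_genomes(records, reference_only=False, annotated_only=False,
                   min_level="Contig", min_year=None):
    """Keep the records that pass every active filter."""
    min_rank = LEVELS.index(min_level)
    return [
        r for r in records
        if (not reference_only or r.is_reference)
        and (not annotated_only or r.is_annotated)
        and LEVELS.index(r.level) >= min_rank
        and (min_year is None or r.year >= min_year)
    ]


# Made-up sample rows, not real assemblies.
records = [
    GenomeRecord("assembly_A", True, True, "Chromosome", 2021),
    GenomeRecord("assembly_B", False, True, "Scaffold", 2019),
    GenomeRecord("assembly_C", False, False, "Contig", 2022),
]

selected = filter_genomes(records, annotated_only=True,
                          min_level="Scaffold", min_year=2019)
print([r.name for r in selected])
```

Each switch or slider narrows the result set independently, so turning on more filters can only shrink the list.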