Tag: Datasets

Coming soon! Changes to NCBI Datasets command-line tool in version 14 (CLIv14.0.0)

In October 2022, NCBI Datasets will release version 14 of our datasets and dataformat command-line tools. This release will contain breaking changes to the command syntax, the content of the data packages, and the data reports. Thank you for your feedback, which inspired these new features. We hope they will improve your experience!

We will continue to support CLI v13.x, although new features and improvements will be exclusive to CLI v14.0.0 release and up.

NCBI Datasets supports the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms. Join our mailing list to keep up to date with NCBI Datasets and other CGR news.

More details

How is version 14 of the Datasets command-line tools (CLI v14.x) different from CLI v13.x and previous versions?

Join NCBI virtually at the Biodiversity Genomics 2022 conference

Learn about the NIH Comparative Genomics Resource (CGR) Project

The Biodiversity Genomics conference will take place virtually, October 2-7, 2022. This event is hosted by the Earth BioGenome Project and is open and free for all to attend.

NCBI staff will present a variety of recorded talks and posters highlighting various elements of the NIH Comparative Genomics Resource (CGR), including NCBI Datasets and the Comparative Genome Viewer (CGV). CGR is a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources on biomedical research. NCBI is charged with leading CGR development and engaging genomics communities. The CGR project will facilitate reliable comparative genomics analyses for all eukaryotic organisms in collaboration with the genomics community.

Check out NCBI’s schedule of activities to learn more about CGR.

New annotations in RefSeq

In June and July, the NCBI Eukaryotic Genome Annotation Pipeline released twenty-six new annotations in RefSeq for the following organisms:

  • Anopheles coluzzii (mosquito)
  • Anopheles funestus (African malaria mosquito)
  • Astyanax mexicanus (Mexican tetra)
  • Athalia rosae (coleseed sawfly)
  • Bactrocera dorsalis (oriental fruit fly)
  • Brassica napus (rape)
  • Brienomyrus brachyistius (bony fish)
  • Canis lupus dingo (dingo) (pictured)
  • Caretta caretta (loggerhead turtle)
  • Dendroctonus ponderosae (mountain pine beetle)
  • Epinephelus fuscoguttatus (brown-marbled grouper)
  • Lagopus muta (rock ptarmigan)
  • Marmota marmota marmota (Alpine marmot)
  • Nematostella vectensis (starlet sea anemone)
  • Ostrea edulis (bivalve)
  • Panthera uncia (snow leopard)
  • Plutella xylostella (diamondback moth)
  • Pyrus x bretschneideri (Chinese white pear)
  • Rhincodon typus (whale shark)
  • Rhipicephalus sanguineus (brown dog tick)
  • Solanum stenotomum (eudicot)
  • Solanum verrucosum (eudicot)
  • Sphaerodactylus townsendi (lizard)
  • Stegostoma fasciatum (shark)
  • Triticum urartu (monocot)
  • Ziziphus jujuba (common jujube)

Announcing the NCBI Datasets SARS-CoV-2 taxonomy page

Need SARS-CoV-2 assembled genome sequences or specific SARS-CoV-2 protein sequences? You can find them on the new SARS-CoV-2 taxonomy page brought to you by NCBI Datasets.

The NCBI Datasets SARS-CoV-2 taxonomy page brings together SARS-CoV-2 genomes and proteins, basic information about SARS-CoV-2, and links to related NCBI pages, all in one place (see Figures 1 and 2).

Figure 1. NCBI Datasets SARS-CoV-2 taxonomy page. For command-line access, try the datasets command-line tool (top box). For customized filtering options, check out NCBI Virus (bottom box).

If you scroll down the taxonomy page you will find a table of SARS-CoV-2 proteins, each with “Actions” that provide the option to download a package of protein sequences from all annotated SARS-CoV-2 genomes (Figure 2), as well as links to NCBI Gene and the protein sequence from the reference genome.

Figure 2. NCBI Datasets SARS-CoV-2 taxonomy page (cont’d). Click the blue download button to download a package of all SARS-CoV-2 genomes (6 million and counting as of 7/15/22), or just the SARS-CoV-2 reference genome (top box). Below that you see a table of SARS-CoV-2 proteins, each with “Actions” available through the three-dot menu that provides the option to download a package of protein sequences from all annotated SARS-CoV-2 genomes (bottom boxes).

We want to hear from you! Check out the new SARS-CoV-2 taxonomy page and let us know what you think. Contact us with questions or feedback.

Join our mailing list to keep up to date with Datasets and other NCBI news.

NLM’s all-new NCBI Datasets genome table is now available

We are excited to introduce new and useful updates to the Datasets genome table that let you quickly find and download a genome dataset including genome, transcript and protein sequence, annotation, and a data report.
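
To make the idea of a data package a little more concrete, here is a minimal Python sketch that tallies the files in a downloaded package by content type. It assumes the package arrives as a zip archive (typically named `ncbi_dataset.zip`) and that file names end in suffixes such as `genomic.fna`, `protein.faa` or `data_report.jsonl`. The suffix-to-type mapping and the helper names `classify`, `summarize_names` and `summarize_package` are hypothetical illustrations, so check the file names in your own package before relying on them.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed file-name endings -> content type (illustrative, not an official list).
# Order matters: the first matching suffix wins, so the more specific
# "cds_from_genomic.fna" is checked before the more general "genomic.fna".
CONTENT_TYPES = {
    "cds_from_genomic.fna": "CDS sequence",
    "genomic.fna": "genome sequence",
    "rna.fna": "transcript sequence",
    "protein.faa": "protein sequence",
    "genomic.gff": "annotation",
    "data_report.jsonl": "data report",
}


def classify(member_name):
    """Return the assumed content type of one archive member."""
    filename = PurePosixPath(member_name).name
    for suffix, kind in CONTENT_TYPES.items():
        if filename.endswith(suffix):
            return kind
    return "other"


def summarize_names(member_names):
    """Count archive members by content type, skipping directory entries."""
    return Counter(classify(n) for n in member_names if not n.endswith("/"))


def summarize_package(path):
    """Open a downloaded zip archive and summarize its contents."""
    with zipfile.ZipFile(path) as archive:
        return summarize_names(archive.namelist())
```

Calling `summarize_package("ncbi_dataset.zip")` returns a `collections.Counter` keyed by content type; anything the mapping does not recognize (a README, for instance) is counted as `other`.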

The new genome table includes many new features and benefits (see Figure 1). With the new genome table you can:

  • Find all current genomes, including metagenomes
  • View multiple taxa such as birds and bees, or polyphyletic groups like fish
  • Easily find genomes with NCBI RefSeq annotations
  • Get more accurate genome counts, since each row now represents a single genome with GenBank and RefSeq accessions for that genome in the same row
  • Customize your downloads to include either GenBank or RefSeq files, or both
  • Download tables or data packages
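
One way to picture the one-genome-per-row change is to collapse GenBank (`GCA_`) and RefSeq (`GCF_`) assembly accessions that describe the same genome into a single row. The Python sketch below assumes that a paired GenBank and RefSeq assembly share the same nine-digit number (so a `GCA_016699485` accession would pair with GCF_016699485.2), while their version suffixes may differ. The function `group_by_genome` is a hypothetical illustration, not a description of how the genome table is actually built.

```python
import re
from collections import defaultdict

# Assembly accession: "GCA" (GenBank) or "GCF" (RefSeq), an underscore,
# a nine-digit number, a period, and a version number.
ACCESSION = re.compile(r"(GC[AF])_(\d{9})\.(\d+)")


def group_by_genome(accessions):
    """Collapse GenBank/RefSeq accessions into one row per genome.

    Assumes paired assemblies share the same nine-digit number.
    Returns {number: {"GenBank": accession or None, "RefSeq": accession or None}}.
    If several versions of the same accession are given, the last one wins.
    """
    rows = defaultdict(lambda: {"GenBank": None, "RefSeq": None})
    for acc in accessions:
        match = ACCESSION.fullmatch(acc)
        if match is None:
            raise ValueError(f"not an assembly accession: {acc}")
        prefix, number, _version = match.groups()
        column = "GenBank" if prefix == "GCA" else "RefSeq"
        rows[number][column] = acc
    return dict(rows)
```

Counting the keys of the returned dictionary gives one count per genome rather than one per accession, which is the double-counting problem the new table layout avoids.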

NCBI at 2022 Galaxy Community Conference

Join NCBI’s Nuala O’Leary, PhD at the 2022 Galaxy Community Conference (GCC2022), July 17-23 in Minneapolis, Minnesota, to learn more about Datasets, a new resource that makes it easier to access NCBI sequence data.

GCC2022 brings together hundreds of researchers, trainers, tool developers, software engineers, and computational infrastructure providers, all addressing common challenges in data-intensive science using the Galaxy data integration and analysis platform. This will be an in-person meeting with limited support for remote attendees.

Check out NCBI’s schedule of activities: 

NCBI Datasets, a new resource for accessing NCBI genome data in Galaxy
Poster: Monday, July 18, 10:20 AM CDT

Talk: Tuesday, July 19, 4:59 PM CDT

Demo: Wednesday, July 20, 10:20 AM CDT

Try out Datasets and ElasticBLAST at the BOSC 2022 CoFest!

Join NLM’s NCBI at the virtual CollaborationFest (CoFest) on July 15 from 08:00 – 11:00 CDT and 12:00 – 16:00 CDT, following the BOSC 2022 conference. Get an in-depth orientation and the opportunity to test the capabilities of Datasets and ElasticBLAST.

What is Datasets?

Datasets is a new resource that lets you easily gather data from across NCBI databases. Find and download gene, transcript, protein and genome sequences, annotation and metadata. We invite you to try the Datasets command-line tool in your bioinformatics workflows!

Introducing NLM’s new NCBI Datasets genome page!

As part of an ongoing effort to modernize and improve your experience, NLM’s NCBI Datasets is introducing all-new genome pages. These pages make it easier for you to browse and download genome sequence and metadata, and navigate to tools such as the Genome Data Viewer (GDV) and BLAST.

To get started, search NCBI Datasets by assembly accession (e.g., GCF_016699485.2), assembly name (e.g., bGalGal1.mat.broiler.GRCg7b), WGS accession (e.g., JAENSK01), or species name + genome (e.g., chicken genome), and click on the title in the box. See the top red arrow in Figure 1 below where we search for ‘chicken genome’.

Figure 1: Finding the chicken reference assembly. A search for ‘chicken genome’ returns a box that provides a quick link to the new genome page (middle red arrow). From there, the download button (bottom red arrow) allows you to select the files you need (see ‘Download Package’ window on the left) along with a detailed metadata report that includes all the metadata on the web page.
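
The kinds of search terms listed above follow recognizable patterns. As a rough illustration only, the Python sketch below guesses which kind of term a query is. The regular expressions are simplified assumptions (an assembly accession as `GCA_` or `GCF_` plus nine digits and a version; a WGS project accession as four or six capital letters plus two digits, like JAENSK01), and `guess_query_type` is a hypothetical helper, not part of NCBI's search.

```python
import re

# Simplified, assumed patterns for illustration; not NCBI's actual search logic.
PATTERNS = [
    ("assembly accession", re.compile(r"GC[AF]_\d{9}\.\d+")),
    ("WGS accession", re.compile(r"[A-Z]{4}(?:[A-Z]{2})?\d{2}")),
]


def guess_query_type(query):
    """Guess which kind of search term a query string is."""
    q = query.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(q):
            return label
    if q.lower().endswith(" genome"):
        return "species name + genome"
    # Anything else, such as bGalGal1.mat.broiler.GRCg7b, is treated as free text.
    return "assembly name or other text"
```

With these assumptions, `GCF_016699485.2` is reported as an assembly accession, `JAENSK01` as a WGS accession, and `chicken genome` as a species name + genome query; an assembly name has no fixed shape, so it falls through to the last case.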

Save the Date: NCBI at the Bioinformatics Open Science Conference (BOSC), July 2022

Come visit NCBI at the Bioinformatics Open Science Conference (BOSC), part of the Intelligent Systems for Molecular Biology Conference (ISMB), July 13-16, taking place both in person in Madison, Wisconsin and virtually! We’ll be presenting talks and posters on the latest updates to the NCBI Datasets, BLAST, and Protein resources. You can also join us at the Birds of a Feather (BoF) discussion and the BOSC CollaborationFest (CoFest) to explore these resources and discuss workflows with NCBI staff.

Gapless Telomere to Telomere human genome (T2T-CHM13) now available

On April 1, 2022, Science published the first complete sequence of a human genome, known as T2T-CHM13. This notable scientific achievement comes two decades after the first human genome release from the Human Genome Project and offers a first complete look at biologically important regions, such as centromeres, telomeres, and segmental duplications, that were previously unassembled. Read on to learn more about how you can access this assembly and related resources at NCBI, or to access any one of the more than 1,000 human genome assemblies now in GenBank.