Looking for genomes for the B.1.1.7 SARS-CoV-2 variant? NCBI now supports searches for SARS-CoV-2 variant names such as B.1.1.7, B.1.351, or P.1. For example, search for B.1.1.7 (Figure 1) and you’ll see a virus classification box with an option to download a SARS-CoV-2 data package. SARS-CoV-2 data packages include genome and protein sequences and a detailed data report for all SARS-CoV-2 genomes classified as that variant. SARS-CoV-2 genome lineages are classified by pangolin, using the pangoLEARN algorithm.
Figure 1. SARS-CoV-2 variant search result with button to download a data package containing data for all SARS-CoV-2 genomes matching that variant lineage, B.1.1.7 in this case.
Need command-line access? We have added a new lineage flag, --lineage, to the datasets command-line tool that lets you download SARS-CoV-2 variant genomes (Figure 2). Simply type the name of the variant after the lineage flag to request a SARS-CoV-2 data package specific to that variant.
Try our dataformat tool to generate a customizable table of metadata.
For example, use datasets to download a SARS-CoV-2 data package for the P.1 lineage, then use dataformat to generate a table with columns you specify:
$ datasets download virus genome taxon SARS-CoV-2 --lineage P.1 --filename sars2-p1.zip
Downloading: sars2-p1.zip 20.7MB done
$ dataformat tsv virus-genome --package sars2-p1.zip --fields accession,virus-pangolin,release-date,isolate-lineage | head -n3
Accession   Virus Pangolin Classification   Release date   Isolate Lineage
MW909170.1  P.1                             2021-04-12     SARS-CoV-2/human/USA/FL-CDC-LC0030369/2021
MW908919.1  P.1                             2021-04-12     SARS-CoV-2/human/USA/MA-CDC-LC0029201/2021
Figure 2. Example command lines using the new --lineage flag with the datasets command-line tool to request a data package, then using dataformat to generate a customizable table.
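Once you have a dataformat table, standard shell tools can summarize it. The following is a minimal sketch, not part of the datasets or dataformat tools themselves. It assumes the table was saved to a tab-separated file with the same four columns as in Figure 2. The file name p1.tsv is hypothetical, and the two records are copied from the example output above so that the snippet runs without a download.

```shell
# Stand-in for redirecting the dataformat output from Figure 2 into a file.
# p1.tsv is a hypothetical file name; the two records come from the example output.
printf 'Accession\tVirus Pangolin Classification\tRelease date\tIsolate Lineage\n' > p1.tsv
printf 'MW909170.1\tP.1\t2021-04-12\tSARS-CoV-2/human/USA/FL-CDC-LC0030369/2021\n' >> p1.tsv
printf 'MW908919.1\tP.1\t2021-04-12\tSARS-CoV-2/human/USA/MA-CDC-LC0029201/2021\n' >> p1.tsv

# Count genomes per release date (column 3), skipping the header row.
counts=$(awk -F'\t' 'NR > 1 { n[$3]++ } END { for (d in n) print d "\t" n[d] }' p1.tsv | sort)
echo "$counts"

# Extract only the accessions (column 1), for example to feed into another tool.
accessions=$(awk -F'\t' 'NR > 1 { print $1 }' p1.tsv)
echo "$accessions"
```

With a real data package, you would replace the three printf lines with the dataformat command from Figure 2, dropping the head filter and redirecting its output to a file.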
We want your feedback on Datasets! Please email us at info@ncbi.nlm.nih.gov