Data for SARS-CoV-2 variants now available at NCBI

Looking for genomes for the B.1.1.7 SARS-CoV-2 variant? NCBI now supports searches for SARS-CoV-2 variant names such as B.1.1.7, B.1.351, or P.1. For example, search for B.1.1.7 (Figure 1) and you’ll see a virus classification box with an option to download a SARS-CoV-2 data package. SARS-CoV-2 data packages include genome and protein sequences and a detailed data report for all SARS-CoV-2 genomes classified as that variant. SARS-CoV-2 genome lineages are classified by pangolin, using the pangoLEARN algorithm.

Figure 1. SARS-CoV-2 variant search result, with a button to download a data package containing data for all SARS-CoV-2 genomes assigned to that variant lineage (B.1.1.7 in this case).

Need command-line access? We have added a new flag, --lineage, to the datasets command-line tool that lets you download SARS-CoV-2 variant genomes (Figure 2). Type the variant name after the flag to request a SARS-CoV-2 data package specific to that variant.
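To fetch packages for several variants in one go, the same flag can be called in a loop. This is a minimal sketch, assuming the datasets tool is on your PATH and that --filename takes the name of the output zip file; the helper name download_lineages and the sars2-<lineage>.zip naming scheme are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download one SARS-CoV-2 data package per lineage.
# Assumes the NCBI datasets CLI is installed and on PATH.
download_lineages() {
  local lineage
  for lineage in "$@"; do
    # The output name scheme (sars2-<lineage>.zip) is an illustrative assumption.
    datasets download virus genome taxon SARS-CoV-2 \
      --lineage "$lineage" \
      --filename "sars2-${lineage}.zip" || return 1  # stop on the first failed download
  done
}

# Example usage (requires network access):
# download_lineages B.1.1.7 B.1.351 P.1
```

Writing each lineage to its own zip keeps the per-variant packages separate, and returning on the first failure avoids silently skipping a variant.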

Then try our dataformat tool to generate a customizable metadata table from the data package.

For example, use datasets to download a SARS-CoV-2 data package for the P.1 lineage, then use dataformat to generate a table with columns you specify:

$ datasets download virus genome taxon SARS-CoV-2 --lineage P.1 --filename
Downloading:    20.7MB done
$ dataformat tsv virus-genome --package --fields accession,virus-pangolin,release-date,isolate-lineage | head -n3
Accession   Virus Pangolin Classification   Release date    Isolate Lineage
MW909170.1  P.1 2021-04-12  SARS-CoV-2/human/USA/FL-CDC-LC0030369/2021
MW908919.1  P.1 2021-04-12  SARS-CoV-2/human/USA/MA-CDC-LC0029201/2021

Figure 2. Example command lines using the new --lineage flag of the datasets command-line tool to request a data package, then using dataformat to generate a customizable table.
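Because dataformat tsv writes tab-separated text with a header row, as in the output above, standard Unix tools can post-process it. As one hedged example, the sketch below keeps only the Accession column, for instance to build an accession list for another tool. The function name accessions_from_tsv is hypothetical, and it assumes accession is the first field passed to --fields, as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first column of a TSV read from stdin, skipping the header row.
# Assumes accession was the first field passed to dataformat --fields.
accessions_from_tsv() {
  # tail -n +2 drops the header line; cut -f1 keeps the first tab-separated field
  tail -n +2 | cut -f1
}

# Example usage (PACKAGE.zip is a placeholder for your downloaded data package):
# dataformat tsv virus-genome --package PACKAGE.zip \
#   --fields accession,virus-pangolin,release-date,isolate-lineage \
#   | accessions_from_tsv > accessions.txt
```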

We want your feedback on Datasets! Please email us at
