Note: Please see our more recent post about the new Datasets command-line clients and the documentation on how to get orthologs using the newer client. The command lines below do not work in the current datasets client (NCBI Datasets CLI v14).
You can now get gene ortholog data with the NCBI Datasets command-line tool by providing a gene ID, gene symbol, or RefSeq nucleotide or protein accession. Data are available for vertebrates and insects. The vertebrate orthologs include a specialized set for fish. (See our recent post for more information on the orthologs for fish and insects.)
You can retrieve metadata for gene orthologs in JSON format, or you can download a compressed (zip) archive containing both metadata and sequences (Figure 1).
Figure 1. Command lines that use a gene symbol (BRCA1) to retrieve mammalian ortholog metadata (top, JSON metadata shown in part in the image) and sequences (bottom).
For example, if you want the mammalian orthologs of the human BRCA1 gene, you can use the following summary command to get metadata for these genes:
datasets summary ortholog symbol BRCA1 --taxon human --taxon-filter mammals > brca1-mammals.json
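Once the summary command has written brca1-mammals.json, a quick way to see what the report contains is to list its top-level fields. The following is a minimal Python sketch, not part of the datasets tool. It makes no assumption about the report's schema, which may differ between client versions, and the helper name describe_top_level is hypothetical.

```python
import json
from pathlib import Path


def describe_top_level(path):
    """Map each top-level key of a JSON object to the type name of its value.

    If the root is not an object (for example, a list), return a single
    "<root>" entry instead. No particular report schema is assumed.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    return {"<root>": type(data).__name__}


if __name__ == "__main__":
    # File name taken from the summary command above.
    report_path = Path("brca1-mammals.json")
    if report_path.exists():
        for key, kind in describe_top_level(report_path).items():
            print(f"{key}: {kind}")
    else:
        print(f"{report_path} not found; run the summary command first")
```

Printing only key names and value types keeps the sketch independent of the actual field names, so it can serve as a first look before writing any schema-specific parsing.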
If you want the sequences, use the datasets download command to download a zip archive that includes gene, transcript, and protein sequences as well as metadata in tabular and JSON Lines formats:
datasets download ortholog symbol BRCA1 --taxon human --taxon-filter mammals --filename brca1-sequences.zip
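To check what the download produced before unpacking it, you can list the members of the archive. The sketch below uses only Python's standard zipfile module. It does not assume any particular file names or directory layout inside the archive, and the helper name list_archive is hypothetical.

```python
import zipfile
from pathlib import Path


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each file in a zip archive.

    Directory entries are skipped; member names are reported as stored.
    """
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


if __name__ == "__main__":
    # File name taken from the download command above.
    archive_path = Path("brca1-sequences.zip")
    if archive_path.exists():
        for name, size in list_archive(archive_path):
            print(f"{size:>12}  {name}")
    else:
        print(f"{archive_path} not found; run the download command first")
```

Listing sizes alongside names makes it easy to spot the sequence files and the tabular and JSON Lines metadata without extracting anything.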
See our help documentation for more information on using the datasets command-line tool to access ortholog data.