New taxonomy files available with lineage, type, and host information


NCBI is now producing a new set of taxonomy files that include the taxonomic lineage of taxa, information on type strains and material, and host information. These files are particularly helpful for people maintaining local installations of NCBI data.

You can download the new archive (new_taxdump.tar.gz) from the taxonomy directory on the FTP site (ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/). The new files are typematerial.dmp, typeoftype.dmp, rankedlineage.dmp, fullnamelineage.dmp, taxidlineage.dmp, and host.dmp. Please see the readme file for details of the file contents.
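
For readers who script their local updates, the download step might be sketched as follows. This is a minimal example, not an official NCBI tool. It assumes the FTP directory above is also reachable over HTTPS and that the .dmp files sit at the top level of the archive. The function names `download_archive` and `extract_files` are made up for illustration.

```python
import os
import shutil
import tarfile
import urllib.request

# Assumption: the FTP path named in the post is also served over HTTPS.
ARCHIVE_URL = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz"


def download_archive(url=ARCHIVE_URL, dest="new_taxdump.tar.gz"):
    """Fetch the archive to a local path and return that path (hypothetical helper)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_files(archive_path, wanted, out_dir="."):
    """Copy only the named files out of the archive and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            base = os.path.basename(member.name)
            if member.isfile() and base in wanted:
                target = os.path.join(out_dir, base)
                # Stream the member's bytes to disk under its base name only,
                # so nothing is written outside out_dir.
                with tar.extractfile(member) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
                written.append(target)
    return written
```

A typical call would be `extract_files(download_archive(), {"rankedlineage.dmp", "typematerial.dmp"}, "taxonomy")`, which leaves the rest of the archive unpacked.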

The original taxonomy file archive without the new content will remain available under its original name, taxdump.tar.gz. The example below shows the entries for the monkey species Cercopithecus lomamiensis from the new ranked lineage and type material files.


1191211	|	Cercopithecus lomamiensis	|		|	Cercopithecus	|	Cercopithecidae	|	Primates	|	Mammalia	|	Chordata	|	Metazoa	|	Eukaryota	|

1191211	|	Cercopithecus lomamiensis	|	holotype	|	YPM 14080	|
1191211	|	Cercopithecus lomamiensis	|	holotype	|	YPM MAM 14080	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM 14189	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM 14191	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM 14192	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM MAM 14189	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM MAM 14191	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM MAM 14192	|

Top panel: Ranked lineage for Cercopithecus lomamiensis from rankedlineage.dmp. After the taxonomy ID and name, the columns are species, genus, family, order, class, phylum, kingdom, and superkingdom; the species column is empty in this entry. Bottom panel: Type material information from typematerial.dmp. The columns are taxonomy ID, name, type designation, and collection/repository details.
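
The rows above can be split into named fields with a few lines of Python. This is a minimal sketch that takes the column order from the caption and assumes fields are separated by tab, pipe, tab, with a trailing tab and pipe on each row, as in the example. The function names and the dictionary layout are illustrative only; check the readme for the authoritative column definitions.

```python
# Column order after tax_id and name, as listed in the caption above.
RANK_COLUMNS = ("species", "genus", "family", "order",
                "class", "phylum", "kingdom", "superkingdom")


def split_dmp_line(line):
    """Split one .dmp row on the tab-pipe-tab separator."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return line.split("\t|\t")


def parse_ranked_lineage(line):
    """Return tax_id, name, and a rank-to-name mapping (None for empty fields)."""
    fields = split_dmp_line(line)
    ranks = {rank: (value or None)
             for rank, value in zip(RANK_COLUMNS, fields[2:])}
    return {"tax_id": int(fields[0]), "name": fields[1], "ranks": ranks}


def parse_type_material(line):
    """Return tax_id, name, type designation, and repository identifier."""
    tax_id, name, designation, identifier = split_dmp_line(line)[:4]
    return {"tax_id": int(tax_id), "name": name,
            "type": designation, "identifier": identifier}
```

Applied to the first row of the top panel, `parse_ranked_lineage` gives a `ranks` mapping in which `species` is `None` and `genus` is `Cercopithecus`.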


7 thoughts on “New taxonomy files available with lineage, type, and host information”

  1. Pingback: New taxonomy files available with lineage, type, and host information – Science

  2. Pingback: Weekly Postings | The MARquee

    • Dear Dr. Heiler,
      The new files contain the same information as the originals, just in a different format. If the original formats work for you, there is no reason to adjust your process. All formats will continue to be updated daily.
      Please let me know if you have any other questions.

      Best regards,

      Stacy Ciufo
      Taxonomy Data Support Specialist
      Contractor

  3. There seem to be some missing taxa in the rankedlineage file from the new taxdump.

    For instance, the only taxa in the cyanobacteria phylum appear to be a couple of Gloeobacter, whereas there are quite a few other diverse cyanobacteria with complete assemblies on NCBI. Is this an oversight? Or are they omitted for a particular reason?

  4. Dear Dr. Cooley,
    Thank you for your comment on the rankedlineage file.
    I just checked the file, and see that there are 23,031 entries for Cyanobacteria.
    If you are not seeing this, can you let me know how you are parsing the file?

    Best regards,

    Stacy Ciufo
    Taxonomy Data Support Specialist
    Contractor
