The RefSeq eukaryotic genome annotation pipeline (EGAP) is moving to a new annotation naming format that can be used to unambiguously reference both the genome assembly and the RefSeq annotation. This will improve clarity when reporting the data you use and make the data more FAIR (Findable, Accessible, Interoperable, and Reusable). The new naming convention applies to all eukaryotic annotations released after December 15, 2022.
Historically, RefSeq EGAP has used an integer to identify a particular annotation release, such as Homo sapiens Annotation Release 110. This method provides no information on the assembly used for the annotation. In the new RefSeq naming system, annotation releases are designated by a combination of the assembly identifier (e.g., GCF_000001405.40) and an annotation name (e.g., RS_2022_04). The annotation name consists of an RS prefix, which indicates a RefSeq annotation, followed by the year and month in which the annotation was generated, in the form RS_YYYY_MM. You should always use the annotation name in combination with the corresponding assembly accession.version, for example, GCF_026419915.1-RS_2022_12 (as shown in Figure 1). This ensures that you’re always using the name that defines a specific annotation for a specific genome assembly. If you use only part of the name, the reference will be ambiguous.
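To make the naming format concrete, here is a minimal Python sketch that splits a combined name into its assembly accession.version and its RS_YYYY_MM parts. It is an illustration only, not an NCBI tool: the function name `parse_annotation_name`, the `AnnotationName` type, and the regular expression are hypothetical, and the pattern (a GCF_ prefix followed by nine digits and a version) is an assumption inferred from the examples in this post.

```python
import re
from typing import NamedTuple


class AnnotationName(NamedTuple):
    """Parts of a combined name such as GCF_026419915.1-RS_2022_12."""
    assembly_accession: str  # accession.version, e.g. "GCF_026419915.1"
    year: int                # YYYY part of RS_YYYY_MM
    month: int               # MM part of RS_YYYY_MM


# Assumption: the pattern is inferred from the examples in the post
# (GCF_ + nine digits + "." + version, then "-RS_YYYY_MM"); it is not
# an official specification of the format.
_NAME_PATTERN = re.compile(r"^(GCF_\d{9}\.\d+)-RS_(\d{4})_(\d{2})$")


def parse_annotation_name(name: str) -> AnnotationName:
    """Split a full annotation name; raise ValueError if it is incomplete."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a full assembly-annotation name: {name!r}")
    accession, year, month = match.groups()
    month_number = int(month)
    if not 1 <= month_number <= 12:
        raise ValueError(f"month out of range in {name!r}")
    return AnnotationName(accession, int(year), month_number)


if __name__ == "__main__":
    print(parse_annotation_name("GCF_026419915.1-RS_2022_12"))
```

Under these assumptions, a partial name such as RS_2022_12 on its own is rejected with a ValueError, which mirrors the advice to always cite the annotation name together with the assembly accession.version.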
Figure 1. The annotation section of the Datasets Genome page for the assembly bHarHar1 for the harpy eagle (Harpia harpyja) showing the new annotation release GCF_026419915.1-RS_2022_12.
Continue reading “Announcing New Names for Eukaryotic Genome Annotations in RefSeq!”
The potential impact of emerging model organisms on human health
Comparative genomics is a science that compares genomic data either within a species or across species to answer questions in biomedicine. Laboratory experiments can then investigate the functional impact of those genomic similarities and differences. The history of comparative genomics goes back to the mid-1990s, but the field is now accelerating. A flood of new data is emerging as DNA sequencing technology becomes cheaper and commoditized. While this growth poses many challenges to current tools and approaches, it also offers immense opportunity for scientific research and understanding. These insights continue to reveal novel model organisms that can further the impact of comparative genomics on human health.
Continue reading “NIH Comparative Genomics Resource project”
In October and November, the NCBI Eukaryotic Genome Annotation Pipeline released thirty-one new annotations in RefSeq for the following organisms:
- Acanthochromis polyacanthus (spiny chromis)
- Acomys russatus (golden spiny mouse)
- Andrographis paniculata (eudicot)
- Antechinus flavipes (yellow-footed antechinus)
- Apodemus sylvaticus (European woodmouse)
- Apus apus (common swift)
- Arachis duranensis (eudicot)
Continue reading “New RefSeq Annotations!”
San Diego, January 13-18, 2023
NCBI is looking forward to seeing you in person at the International Plant and Animal Genome Conference (PAG 30), January 13-18, 2023 in San Diego, California.
We’re especially excited to share our recent efforts on the NIH Comparative Genomics Resource (CGR), a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources on biomedical research.
We also want to hear from you! If you’re interested in sharing your feedback on your needs and experiences involving comparative genomics tools to inform CGR, consider joining our Feedback Session.
Check out NCBI’s schedule of activities and events:
Continue reading “Join NCBI at PAG 30”
RefSeq release 215 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of November 7, 2022, and contains 335,372,031 records, including 244,583,657 proteins and sequences from 125,116 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
Continue reading “RefSeq Release 215”
In August and September, the NCBI Eukaryotic Genome Annotation Pipeline released thirty-eight new annotations in RefSeq for the following organisms:
- Adelges cooleyi (spruce gall adelgid)
- Aethina tumida (small hive beetle)
- Anopheles aquasalis (mosquito)
- Anopheles maculipalpis (mosquito)
- Anthonomus grandis grandis (boll weevil)
- Aphis gossypii (cotton aphid)
- Bactrocera neohumeralis (fly)
- Bombus affinis (bee)
- Bombus huntii (bee)
- Cataglyphis hispanica (ant)
- Cygnus atratus (black swan) (pictured)
Continue reading “New annotations in RefSeq!”
RefSeq release 214 is now available online, from the FTP site, and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of September 12, 2022, and contains 328,588,569 records, including 239,609,016 proteins, 47,387,931 RNAs, and sequences from 123,394 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
Foreign contamination screening
Introducing the new Foreign Contamination Screen (FCS) tool! If you produce assembled genomes, check out FCS, a tool you can run yourself to improve your genome assemblies and facilitate high-quality data submissions to GenBank. FCS is part of the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms. See our previous blog post to learn how FCS enhances contaminant detection sensitivity.
Continue reading “RefSeq release 214 is available!”
In June and July, the NCBI Eukaryotic Genome Annotation Pipeline released twenty-six new annotations in RefSeq for the following organisms:
- Anopheles coluzzii (mosquito)
- Anopheles funestus (African malaria mosquito)
- Astyanax mexicanus (Mexican tetra)
- Athalia rosae (coleseed sawfly)
- Bactrocera dorsalis (oriental fruit fly)
- Brassica napus (rape)
- Brienomyrus brachyistius (bony fish)
- Canis lupus dingo (dingo) (pictured)
- Caretta caretta (loggerhead turtle)
- Dendroctonus ponderosae (mountain pine beetle)
- Epinephelus fuscoguttatus (brown-marbled grouper)
- Lagopus muta (rock ptarmigan)
- Marmota marmota marmota (Alpine marmot)
- Nematostella vectensis (starlet sea anemone)
- Ostrea edulis (bivalve)
- Panthera uncia (snow leopard)
- Plutella xylostella (diamondback moth)
- Pyrus x bretschneideri (Chinese white pear)
- Rhincodon typus (whale shark)
- Rhipicephalus sanguineus (brown dog tick)
- Solanum stenotomum (eudicot)
- Solanum verrucosum (eudicot)
- Sphaerodactylus townsendi (lizard)
- Stegostoma fasciatum (shark)
- Triticum urartu (monocot)
- Ziziphus jujuba (common jujube)
Continue reading “New annotations in RefSeq”
RefSeq release 213 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of July 11, 2022, and contains 321,282,996 records, including 234,520,053 proteins, 45,781,716 RNAs, and sequences from 121,461 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
Continue reading “RefSeq release 213”