Category: What’s New

July 11 NCBI Minute: Five Teaching Examples with NCBI APIs

Next Wednesday, July 11, 2018, NCBI staff will show you a set of simple exercises that use EDirect to explore aspects of a human gene. You can easily incorporate these examples into your undergraduate biology courses.

Date and time: Wed, July 11, 2018 12:00 PM – 12:30 PM EDT

Register here: https://bit.ly/2KmH1yO

CCDS release 22 for human is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI’s Homo sapiens annotation release 109 to Ensembl’s release 92 is now reflected in Gene. This update adds 894 new CCDS IDs and adds 154 genes to the human CCDS set. CCDS release 22 includes a total of 33,397 CCDS IDs that correspond to 19,033 GeneIDs.

The CCDS project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations.

dbSNP database doubles in size twice in 13 months

In little over a year, dbSNP human data have doubled in size from 150 million Reference SNP (rs) records to 325 million in Build 150, and again to more than 650 million rs records in Build 151. Of these, 580 million rs records have frequency data in Build 151. This explosive growth makes dbSNP the world’s largest public human variation database. Current trends suggest that large-scale WGS and WES projects will discover millions of new variations in the next few years.

Build 151 was released in March 2018. The data are available for web search and FTP download.

NCBI’s dbSNP houses variation and frequency data from large-scale projects including 1000 Genomes, GO-ESP, ExAC, gnomAD, TOPMed and HLI, as well as focused studies like locus-specific databases (LSDB) and clinical sources. The rs records are annotated on RefSeq genomes, mRNA and protein sequences and integrated with other NCBI resources (e.g., Assembly, Gene, RefSeq, PubMed, and BioProject). The database is used worldwide in personal genomics and medical genetics, and for managing, annotating and analyzing variation data.

June 27 NCBI Minute: dbGaP’s New Ancestry Composition Visualization tool and GRAF Software

Next Wednesday, June 27, 2018, we’ll introduce you to the Genetic Relationship and Fingerprinting (GRAF) software package. GRAF is a quality assurance tool that finds duplicates and closely related subjects in your data using SNP genotypes. We’ll also introduce the GRAF-pop feature, which computes subject ancestries and plots data for export as a .png or .txt file.

Date and time: Wed, June 27, 2018 12:00 PM – 12:30 PM EDT

Register here: https://bit.ly/2LjCaML

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

April and May annotations in RefSeq: cow, bonobo and more

In April and May, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Bos taurus (cattle)
  • Cephus cinctus (wheat stem sawfly)
  • Citrus sinensis (sweet orange)
  • Cynara cardunculus cardunculus (eudicot)
  • Cynoglossus semilaevis (tongue sole)
  • Gallus gallus (chicken)
  • Kryptolebias marmoratus (mangrove rivulus)
  • Macaca nemestrina (pig-tailed macaque)
  • Maylandia zebra (zebra mbuna)
  • Medicago truncatula (barrel medic)
  • Pan paniscus (pygmy chimpanzee)
  • Pteropus alecto (black flying fox)
  • Python bivittatus (Burmese python)
  • Ricinus communis (castor bean)
  • Temnothorax curvispinosus (ant)
  • Tetranychus urticae (two-spotted spider mite)
  • Ziziphus jujuba (common jujube)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Improvements made to genomes FTP site

We’ve been making improvements to the contents of NCBI’s genomes FTP site. Highlights include:

  • addition of new file types, including a feature_count.txt file with counts of gene, RNA, and CDS features of specific types and a translated_cds.faa file with conceptual translations of each CDS feature on the genome
  • improvements to the Sequence Ontology feature types used in GFF3, including identification of pseudogene gene features as “pseudogene” instead of “gene” in column 3
  • improvements to the gene_biotype calculation to categorize transcribed pseudogenes as transcribed_pseudogene instead of misc_RNA
  • addition of the #!annotation-source unofficial pragma to GFF3 files with the annotation name, for assemblies where that information is available
  • addition of an FTP directory for GenBank viral genomes that includes International Committee on Taxonomy of Viruses (ICTV) species exemplar virus genomes and a growing number of NCBI viral neighbor genomes
  • expansion of the UCSC sequence name mapping in the assembly report files to provide mappings between GenBank or RefSeq sequence accessions, chromosome or scaffold names, and the UCSC sequence name for most of the recent assemblies in the UCSC Genome Browser
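
As a rough illustration of the GFF3 changes listed above, the following Python sketch tallies the feature types found in column 3 and picks up the #!annotation-source pragma if one is present. It relies only on the general GFF3 layout (nine tab-separated columns, with the feature type in the third). The function name summarize_gff3 and the sample records are made up for this example and do not come from a real assembly.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def summarize_gff3(lines: Iterable[str]) -> Tuple[Optional[str], Counter]:
    """Return (annotation source, counts of column-3 feature types)."""
    annotation_source = None
    type_counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#!annotation-source"):
            # Treat everything after the pragma name as the annotation name.
            annotation_source = line[len("#!annotation-source"):].strip()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        type_counts[columns[2]] += 1  # column 3 holds the feature type
    return annotation_source, type_counts


# Hypothetical sample records, for illustration only.
SAMPLE = """##gff-version 3
#!annotation-source Example Annotation Release 1
chr1\tExample\tgene\t100\t900\t.\t+\t.\tID=gene0;gene_biotype=protein_coding
chr1\tExample\tpseudogene\t1500\t2100\t.\t-\t.\tID=gene1;gene_biotype=transcribed_pseudogene
chr1\tExample\tpseudogene\t3000\t3400\t.\t+\t.\tID=gene2;gene_biotype=pseudogene
"""

source, counts = summarize_gff3(SAMPLE.splitlines())
print(source)
print(dict(counts))
```

With files that follow the updated conventions, pseudogene gene features would be counted under “pseudogene” rather than “gene”, so a tally like this is one quick way to check which convention a given file uses.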

Summer 2018 NIH Data Hackathon July 23-25, 2018

From July 23rd to 25th, 2018, NCBI will host a data science hackathon on the NIH campus. This hackathon will focus on genomics as well as general data science analyses, including text, image and sequence processing. This event is for researchers, including students and postdocs, who have already worked with large datasets or developed pipelines for analyzing data from high-throughput experiments. Some projects are also open to non-scientific developers, mathematicians, or librarians.

The event is open to anyone selected for the hackathon and willing to travel to the NIH campus in Bethesda, Maryland.

Improved annotation of Streptomyces RefSeq genomes

We’ve completed the RefSeq reannotation of over 1,000 Streptomyces genomes! The genomes were reannotated using the Prokaryotic Genome Annotation Pipeline (PGAP). PGAP detected nearly 100% of ribosomally synthesized and post-translationally modified peptide natural products (RiPP)-encoding genes from known families, despite their small size, using a set of over 30 hidden Markov models (HMMs) built by RefSeq biocurators. Over 70% (251) of lasso peptides now present in Streptomyces RefSeq genomes (354) were annotated for the first time.

If you are aware of any class of RiPP precursor in Streptomyces that was not found in our recent reannotation, please contact us through the NCBI Help Desk, and we will add new HMMs to the rules we use to find and annotate RiPP precursor genes.

Important dbSNP updates: New JSON data files, RefSNP report, API

dbSNP is moving to a new design, with new products ready for testing, including new JSON data files, the RefSNP page, and an API.

New JSON data files

The Human Build 151 release is the last build that will provide relational database table dumps on the FTP site. In future build releases, dbSNP data will instead be available as a cumulative file of RefSNP objects in JSON format. These JSON files are available now so that users can begin migration and testing. Tutorials for parsing JSON are on GitHub.
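
For readers planning a migration, here is a minimal Python sketch of how a file of RefSNP objects might be read. It rests on two assumptions that you should check against the actual files and the GitHub tutorials: that each line holds one JSON object, and that the rs number is stored under a key named refsnp_id. The function name iter_refsnp_ids and the sample records are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_refsnp_ids(lines: Iterable[str], id_key: str = "refsnp_id") -> Iterator[str]:
    """Yield rs identifiers from line-delimited JSON records.

    Assumes one JSON object per line and that `id_key` holds the
    numeric part of the rs identifier (both are assumptions here).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        if id_key in record:
            yield "rs" + str(record[id_key])


# Hypothetical records, not real dbSNP data.
SAMPLE_LINES = [
    '{"refsnp_id": "111", "note": "made-up record"}',
    "",
    '{"refsnp_id": "222", "note": "another made-up record"}',
]

for rs_id in iter_refsnp_ids(SAMPLE_LINES):
    print(rs_id)
```

Because the function accepts any iterable of lines, it should work equally with an open text file handle, including one returned by Python's bz2.open(path, "rt") if the download is bzip2-compressed. Reading line by line also avoids loading a large cumulative file into memory at once.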

5 new videos on YouTube: Get the most out of BLAST, MedGen, PubChem and more

Here are the latest videos on our YouTube channel. Subscribe to get alerts for new videos.

NCBI Minute: Getting the Most out of Web BLAST Tabular Format

The NCBI web BLAST service has several useful download formats, including tabular formats. All formats allow you to easily save your BLAST results for processing, editing, and annotating.

This video will show you how to use basic Unix tools and EDirect to expand and enhance your saved tabular BLAST results. You will also learn how to add useful information like taxonomy and sequence titles.
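
The video itself works with Unix tools and EDirect. As an alternative illustration in Python (not the method shown in the video), the sketch below parses saved tabular results and keeps hits at or below an E-value cutoff. It assumes the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, bit score) and that comment lines start with “#”. The function name parse_hits, the cutoff, and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List

# Column names for the standard 12-column BLAST tabular layout.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Dict[str, str]]:
    """Return hits whose E-value is at or below max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue  # row does not match the assumed layout
        hit = dict(zip(COLUMNS, fields))
        if float(hit["evalue"]) <= max_evalue:
            hits.append(hit)
    return hits


# Made-up rows for illustration; identifiers are not real accessions.
SAMPLE_ROWS = [
    "# Fields: query, subject, % identity, ...",
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360",
    "query1\tsubjB\t80.00\t50\t10\t0\t5\t54\t1\t50\t0.5\t30.1",
]

for hit in parse_hits(SAMPLE_ROWS):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```

Keeping each hit as a dictionary keyed by column name makes it straightforward to attach extra fields later, such as the taxonomy or sequence titles mentioned above, once they have been looked up separately.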
