Important improvements on the genome Assembly pages

We’ve been making improvements to the NCBI genome Assembly resource. Highlights include:

  • Links added between members of a pair of genome assemblies derived from the same diploid individual
  • Additional filters now shown on the left-hand side bar
    • Annotation status
    • Assembly type, including the new types “Unresolved diploid” and “Alternate pseudohaplotype”
  • vhost filters on the Advanced page Search Builder that allow selection of virus assemblies with a particular host (e.g. “vhost human”)
  • Searching by assembly names with the version unspecified
  • Total ungapped length reported in the “Global statistics” table, replacing the less useful total gap length
  • Improved N50 & L50 statistics presentation for complex genome assemblies
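
For readers less familiar with the statistics named in the last two items, here is a minimal sketch of how they are conventionally computed. The function names and the sample lengths are illustrative assumptions, not NCBI's implementation: N50 is the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the assembly length, and L50 is the number of sequences needed to get there.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    Hypothetical helper for illustration; uses the conventional definition
    and is not taken from NCBI's code.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' sequences were needed, so it is the L50.
            return length, count


def total_ungapped_length(total_length, total_gap_length):
    """Ungapped length is the total sequence length minus the total gap length."""
    return total_length - total_gap_length


# Made-up scaffold lengths, for illustration only.
scaffolds = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(scaffolds)
print(n50, l50)
print(total_ungapped_length(sum(scaffolds), 15))
```

With these sample values the total length is 290, so the halfway point (145) is reached by the second-longest scaffold.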

Continue reading

NCBI Computational Virology Workshop at LSU New Orleans!

Introduction to NGS Analysis in a Cloud Environment for Novice Bioinformaticians:

We are pleased to announce a free workshop in New Orleans, Louisiana, on April 23 and 24. After a short cloud-onboarding session, early-career computational virologists interested in extracting sequences from metagenomic samples will be introduced to new, community-generated tools!

Continue reading

NCBI at the ACMG meeting in Seattle next week (April 2-6, 2019)

In about a week, NCBI staff will join GeneReviews® on their home turf, Seattle, at the Annual Clinical Genetics Meeting hosted by the American College of Medical Genetics and Genomics (ACMG). While there, we will have an exhibit booth (#531) where you can meet our staff, get answers to your questions, and pick up informative handouts on our various resources for clinical practice.

Also, be sure to visit our two posters on Friday, April 5 from 10:30 AM to 12 PM.

Continue reading

The National Library of Medicine seeks a Scientific Director

The National Library of Medicine (NLM) seeks a Scientific Director with creative vision and strong leadership to guide its Intramural Research Program. One of the 27 Institutes and Centers (ICs) of the National Institutes of Health (NIH), NLM is a leader in computational health sciences research and the world’s largest biomedical library. The successful candidate will oversee a diverse group of some 150 scientific personnel, developing innovative new approaches to data science, biomedical informatics, and computational biology and their application to open questions in basic molecular biology, genomics, health, and healthcare.

Continue reading

Human genome annotation will be updated every 2 months

NCBI will be updating the human genome RefSeq annotation more frequently to incorporate improvements made to genes and transcripts by RefSeq curation experts. Faster updates will allow us to include the latest datasets.

In the past, we’ve produced a full re-annotation of the human genome about once a year. The last full annotation, Homo sapiens Annotation Release 109, was in March 2018. A full annotation is produced by two main processes:

Continue reading

Follow “Pangenomics in the Cloud” hackathon projects on GitHub

NCBI is on the West Coast this week (March 25 – 27) for “Pangenomics in the Cloud,” a three-day hackathon hosted by the University of California, Santa Cruz.

Graphs are the name of the game here! The teams will be building graphs; managing coordinates between samples; defining, identifying, and marking haplotypes; and looking at population-specific variants.

Please follow along on our GitHub, fork and make pull requests during and after the event, and stay tuned for updates on the findings.

Expanded accession formats appear in RefSeq release 93

RefSeq release 93 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of March 13, 2019. It contains 192,722,653 records, including 135,670,032 proteins, 25,840,272 RNAs, and sequences from 88,816 organisms.

Continue reading

New RefSeq annotations for big brown bat, peregrine falcon and more

In January and February, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Aphis gossypii (cotton aphid)
  • Balaenoptera acutorostrata scammoni (minke whale)
  • Bombyx mandarina (wild silkworm)
  • Chelonia mydas (green sea turtle)
  • Corapipo altera (white-ruffed manakin)
  • Empidonax traillii (willow flycatcher)
  • Eptesicus fuscus (big brown bat)
  • Eumetopias jubatus (Steller sea lion)
  • Falco cherrug (Saker falcon)
  • Falco peregrinus (peregrine falcon)
  • Marmota flaviventris (yellow-bellied marmot)
  • Monomorium pharaonis (pharaoh ant)
  • Neopelma chrysocephalum (saffron-crested tyrant-manakin)
  • Ovis aries (sheep)
  • Pipra filicauda (wire-tailed manakin)
  • Rhopalosiphum maidis (corn leaf aphid)
  • Solanum pennellii (eudicot)
  • Tupaia chinensis (Chinese tree shrew)
  • Vigna unguiculata (cowpea)
  • Vombatus ursinus (common wombat)
  • Xiphophorus couchianus (Monterrey platyfish)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

GenBank release 230 available, changes to number of files, expanded accessions

GenBank release 230.0 (2/15/2019), with 4.74 terabases and 1.47 billion records, is now available from the NCBI FTP site (flatfiles, ASN.1). There are two notable changes with this release. Because we have increased the target maximum uncompressed file size, the number of files dropped by about 1,000. We are also now assigning expanded WGS and protein accessions. WGS accessions may now have a six-letter Project Code prefix, followed by a two-digit Assembly-Version number, followed by seven, eight, or nine digits, for example AAAABB010000001. Protein accessions may now have a three-letter prefix followed by seven digits, for example EAA0000001. See sections 1.3.1 and 1.3.2 of the Release Notes for details.
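
If you parse accessions in your own scripts, the expanded formats described above can be expressed as patterns. The sketch below is an assumption-laden illustration rather than an official validator: the pattern and function names are hypothetical, uppercase letters are assumed, and only the two expanded formats named in this post are covered (older, shorter accession formats are deliberately not matched).

```python
import re

# Expanded WGS: six-letter Project Code, two-digit Assembly-Version,
# then seven, eight, or nine digits (e.g. AAAABB010000001).
WGS_EXPANDED = re.compile(r"([A-Z]{6})(\d{2})(\d{7,9})")

# Expanded protein: three letters followed by seven digits (e.g. EAA0000001).
PROTEIN_EXPANDED = re.compile(r"[A-Z]{3}\d{7}")


def classify_accession(accession):
    """Return a label for the expanded format that matches, or None.

    Hypothetical helper; covers only the expanded formats described above.
    """
    if WGS_EXPANDED.fullmatch(accession):
        return "expanded WGS"
    if PROTEIN_EXPANDED.fullmatch(accession):
        return "expanded protein"
    return None


def split_wgs(accession):
    """Split an expanded WGS accession into (project code, version, serial digits)."""
    match = WGS_EXPANDED.fullmatch(accession)
    if match is None:
        raise ValueError(f"not an expanded WGS accession: {accession!r}")
    return match.groups()


print(classify_accession("AAAABB010000001"))
print(classify_accession("EAA0000001"))
print(split_wgs("AAAABB010000001"))
```

Using `fullmatch` rather than `match` keeps a longer or shorter string from being accepted just because its prefix fits the pattern.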

Continue reading