MANE Select v0.5 is now available!


In October last year, we announced the launch of an exciting new collaboration between NCBI and EMBL-EBI called MANE (Matched Annotation from the NCBI and EMBL-EBI). As a first step, we began generating the MANE Select set, comprising a matched representative transcript for every human protein-coding gene. Now that our genome resources are integrated into a high-quality transcript set, you don’t need to choose between RefSeq and Ensembl/GENCODE datasets for genomic analyses.

Not only does the MANE Select set make it easier for you to exchange data or translate coordinates between RefSeq and Ensembl annotation results, but you’ll also be able to use the set with NGS-based sequencing technologies and other resources that use the latest and highest-quality reference human genome assembly available.


Expanded accession formats appear in RefSeq release 93

RefSeq release 93 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of March 13, 2019. It contains 192,722,653 records, including 135,670,032 proteins, 25,840,272 RNAs, and sequences from 88,816 organisms.
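As a rough illustration of the E-utilities route, the sketch below only builds an ESearch request URL for RefSeq records and does not send it. The helper name `build_refseq_search_url`, the choice of the `protein` database, and the example organism are assumptions made for this example; the `srcdb_refseq[PROP]` term is the Entrez filter commonly used to restrict results to RefSeq.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of NCBI's E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_refseq_search_url(organism: str, db: str = "protein", retmax: int = 5) -> str:
    """Build (but do not send) an ESearch URL limited to RefSeq records.

    The helper name, default database, and retmax value are illustrative
    choices, not part of the release announcement.
    """
    term = f"{organism}[Organism] AND srcdb_refseq[PROP]"
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_refseq_search_url("Eptesicus fuscus")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; for heavier use, NCBI's usage guidelines on request rates and API keys apply.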


New RefSeq annotations for big brown bat, peregrine falcon and more

Hibernating brown bat

In January and February, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Aphis gossypii (cotton aphid)
  • Balaenoptera acutorostrata scammoni (minke whale)
  • Bombyx mandarina (wild silkworm)
  • Chelonia mydas (green sea turtle)
  • Corapipo altera (white-ruffed manakin)
  • Empidonax traillii (willow flycatcher)
  • Eptesicus fuscus (big brown bat)
  • Eumetopias jubatus (Steller sea lion)
  • Falco cherrug (Saker falcon)
  • Falco peregrinus (peregrine falcon)
  • Marmota flaviventris (yellow-bellied marmot)
  • Monomorium pharaonis (pharaoh ant)
  • Neopelma chrysocephalum (saffron-crested tyrant-manakin)
  • Ovis aries (sheep)
  • Pipra filicauda (wire-tailed manakin)
  • Rhopalosiphum maidis (corn leaf aphid)
  • Solanum pennellii (eudicot)
  • Tupaia chinensis (Chinese tree shrew)
  • Vigna unguiculata (cowpea)
  • Vombatus ursinus (common wombat)
  • Xiphophorus couchianus (Monterrey platyfish)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

GenBank release 230 available, changes to number of files, expanded accessions

GenBank release 230.0 (2/15/2019), with 4.74 Terabases and 1.47 billion records, is now available from the NCBI FTP site (flatfiles, ASN.1). There are two notable changes with this release. Because we have increased the target maximum uncompressed file size, the number of files dropped by about 1,000. We are also now assigning expanded WGS and protein accessions. WGS accessions may now have a six-letter Project Code prefix and a two-digit Assembly-Version number, followed by seven, eight, or nine digits, for example AAAABB010000001. Protein accessions may now have three letters followed by seven digits, for example EAA0000001. See sections 1.3.1 and 1.3.2 of the Release Notes for details.
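If your scripts validate accessions with fixed-length patterns, they may need updating. The following is a minimal sketch of the two expanded formats as described above; the function names are hypothetical, the patterns cover only the expanded formats (not the older, shorter ones), and a trailing sequence-version suffix such as ".1" is assumed to have been stripped beforehand.

```python
import re

# Patterns transcribed from the description in this announcement.
# Expanded WGS: six-letter Project Code, two-digit Assembly-Version,
# then seven, eight, or nine digits.
EXPANDED_WGS = re.compile(r"^([A-Z]{6})(\d{2})(\d{7,9})$")
# Expanded protein: three letters followed by seven digits.
EXPANDED_PROTEIN = re.compile(r"^[A-Z]{3}\d{7}$")


def classify_accession(accession: str) -> str:
    """Return 'expanded_wgs', 'expanded_protein', or 'other'."""
    if EXPANDED_WGS.match(accession):
        return "expanded_wgs"
    if EXPANDED_PROTEIN.match(accession):
        return "expanded_protein"
    return "other"


def split_wgs(accession: str):
    """Split an expanded WGS accession into (prefix, version, serial), or None."""
    match = EXPANDED_WGS.match(accession)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


for example in ("AAAABB010000001", "EAA0000001"):
    print(example, classify_accession(example))
```

Anything that falls through to "other" here may still be a valid accession in one of the older formats, which this sketch deliberately does not model.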

The release has 212,260,377 traditional records containing 303,709,510,632 base pairs of sequence data. There are also 945,019,312 WGS records containing 4,164,513,961,679 base pairs of sequence data, 294,772,430 bulk-oriented TSA records containing 263,936,885,705 base pairs of sequence data, and 23,259,929 bulk-oriented TLS records containing 9,146,836,085 base pairs of sequence data.
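As a quick consistency check, the component figures above can be summed and compared with the headline totals. The sketch below uses only the numbers quoted in this post; the dictionary layout and variable names are illustrative.

```python
# (records, base pairs) for each GenBank component, as quoted above.
components = {
    "traditional": (212_260_377, 303_709_510_632),
    "WGS": (945_019_312, 4_164_513_961_679),
    "TSA": (294_772_430, 263_936_885_705),
    "TLS": (23_259_929, 9_146_836_085),
}

total_records = sum(records for records, _ in components.values())
total_bp = sum(bp for _, bp in components.values())

print(f"records: {total_records:,}")
print(f"base pairs: {total_bp:,} ({total_bp / 1e12:.2f} terabases)")
```

The sums come to about 4.74 terabases and just under 1.48 billion records, in line with the headline figures.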

During the 64 days between the close dates for GenBank Releases 229.0 and 230.0, the traditional portion of GenBank grew by 18,020,968,446 base pairs and 978,962 sequence records. During that same period, 25,301 records were updated. An average of 15,691 ‘traditional’ records were added and/or updated per day.
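The daily average follows directly from the figures in the paragraph above: added plus updated records, divided by the number of days between close dates. A short sketch (variable names are illustrative):

```python
days_between_releases = 64
records_added = 978_962
records_updated = 25_301

# Added and updated records together, averaged over the release interval.
records_per_day = (records_added + records_updated) / days_between_releases

# Truncate to a whole number of records, matching the figure quoted in the text.
print(int(records_per_day))
```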

Between releases 229.0 and 230.0, the WGS component of GenBank grew by 507,794,538,583 base pairs and by 171,246,122 sequence records, the TSA component grew by 15,343,993,517 base pairs and by 19,926,957 sequence records, and the TLS component grew by 635,006,804 base pairs and by 2,335,341 sequence records.

For downloading purposes, please keep in mind that the uncompressed GenBank release 230.0 flatfiles require roughly 964 GB (sequence files only). The ASN.1 data require approximately 773 GB.

For additional release information, see the README files in either of the directories linked above, and the Release Notes.

Run the Prokaryotic Genome Annotation Pipeline (PGAP) on your own machine

You can now download PGAP from GitHub and run it on your own machine, a compute farm, or the cloud, on any public or privately owned genome. PGAP predicts genes on bacterial and archaeal genomes using the same inputs and applications used inside NCBI. This is a great opportunity to try it and send us comments (please use GitHub issues).


IgBLAST 1.13.0 now available

IgBLAST is a popular NCBI package for classifying and analyzing immunoglobulin and T cell receptor variable domain sequences. We’ve released a new version of IgBLAST with three improvements:

  1. The new release determines the V gene reading frame from the end of the FWR3 region instead of the end of the V gene. This helps identify the correct reading frames for rearrangements that have insertions or deletions near the V gene end.
  2. The allowed distance between the V gene end and the J gene start has been increased to 225 bp to allow detection of ultra-long D/N regions.
  3. The standalone program and files have been repackaged to make them easier to install.

The new release is available from the BLAST FTP area, along with a new manual on GitHub.

Improved search makes it easier to find antimicrobial resistance protein information

It’s now easier to find known antimicrobial resistance (AMR) protein information at NCBI. You can search by gene symbol, protein name, or accession across NCBI databases and retrieve the best representative DNA sequence that is a reference for antimicrobial resistance genes from the National Database of Antibiotic Resistant Organisms (NDARO).



Track pathogenic organisms promptly with the National Database of Antibiotic Resistant Organisms

In response to the rising threat of antimicrobial resistance (AMR), NCBI built the National Database of Antibiotic Resistant Organisms (NDARO). With NDARO, you can:


Figure 1. Filter your Isolates Browser results by date, location, and antibiotic resistance (whether the isolate has any AMR genes or any submitted Antimicrobial Susceptibility Testing (AST) phenotype).


Change the way your graphs look with Genome Data Viewer’s enhanced settings

If you need to change your graph type – say, from histogram to line graph or a heat map – in Genome Data Viewer (GDV), you can now do so with a few clicks.

Click on the track name of any graph track to change the display (see Figure 1A, B and C).


Figure 1. Click on a track name to expose the graph settings menu (A). Set graph display style (B) to histogram, heat map or line graph (C).
