If you are a consumer or producer of AGP (A Golden Path) files for genome assemblies, please read on. We’d like your feedback on the proposed changes described here.
As you know, AGP files are used to describe the structure of certain genome assemblies. The AGP file format has not kept up with changes in sequencing technology or International Sequence Database Collaboration (INSDC) feature usage. NCBI is therefore proposing to extend the current AGP v2.0 specification to add new linkage evidence types and a gap type of “contamination” as detailed below and described in the AGP v2.1 proposed specification.
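For readers less familiar with the format, here is a minimal Python sketch of reading a gap line from a tab-delimited AGP file. It assumes the nine-column layout of AGP v2.0, in which gap lines carry a component type of N or U followed by gap length, gap type, linkage, and linkage evidence. The names `AgpGap` and `parse_gap_line` are hypothetical and used only for illustration, and "contamination" appears here only as the proposed gap type, not as part of v2.0.

```python
from typing import List, NamedTuple, Optional


class AgpGap(NamedTuple):
    """One gap line from an AGP file (hypothetical helper type)."""
    obj: str             # object being assembled, e.g. a chromosome or scaffold
    start: int           # object_beg (1-based)
    end: int             # object_end (inclusive)
    length: int          # gap_length
    gap_type: str        # e.g. "scaffold", or the proposed "contamination"
    linked: bool         # linkage column: "yes" or "no"
    evidence: List[str]  # linkage evidence terms, separated by ";"


def parse_gap_line(line: str) -> Optional[AgpGap]:
    """Return an AgpGap for a gap line, or None for any other line."""
    fields = line.rstrip("\n").split("\t")
    # Gap lines have nine columns and a component type of N or U.
    if len(fields) != 9 or fields[4] not in ("N", "U"):
        return None
    return AgpGap(
        obj=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        length=int(fields[5]),
        gap_type=fields[6],
        linked=(fields[7] == "yes"),
        evidence=fields[8].split(";"),
    )


# A made-up gap line for illustration only.
example = "chr1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends"
gap = parse_gap_line(example)
```

A parser built this way would need no structural change to accept the proposed additions, since a new gap type or linkage evidence term is just another string in columns 7 and 9.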
We’ve been making improvements to the NCBI genome Assembly resource. Highlights include:
- Links added between members of a pair of genome assemblies derived from the same diploid individual
- Additional filters now shown on the left-hand side bar:
  - Annotation status
  - Assembly type, including the new types “Unresolved diploid” and “Alternate pseudohaplotype”
- vhost filters on the Advanced page Search Builder that allow selection of virus assemblies with a particular host (e.g. “vhost human”)
- Searching by assembly names with the version unspecified
- Total ungapped length reported in the “Global statistics” table, replacing the less useful total gap length
- Improved N50 & L50 statistics presentation for complex genome assemblies
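As a reminder of what the statistics in the last two items measure, here is a small Python sketch using the usual definitions: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length, L50 is the number of sequences in that set, and the total ungapped length is the total length minus the gap length. The function names and the example lengths are made up for illustration; this is not how the Assembly resource computes its tables.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


def ungapped_length(total_length: int, total_gap_length: int) -> int:
    """Total ungapped length is the total length minus all gap bases."""
    return total_length - total_gap_length


# Made-up scaffold lengths: total 290, so half is 145.
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
```

With these lengths, the two longest sequences (80 + 70 = 150) are the first to cover half of the total, so N50 is 70 and L50 is 2.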
NCBI announces Annotation Release 100 of the Pacific white shrimp (Penaeus vannamei) genome in RefSeq, based on the assembly (GCF_003789085.1) submitted by the Institute of Oceanology, Chinese Academy of Sciences. The Pacific white shrimp is one of the most important shrimp species in fisheries and aquaculture and is the first decapod to have its genome annotated by NCBI. We predicted 24,987 protein-coding genes with evidence from alignment of six billion RNA-Seq reads and homology with invertebrate proteins. This annotation will enable genomic research in this commercially important species.
You can download the annotated assembly or browse and search it in the Genome Data Viewer.
Please visit our Eukaryotic RefSeq Genome Annotation Status page to see more annotations in progress.
If you’ve been searching in Gene, Nucleotide, Protein, Genome or Assembly databases, you’ve probably noticed the new search experience we introduced in September to interpret several common language searches and offer improved results. We’re excited to announce we’ve added as-you-type suggestions to the search bar in these databases.
Here’s a peek at the new menu in the NCBI Gene database.
Figure 1. Typing into the search box brings up automatic suggestions of the most popular queries.
We’ve been making improvements to the NCBI Assembly resource. Highlights from the past two releases, 1.25 and 1.26, include:
Earlier this year, we announced the release of a new and improved search feature that interprets plain language to give better results for common searches. This feature, originally developed in NCBI Labs and later released on the NCBI All Databases search, is now available across several NCBI resources: Nucleotide, Protein, Gene, Genome, and Assembly. Whether you are searching for a specific gene or for a whole genome, you will now retrieve NCBI’s best results regardless of the database you search.
The image below shows the results for a search for human INS in the Nucleotide database. Even though this is a Nucleotide search, the results include relevant information from Gene, Protein, Taxonomy, plus links to the NCBI reference sequences (RefSeq) as well as access to BLAST and the insulin gene region in NCBI’s genome browser, the Genome Data Viewer.
Figure 1. The new natural language search result in the Nucleotide database from a search for human INS.
Try out this new search capability and let us know what you think. And keep visiting the NCBI Labs search page to try our latest experiments, which we’ll also announce here on NCBI Insights.
The Genome Data Viewer (GDV) is now the main genome browser at NCBI, replacing the Map Viewer, our original genome browser. GDV is a modern genome browser with essential improvements over Map Viewer. These include sequence-level details and an automated update process that keeps up with the rapid pace of genome sequencing, assembly and annotation.
The Genome Data Viewer homepage (top panel) and browser view (bottom panel)
This blog post is directed toward Assembly users.
A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.
For example, you can run a search in Assembly and use check boxes (see left side of screenshot below) to refine the set of genome assemblies of interest. Then, just open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file will be saved to your computer that can be expanded into a folder containing your selected genome data files.
The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection, a transmissible cancer known as Transmissible Devil Facial Tumor Disease (TDFTD). In a previous NCBI Insights post, we discussed gene expression data from the tumors that established their neural origin and showed the tumors were likely derived from Schwann cells. In this post, we’ll consider some of the genome sequencing projects in the NCBI databases and explore evidence that the tumor originated in a different individual than the affected animal, supporting the idea that the tumor cells themselves are infectious agents.