The New York Genome Center is hosting an NCBI Single Cell in the cloud codeathon from January 15-17, 2020. Submissions for project proposals are due December 2nd.
Please submit your proposal and apply here.
What topics are in scope?
This codeathon will focus on single cell data, including RNA, DNA, and chromatin accessibility. We are particularly interested in proposals for pipelines and analysis of SRA data, data interoperability, and using machine learning techniques in clustering. We also welcome proposals for tutorial pipelines and educational tools. You will have access to computational resources in the Cloud to turn your idea into a working prototype. Visit our website for examples of previous codeathon projects.
RefSeq release 97 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of November 4, 2019, and contains 219,407,891 records, including 157,639,958 proteins, 28,730,283 RNAs, and sequences from 97,407 organisms.
The release is provided in several directories as a complete dataset and also as divided by logical groupings.
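For readers who want to reach RefSeq records through E-utilities, here is a minimal sketch of how an ESearch request URL might be put together. The helper name `build_esearch_url`, the `srcdb_refseq[PROP]` filter, and the organism term are illustrative assumptions, not part of the release announcement; the snippet only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an ESearch request URL (hypothetical helper; no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative query: RefSeq-sourced protein records for human.
url = build_esearch_url("protein", "srcdb_refseq[PROP] AND Homo sapiens[ORGN]")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; check the E-utilities documentation for usage guidelines before sending requests in bulk.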
Recently, the NCBI Eukaryotic Genome Annotation Pipeline has released new annotations in RefSeq for the following organisms:
- Aedes albopictus (Asian tiger mosquito)
- Aquila chrysaetos chrysaetos (golden eagle)
- Archocentrus centrarchus (flier cichlid)
- Calypte anna (Anna’s hummingbird)
- Camarhynchus parvulus (bird)
- Camelus dromedarius (Arabian camel)
- Cannabis sativa (hemp)
- Chanos chanos (milkfish)
Are you interested in high quality genomic annotations for human and mouse? Check out the Consensus Coding Sequence (CCDS) project! Release 23 of the CCDS project is now available in Entrez Gene. This release compares NCBI’s Mus musculus annotation release 108 to Ensembl’s annotation release 98. This update adds 1,570 new CCDS records and 175 genes to the mouse CCDS dataset. In total, release 23 includes 27,219 CCDS records that correspond to 20,486 genes.
On Wednesday, November 13, 2019 at 12 PM, NCBI staff will present a webinar on NCBI resources for next-gen sequence analysis. You will learn about key resources that support multiple aspects of next-gen sequence analyses, including quality control, alignment, data visualization, and interpretation of results. You will also see how to access and apply these resources for both SRA and your own RNASeq/DNASeq datasets. Whether you’re embarking on your first analysis or already have a background in bioinformatics, you’ll find tools that meet your needs!
- Date and time: Wed, Nov 13, 2019 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
NCBI is pleased to announce a Biomedical Data Science Codeathon in collaboration with Carnegie Mellon in Pittsburgh, PA on January 8-10, 2020.
We’re specifically seeking people with experience working with complex diseases, precision medicine, and genomic analyses. If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large scale genomic analyses from high-throughput experiments. The event is open to anyone selected for the codeathon and willing to travel to Pittsburgh.
Potential topics include:
- Virus Genome Graph tools
- Image analysis pipelines
- RNAseq pipelines
- Cancer graph genomes
- Complex Disease Analysis
GenBank release 234.0 (10/14/2019) is now available on the NCBI FTP site. This release has 6.69 trillion bases and 1.68 billion records.
The release has 216,763,706 traditional records containing 386,197,018,538 base pairs of sequence data. There are also 1,097,629,174 WGS records containing 5,985,250,251,028 base pairs of sequence data, 342,811,151 bulk-oriented TSA records containing 305,371,891,408 base pairs of sequence data, and 27,460,978 bulk-oriented TLS records containing 10,848,455,369 base pairs of sequence data.
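As a quick sanity check, the per-division figures above can be added up to confirm the headline totals. This is only a sketch of the arithmetic; the dictionary layout and variable names are ours.

```python
# (records, base pairs) per division, as quoted in the release note
divisions = {
    "traditional": (216_763_706, 386_197_018_538),
    "WGS": (1_097_629_174, 5_985_250_251_028),
    "TSA": (342_811_151, 305_371_891_408),
    "TLS": (27_460_978, 10_848_455_369),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records:,} records = {total_records / 1e9:.2f} billion")
print(f"{total_bases:,} bases = {total_bases / 1e12:.2f} trillion")
```

The sums come to 1,684,665,009 records and 6,687,667,616,343 bases, which round to the 1.68 billion records and 6.69 trillion bases stated for the release.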
You now have access to bulk settings options for track hubs in the Genome Data Viewer (GDV) and Sequence Viewer. These settings allow you to pick the default tracks that load into the viewer from your chosen track hub. You can access the bulk options menu by clicking the collapsed menu or “hamburger” icon (stack of horizontal bars) at the right end of the track grouping in the Configure Track Hubs dialog (Figure 1).
Figure 1. The Configure Track Hubs dialog in GDV. You can activate the bulk settings menu for a connected track hub by clicking the hamburger icon at the right of the track grouping. Clicking Select Default tracks checks all of the tracks in that grouping, Smoothed PhyloCSF in this case.
As you may have heard, we are working on a new version of PubMed, and we’ve recently released some new features that you can check out.
A new user guide answers many common questions about how best to use the new site. We’ve also added links on the new PubMed homepage to many popular sites including the E-utilities, Advanced Search, and the MeSH database.
The action menu (Figure 1) now contains Collections and My Bibliography, allowing you to manage and share groups of citations. After running a search, you will also find a “Create alert” link under the search box that lets you set up automatic My NCBI email updates for your search.
Figure 1. New PubMed search result page showing the new “Create alert” link and updated action menu.
Going forward, we will continue to develop new features leading up to the time when this new version of PubMed will replace the legacy PubMed. As this progresses, we would love to hear what you think about these new additions! Please use the “Feedback” button (available on every page of the new PubMed) to submit your comments, questions, or concerns.
The latest dbVar data release includes the Genome in a Bottle benchmark structural variant (SV) callset (pre-print Zook et al. 2019), a highly scrutinized, carefully curated set of 12,745 sequence-resolved deletions, insertions, and delins variants from Personal Genome Project Ashkenazi trio son HG002. The data serve as a robust benchmark standard with which to measure the performance of sequencing and variant-calling pipelines. The callset “reliably identifies both false negatives and false positives in high-quality SV callsets” (pre-print Zook et al. 2019) that are based on short-, linked-, and long-read sequencing as well as optical mapping.
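To illustrate how false negatives and false positives feed into a performance measurement, here is a minimal sketch that turns comparison counts into precision, recall, and F1. The function name `benchmark_metrics` and the counts are hypothetical and do not come from the callset; in practice, deciding whether a called SV matches a benchmark SV requires a dedicated comparison tool, which this sketch does not attempt.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Summarize a callset comparison from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of calls that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of benchmark SVs recovered
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only.
metrics = benchmark_metrics(tp=900, fp=100, fn=300)
print(metrics)
```

With these invented counts, precision is 0.9 and recall is 0.75; a pipeline that misses many benchmark variants loses recall, while one that reports spurious variants loses precision.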