We have just launched GaPTools, a stand-alone data validation tool for submissions to NCBI's database of Genotypes and Phenotypes (dbGaP). You can use GaPTools to validate your dbGaP submissions or submissions to other genomic data repositories. GaPTools checks for common data inconsistency and integrity issues and validates subject-sample ID mapping, subject consents, data dictionaries, and phenotype and genotype data. GaPTools is available as a Docker image on Docker Hub.
Why Use GaPTools?
GaPTools validates your files before you submit them (see Figure 1), so by the time you formally submit, some of the pre-validation steps have already been addressed. This helps you prepare your data quickly and can shorten the processing cycle, leading to a faster release of your individual-level research data.

Figure 1: Flow chart depicting data submission and GaPTools validation
What Does GaPTools Do?
GaPTools performs a series of checks that are typical of any genomic data validation, so it can be used by any repository or organization involved in preparing such data. The validation checks included in GaPTools are listed on GitHub, and the README provides setup and execution instructions.
The validation that you run independently is the same as the one the dbGaP Submission Portal uses. Individuals who submit Subject Sample Mapping (SSM), Subject Consent (SC), and genotype data to dbGaP can easily set up and access this tool through the Docker image on Docker Hub. The tool validates sample IDs, gender, relationships, and other data inconsistencies across metadata, phenotype, and genotype files.
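To illustrate the kind of cross-file consistency check described above, here is a minimal sketch in Python. It is not GaPTools' implementation. It assumes the data have already been parsed into simple Python structures: a list of hypothetical (subject ID, sample ID) pairs standing in for the SSM file, a mapping of subject ID to reported sex from the phenotype data, and a mapping of sample ID to sex inferred from genotype data. The function name `check_sample_consistency` and the `Issue` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """A single problem found during validation (hypothetical structure)."""
    sample_id: str
    message: str


def check_sample_consistency(ssm_rows, subject_sex, genotype_sex):
    """Cross-check a subject-sample mapping against phenotype and genotype data.

    ssm_rows: iterable of (subject_id, sample_id) pairs (assumed SSM layout)
    subject_sex: dict of subject_id -> sex reported in phenotype data
    genotype_sex: dict of sample_id -> sex inferred from genotype data
    """
    issues = []
    seen = {}  # sample_id -> subject_id it was first mapped to

    for subject_id, sample_id in ssm_rows:
        if sample_id in seen:
            # A sample must map to exactly one subject; skip exact duplicates.
            if seen[sample_id] != subject_id:
                issues.append(Issue(
                    sample_id,
                    f"mapped to both {seen[sample_id]} and {subject_id}"))
            continue
        seen[sample_id] = subject_id

        reported = subject_sex.get(subject_id)
        if reported is None:
            # Every mapped subject should appear in the phenotype data.
            issues.append(Issue(
                sample_id, f"subject {subject_id} missing from phenotype data"))
        elif sample_id in genotype_sex and genotype_sex[sample_id] != reported:
            # Reported sex should agree with the sex inferred from genotypes.
            issues.append(Issue(
                sample_id,
                f"reported sex {reported} but genotype suggests "
                f"{genotype_sex[sample_id]}"))

    # Every genotyped sample should appear in the subject-sample mapping.
    for sample_id in sorted(genotype_sex):
        if sample_id not in seen:
            issues.append(Issue(
                sample_id, "genotype sample not in subject-sample mapping"))

    return issues


if __name__ == "__main__":
    ssm = [("S1", "A1"), ("S2", "A2"), ("S3", "A2"), ("S4", "A4")]
    phenotype = {"S1": "male", "S2": "female"}
    genotype = {"A1": "female", "A2": "female", "A5": "male"}
    for issue in check_sample_consistency(ssm, phenotype, genotype):
        print(f"{issue.sample_id}: {issue.message}")
```

In this sketch, samples listed in the mapping but absent from the genotype data are not flagged, on the assumption that not every sample is genotyped. A real validator would also parse the actual file formats and check many more rules.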
A video on our YouTube channel provides more information about using GaPTools.
This is just the beginning! We plan to add more phenotype checks and other validations.