POPULATION STUDIES PIPELINE
Use the NVIDIA Clara Parabricks Pipelines Genomics Database tool to perform population studies: create a genomics database for multiple samples, then import their GVCF data into it.
The population studies pipeline runs as shown below. The germline variant-calling step is optional: skip it if you already have the g.vcf.gz files for all samples from earlier variant calls.
# Create a genomics database
pbrun creategenomicsdb --dir <genomics db address>
# Populate the database with data
pbrun importgvcftodb --dir <genomics db address> --in-gvcf <input GVCF> --in-gvcf <input GVCF> --in-gvcf <input GVCF>
# Select variants from the database
pbrun selectvariants --ref <reference genome> --dir <genomics db address> --out-gvcf <output GVCF>
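The three commands can be chained in a small POSIX shell script. This is a minimal sketch, not part of Parabricks: the function names run_population_study and run_cmd, their argument order, and the DRY_RUN variable (which prints each command instead of running it) are hypothetical. It assumes the g.vcf.gz inputs already exist, so the germline step is skipped, and it passes only the options documented in this section.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers, not part of Parabricks): run the
# three population-study steps in order, assuming the g.vcf.gz inputs
# already exist (germline step skipped).

# Run a command, or only print it when DRY_RUN=1 (hypothetical variable).
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: run_population_study DB_DIR REF OUT_GVCF GVCF [GVCF ...]
# Variables are global because plain POSIX sh has no "local".
run_population_study() {
  if [ "$#" -lt 4 ]; then
    echo "usage: run_population_study DB_DIR REF OUT_GVCF GVCF [GVCF ...]" >&2
    return 1
  fi
  db_dir=$1 ref=$2 out_gvcf=$3
  shift 3

  # Step 1: create the genomics database; stop on failure.
  run_cmd pbrun creategenomicsdb --dir "$db_dir" || return

  # Rewrite the remaining arguments as: --in-gvcf F1 --in-gvcf F2 ...
  for gvcf in "$@"; do
    shift
    set -- "$@" --in-gvcf "$gvcf"
  done

  # Step 2: import every GVCF into the database in a single call.
  run_cmd pbrun importgvcftodb --dir "$db_dir" "$@" || return

  # Step 3: select variants from the database into one merged GVCF.
  run_cmd pbrun selectvariants --ref "$ref" --dir "$db_dir" --out-gvcf "$out_gvcf"
}
```

For example, source the file and call DRY_RUN=1 run_population_study /path/to/popdb ref.fa cohort.g.vcf.gz s1.g.vcf.gz s2.g.vcf.gz to print the three pbrun commands; the same call without DRY_RUN runs them and stops at the first failing step.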
Options for creategenomicsdb:
- --dir: (required) Path to the directory where the database will be stored.
Options for importgvcftodb:
- --dir: (required) Directory of the database into which the GVCF data will be imported.
- --in-gvcf: (required) Input GVCF in .g.vcf.gz format, either generated by the Parabricks germline pipeline or compressed with bgzip. Repeat the option once per input file.
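Because --in-gvcf expects bgzip-compressed input, a quick pre-flight check can catch files that were compressed with plain gzip or left uncompressed. This is a minimal sketch with hypothetical helper names (is_bgzf, check_gvcf_inputs), assuming "bgzip-compressed" means the file starts with a BGZF block header: gzip magic 1f 8b, deflate method 08, the FEXTRA flag 04, and the "BC" extra subfield at bytes 13-14. It only inspects the first block header and does not validate the VCF content.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers, not part of Parabricks): check that
# each --in-gvcf input starts with a BGZF (bgzip) block header.
# Only the first header is inspected; the VCF content is not validated.

# Succeeds if file $1 starts with a BGZF block header.
is_bgzf() {
  # First 16 bytes as hex pairs, e.g. "1f 8b 08 04 ... 42 43 02 00".
  hdr=$(od -An -tx1 -N16 "$1" 2>/dev/null) || return 1
  # Split the hex pairs into positional parameters (no glob characters possible).
  set -- $hdr
  [ "$#" -ge 14 ] || return 1
  # gzip magic (1f 8b), deflate (08), FEXTRA flag set (04) ...
  [ "$1 $2 $3 $4" = "1f 8b 08 04" ] || return 1
  # ... and the BGZF extra subfield ID "BC" (42 43) at bytes 13-14.
  shift 12
  [ "$1 $2" = "42 43" ]
}

# Report every input that is not BGZF; fail if any input is not.
check_gvcf_inputs() {
  status=0
  for f in "$@"; do
    if ! is_bgzf "$f"; then
      echo "not bgzip-compressed: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running check_gvcf_inputs on all GVCFs before importgvcftodb lists every offending file at once instead of failing on the first one.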
Options for selectvariants:
- --ref: (required) Reference human genome in FASTA format. We assume that the indexing required to run BWA has already been completed by the user.
- --dir: (required) Location of the genomics database from which variants will be selected.
- --out-gvcf: Path to the file where the merged GVCF result will be stored.
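Since --ref assumes the reference has already been indexed for BWA, a pre-flight check can confirm that the index files sit next to the FASTA. This is a minimal sketch with a hypothetical helper name (check_bwa_index), assuming the default naming used by bwa index, which appends .amb, .ann, .bwt, .pac and .sa to the FASTA path. It only checks that the files exist; whether selectvariants needs any other index files is not stated in this section, so none are checked.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not part of Parabricks): confirm that
# the reference FASTA and its default-named BWA index files exist.
check_bwa_index() {
  ref=$1
  missing=0
  if [ ! -f "$ref" ]; then
    echo "reference FASTA not found: $ref" >&2
    missing=1
  fi
  # bwa index writes <fasta>.amb, .ann, .bwt, .pac and .sa by default.
  for ext in amb ann bwt pac sa; do
    if [ ! -f "$ref.$ext" ]; then
      echo "missing BWA index file: $ref.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_bwa_index ref.fa && pbrun selectvariants --ref ref.fa --dir /path/to/popdb --out-gvcf cohort.g.vcf.gz runs the selection only when all index files are present.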