Step #1: Choosing the dataset
This section describes the two datasets available for this lab. Choose one depending on the experience you are looking for:
Small exome sample data: With this dataset you should be able to run the entire lab in one hour.
HG002 Genome in a Bottle sample: This dataset will give you a whole-genome variant calling experience.
Small exome sample data
The sample dataset has been pre-downloaded from here for you and placed in /data/parabricks_sample.
The following examples will require using the system console of the GPU host. Click on the “System Console” link in the left menu of this page to open a web-based SSH session.
![parabricks-05.png](https://docscontent.nvidia.com/dims4/default/a94c173/2147483647/strip/true/crop/156x67+0+0/resize/156x67!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F00000188-44c1-ddbb-a1af-cfdf80490000%2Flaunchpad%2Fai%2Fclara-parabricks%2Flatest%2F_images%2Fparabricks-05.png)
To use this dataset in each of the following examples, set the following variables:
```shell
cd /data/parabricks_sample
REFERENCE_FILE=Ref/Homo_sapiens_assembly38.fasta
FASTQ1=Data/sample_1.fq.gz
FASTQ2=Data/sample_2.fq.gz
```
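Before moving on, it can help to confirm that these variables resolve to readable files. The following is a minimal sketch and not part of the original lab: the helper name `check_inputs` is hypothetical, and it assumes you run it from the dataset directory in the same shell session where the variables were set.

```shell
# Hypothetical helper (not part of Parabricks): report every path
# that is not a readable regular file and return non-zero if any is missing.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

After defining the function, call it as `check_inputs "$REFERENCE_FILE" "$FASTQ1" "$FASTQ2"`. Because the paths are relative, a "missing" message often just means the `cd` step was skipped.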
HG002 Genome in a Bottle
This dataset is real 30X short-read human data generated from a child in a trio sequenced by the Genome In A Bottle Consortium.
The sample datasets have been downloaded from here and here and placed in /data/HG002/Data.
To use this dataset in each of the following examples, set the following variables:
```shell
cd /data/HG002
REFERENCE_FILE=Ref/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
FASTQ1=Data/HG002.hiseqx.pcr-free.30x.R1.fastq.gz
FASTQ2=Data/HG002.hiseqx.pcr-free.30x.R2.fastq.gz
```
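Since these two files hold the two mates of each read pair, a quick spot check is to compare the name of the first read in each file. The sketch below is an illustration only: `first_read_name` is a hypothetical helper, and it assumes gzip-compressed FASTQ files whose read names either carry no mate suffix or end in `/1` and `/2`. It reads only the first line of each file, so it does not validate the whole dataset.

```shell
# Hypothetical helper: print the name of the first read in a gzipped FASTQ file.
first_read_name() {
    # The first line of a FASTQ record is "@<name>", optionally followed by
    # a space and a description. Keep the name and drop any /1 or /2 suffix.
    gzip -cd "$1" | head -n 1 | cut -d ' ' -f 1 | sed 's|/[12]$||'
}
```

With the variables above set, `[ "$(first_read_name "$FASTQ1")" = "$(first_read_name "$FASTQ2")" ] && echo "first reads match"` prints a message when the two names agree.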