Step #1: Choosing the dataset
This section describes two datasets you can use. Choose one depending on the experience you are looking for:
Small exome sample data: With this dataset you should be able to run the entire lab in one hour.
HG002 Genome-in-a-Bottle Sample: This dataset lets you run variant calling on a whole genome.
Small exome sample data
The sample dataset has been pre-downloaded from here for you and placed in /data/parabricks_sample.
The following examples will require using the system console of the GPU host. Click on the “System Console” link in the left menu of this page to open a web-based SSH session.
To use this dataset in each of the following examples, set the following variables:
cd /data/parabricks_sample
REFERENCE_FILE=Ref/Homo_sapiens_assembly38.fasta
FASTQ1=Data/sample_1.fq.gz
FASTQ2=Data/sample_2.fq.gz
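Because these paths are relative to /data/parabricks_sample, a quick check that they resolve can save a failed run later. The following is a minimal sketch, not part of the lab itself: check_inputs is a hypothetical helper name, and it only tests that each file exists and is non-empty.

```shell
# Hypothetical helper (an assumption, not part of the lab or of Parabricks):
# report every path that is missing or empty, and return non-zero if any is.
check_inputs() {
    local missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, assuming the variables above are set in the current shell:
#   check_inputs "$REFERENCE_FILE" "$FASTQ1" "$FASTQ2" && echo "inputs OK"
```

Run it from the dataset directory, since the variables hold relative paths.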
HG002 Genome in a Bottle
This dataset is real 30X-coverage short-read human data from the child (HG002) of a trio sequenced by the Genome in a Bottle Consortium.
The sample datasets have been downloaded from here and here and placed in /data/HG002/Data.
To use this dataset in each of the following examples, set the following variables:
cd /data/HG002
REFERENCE_FILE=Ref/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
FASTQ1=Data/HG002.hiseqx.pcr-free.30x.R1.fastq.gz
FASTQ2=Data/HG002.hiseqx.pcr-free.30x.R2.fastq.gz
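Paired-end FASTQ files normally list reads in the same order, so the first read name in R1 should match the first read name in R2. The sketch below compares just those first records as a cheap sanity check before a long whole-genome run. first_read_name is a hypothetical helper, and the name trimming assumes the common conventions of a "/1" or "/2" suffix, or a space-separated comment after the read name.

```shell
# Hypothetical helper (an assumption, not part of the lab or of Parabricks):
# print the read name from the first record of a gzipped FASTQ file,
# dropping the leading "@" and anything after the first space or "/".
first_read_name() {
    gzip -dc "$1" | head -n 1 | sed -e 's|^@||' -e 's|[ /].*$||'
}

# Usage, assuming the variables above are set in the current shell:
#   if [ "$(first_read_name "$FASTQ1")" = "$(first_read_name "$FASTQ2")" ]; then
#       echo "first read names match"
#   else
#       echo "first read names differ; check that R1 and R2 are a pair" >&2
#   fi
```

This inspects only the first record of each file, so it catches a mismatched pair of files but does not validate the whole dataset.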