Step #1: Choosing the dataset

In this section, we describe two datasets. Choose one depending on the experience you are looking for:

  • Small exome sample data: With this dataset you should be able to run the entire lab in one hour.

  • HG002 Genome-in-a-Bottle Sample: This dataset provides a whole-genome variant calling experience.

Small exome sample data

The sample dataset has been pre-downloaded for you from here and placed in /data/parabricks_sample.

The following examples require the system console of the GPU host. Click the “System Console” link in the left menu of this page to open a web-based SSH session.


To use this dataset in each of the examples that follow, set these variables:

cd /data/parabricks_sample
REFERENCE_FILE=Ref/Homo_sapiens_assembly38.fasta
FASTQ1=Data/sample_1.fq.gz
FASTQ2=Data/sample_2.fq.gz
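Before running the examples, a quick check that each variable points at a readable, non-empty file can save a failed run. The sketch below is a minimal bash example under that assumption; check_inputs is a hypothetical helper name, not a Parabricks command.

```shell
# Hypothetical helper (not part of Parabricks): report any input path that is
# missing, unreadable, or empty. Returns 0 only if every argument passes.
check_inputs() {
  local status=0 f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      # Path does not exist or the current user cannot read it.
      echo "missing or unreadable: $f" >&2
      status=1
    elif [ ! -s "$f" ]; then
      # Path exists but has zero size (for example, an interrupted download).
      echo "empty file: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

After setting the variables for either dataset, run check_inputs "$REFERENCE_FILE" "$FASTQ1" "$FASTQ2" from the dataset directory; it prints nothing and returns 0 when all three files look usable.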


HG002 Genome in a Bottle

This dataset consists of real 30x-coverage short-read human sequencing data from the child of a trio sequenced by the Genome in a Bottle (GIAB) Consortium.

The sample datasets have been downloaded from here and here and placed in /data/HG002/Data.

To use this dataset in each of the examples that follow, set these variables:

cd /data/HG002
REFERENCE_FILE=Ref/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
FASTQ1=Data/HG002.hiseqx.pcr-free.30x.R1.fastq.gz
FASTQ2=Data/HG002.hiseqx.pcr-free.30x.R2.fastq.gz
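Because FASTQ1 and FASTQ2 hold the two mates of each read pair, a cheap sanity check is to compare the first read name in each file without decompressing the whole 30x dataset. This is a minimal bash sketch; first_read_name and check_fastq_pair are hypothetical helpers, and it assumes read names either end in /1 and /2 or carry the mate number after a space, as in common Illumina headers.

```shell
# Hypothetical helpers (not part of Parabricks).
# Print the first read name of a gzipped FASTQ: drop the leading '@', keep the
# text before the first whitespace, and strip an old-style /1 or /2 suffix.
# head exits after one line, so gzip stops early instead of reading the file.
first_read_name() {
  gzip -dc "$1" | head -n 1 | sed -e 's/^@//' -e 's/[[:space:]].*$//' -e 's#/[12]$##'
}

# Paired files list mates in the same order, so their first names should match.
check_fastq_pair() {
  local n1 n2
  n1=$(first_read_name "$1")
  n2=$(first_read_name "$2")
  if [ -z "$n1" ] || [ "$n1" != "$n2" ]; then
    echo "first reads differ: '$n1' vs '$n2'" >&2
    return 1
  fi
  echo "first read names match: $n1"
}
```

For example, run check_fastq_pair "$FASTQ1" "$FASTQ2" after setting the HG002 variables. Matching first names do not prove the files are fully in sync; the check only catches an obviously mismatched pair.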


© Copyright 2022-2023, NVIDIA. Last updated on May 22, 2023.