Step #1: Choosing the dataset

Accelerated Exome Analysis with Clara Parabricks (Latest Version)

This section describes two datasets you can use. Choose one depending on the experience you are looking for:

  • Small exome sample data: With this dataset you should be able to run the entire lab in one hour.

  • HG002 Genome in a Bottle sample: This dataset lets you experience whole-genome variant calling.

Small exome sample data

The sample dataset has been pre-downloaded from here for you and placed in /data/parabricks_sample.

The following examples will require using the system console of the GPU host. Click on the “System Console” link in the left menu of this page to open a web-based SSH session.

To use this dataset in each of the following examples, set the following variables:

cd /data/parabricks_sample
REFERENCE_FILE=Ref/Homo_sapiens_assembly38.fasta
FASTQ1=Data/sample_1.fq.gz
FASTQ2=Data/sample_2.fq.gz
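
Before moving on, it can help to confirm that the three variables point at files that actually exist. The following is a minimal sketch, not part of Parabricks: check_inputs is a hypothetical helper name, and it only checks that each path it is given exists and is non-empty.

```shell
# Hypothetical helper (an assumption for illustration, not a Parabricks command):
# report every path that is missing or empty, and return non-zero if any is.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After setting the variables from the dataset directory, a call such as check_inputs "$REFERENCE_FILE" "$FASTQ1" "$FASTQ2" should return successfully if all three files are in place.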


HG002 Genome in a Bottle

This dataset contains real 30X short-read human sequencing data from HG002, the child in a trio sequenced by the Genome in a Bottle Consortium.

The sample datasets have been downloaded from here and here and placed in /data/HG002/Data.

To use this dataset in each of the following examples, set the following variables:

cd /data/HG002
REFERENCE_FILE=Ref/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
FASTQ1=Data/HG002.hiseqx.pcr-free.30x.R1.fastq.gz
FASTQ2=Data/HG002.hiseqx.pcr-free.30x.R2.fastq.gz
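
If you expect to switch between the two datasets, the settings from this section could be wrapped in a small shell function. This is only a sketch: select_dataset, the names exome and hg002, and the DATASET_DIR variable are hypothetical and introduced here for illustration. The paths are the ones listed in this section.

```shell
# Hypothetical helper: set the variables for one of the two datasets by name.
# DATASET_DIR is an assumed variable name; the lab itself does not define it.
select_dataset() {
    case "$1" in
        exome)
            DATASET_DIR=/data/parabricks_sample
            REFERENCE_FILE=Ref/Homo_sapiens_assembly38.fasta
            FASTQ1=Data/sample_1.fq.gz
            FASTQ2=Data/sample_2.fq.gz
            ;;
        hg002)
            DATASET_DIR=/data/HG002
            REFERENCE_FILE=Ref/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
            FASTQ1=Data/HG002.hiseqx.pcr-free.30x.R1.fastq.gz
            FASTQ2=Data/HG002.hiseqx.pcr-free.30x.R2.fastq.gz
            ;;
        *)
            echo "usage: select_dataset exome|hg002" >&2
            return 1
            ;;
    esac
}
```

For example, select_dataset hg002 && cd "$DATASET_DIR" would set the HG002 variables and change into that dataset's directory, since the reference and FASTQ paths are relative to it.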


© Copyright 2022-2023, NVIDIA. Last updated on May 22, 2023.