Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Datasets
ImageNet Data Preparation
Note
It is the responsibility of each user to check the content of the dataset, review the applicable licenses, and determine if it is suitable for their intended use. Users should review any applicable links associated with the dataset before placing the data on their machine.
Please note that according to the ImageNet terms and conditions, automated scripts for downloading the dataset are not provided. Instead, one can follow the steps outlined below to download and extract the data.
ImageNet 1k
Create an account on ImageNet and navigate to ILSVRC 2012. Download “Training images (Task 1 & 2)” and “Validation images (all tasks)” to data/imagenet_1k.
Extract the training data:
mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while IFS= read -r NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..
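Before moving on, it can help to confirm that the extraction produced the expected layout. The ILSVRC 2012 training set has 1,000 classes and 1,281,167 images, so the train directory should contain 1,000 class folders. The snippet below is a minimal sketch and not part of the official instructions: the function name summarize_split is hypothetical, and it assumes the images carry a .JPEG extension and that the current directory is data/imagenet_1k.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NeMo): print the number of class folders
# and the number of JPEG images found under a split directory.
summarize_split() {
    local dir="$1"
    local classes images
    # Class folders are the immediate subdirectories of the split directory.
    classes=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
    # Images may sit at any depth; match the extension case-insensitively.
    images=$(find "$dir" -type f -iname "*.jpeg" | wc -l)
    # $((...)) strips the padding that some wc implementations emit.
    echo "classes=$((classes)) images=$((images))"
}

# Assumption: the current directory is data/imagenet_1k.
if [ -d train ]; then
    summarize_split train
fi
```

If the reported counts are lower than expected, the per-class extraction loop was most likely interrupted and can be rerun from inside train.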
Extract the validation data and move the images to subfolders:
mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
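After the preparation script has run, every validation image should sit inside a class subfolder. ILSVRC 2012 has 50,000 validation images spread evenly over 1,000 classes, which gives 50 images per class. The following is a sketch of a layout check under those assumptions; check_val_layout is a hypothetical name, not something NeMo or ImageNet provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NeMo): verify that a validation directory
# has no loose images at its top level and that every class folder holds the
# expected number of images (default 50, the ILSVRC 2012 per-class count).
check_val_layout() {
    local dir="$1" expected="${2:-50}"
    local loose bad=0 class_dir n
    # Images still at the top level were not moved into a class folder.
    loose=$(find "$dir" -maxdepth 1 -type f -iname "*.jpeg" | wc -l)
    for class_dir in "$dir"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$class_dir" ] || continue
        n=$(find "$class_dir" -type f -iname "*.jpeg" | wc -l)
        if [ "$((n))" -ne "$expected" ]; then
            echo "unexpected count in $class_dir: $((n))"
            bad=$((bad + 1))
        fi
    done
    echo "loose=$((loose)) bad_classes=$bad"
    # Succeed only when nothing is loose and every class has the right count.
    [ "$((loose))" -eq 0 ] && [ "$bad" -eq 0 ]
}

# Example, run from data/imagenet_1k/val:
#   check_val_layout . 50
```

The function returns a non-zero status when something looks off, so it can gate a later step in a script.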
ImageNet 21k
Create an account on ImageNet and download “ImageNet21k” to data/imagenet_21k.
Extract the data:
tar -xvf winter21_whole.tar.gz && rm -f winter21_whole.tar.gz
find . -name "*.tar" | while IFS= read -r NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
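Extracting ImageNet 21k takes a long time, so an interrupted run is a realistic risk. One way to spot it, sketched below, is to look for per-class archives that are still present and for class folders that ended up empty. The name report_leftovers is hypothetical, and the check makes no assumption about the exact directory layout inside the archive beyond what the extraction commands show.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NeMo): report per-class archives that are
# still present and directories that are empty under an extraction root.
report_leftovers() {
    local root="$1"
    local tars empties
    # The extraction loop removes each per-class .tar as it goes, so an
    # archive that is still present means the loop did not reach it.
    tars=$(find "$root" -type f -name "*.tar" | wc -l)
    # An empty directory means a folder was created but no files landed in it.
    empties=$(find "$root" -mindepth 1 -type d -empty | wc -l)
    echo "leftover_tars=$((tars)) empty_dirs=$((empties))"
    # Succeed only when neither kind of leftover was found.
    [ "$((tars))" -eq 0 ] && [ "$((empties))" -eq 0 ]
}

# Example, run from data/imagenet_21k:
#   report_leftovers .
```

If archives remain, rerunning the find loop from the same directory picks up only those, since archives that were already processed have been deleted.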