Self-Supervised Learning#

Self-supervised learning (SSL) is a machine learning approach in which models learn meaningful representations from unlabeled data by solving automatically generated pretext tasks. Because the supervision signal is derived from the data itself, SSL significantly reduces the need for large labeled datasets, making it particularly useful in domains where annotated data is scarce or costly to obtain.
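
To make "pretext task" concrete, the following is a minimal, hypothetical sketch (not part of TAO) of one classic example: predicting how an image was rotated. The label is generated automatically from the data, so no human annotation is required.

```python
import torch
import torch.nn as nn

def make_rotation_batch(images: torch.Tensor):
    """Rotate each image by a random multiple of 90 degrees.

    The rotation index (0-3) is a free, automatically generated label.
    """
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack(
        [torch.rot90(img, k=int(k), dims=(-2, -1)) for img, k in zip(images, labels)]
    )
    return rotated, labels

# Toy encoder plus a 4-way rotation head; a real setup would use a ViT or CNN.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
head = nn.Linear(128, 4)

images = torch.randn(8, 3, 32, 32)  # an unlabeled batch
rotated, labels = make_rotation_batch(images)
loss = nn.functional.cross_entropy(head(encoder(rotated)), labels)
loss.backward()  # the encoder learns visual features from free labels
```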

Supported Methods#

The TAO Finetuning Microservice provides built-in support for two state-of-the-art self-supervised learning algorithms:

  • Nv-DINOv2 (self-DIstillation with NO labels): A self-distillation method in which a student network learns to match the predictions of a momentum-averaged teacher. Nv-DINOv2 trains vision transformers without labels by encouraging consistency across different augmented views of the same image, resulting in high-quality, transferable visual features (a sketch of this idea follows the list).

  • Masked Autoencoders (MAE): MAEs train by randomly masking a large fraction of an input image's patches and reconstructing the missing regions from the visible ones. This forces the model to learn the global structure and semantics of the data, producing versatile and robust feature representations (sketched below, after the Nv-DINOv2 example).
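
The following is a minimal sketch of the self-distillation idea behind Nv-DINOv2, under simplifying assumptions: a single pair of augmented views, no centering or temperature schedule, and a toy MLP standing in for a vision transformer. It is illustrative only, not the TAO implementation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Student and teacher share an architecture; the teacher starts as a copy.
student = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.GELU(), nn.Linear(256, 64)
)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is updated by EMA, not by gradients

def dino_loss(student_out, teacher_out, t_student=0.1, t_teacher=0.04):
    # Cross-entropy between the sharper teacher distribution and the student.
    teacher_probs = F.softmax(teacher_out / t_teacher, dim=-1)
    student_logp = F.log_softmax(student_out / t_student, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()

view1 = torch.randn(8, 3, 32, 32)              # two "views" of the same images;
view2 = view1 + 0.1 * torch.randn_like(view1)  # real training uses image crops

loss = dino_loss(student(view1), teacher(view2))
loss.backward()

with torch.no_grad():  # EMA update: the teacher slowly tracks the student
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(0.996).add_(p_s, alpha=0.004)
```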

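Here is a similarly simplified sketch of the MAE objective: split images into patches, mask a random 75% of them, reconstruct pixels, and compute the loss only on the masked patches. The linear encoder and decoder are hypothetical stand-ins for a vision transformer, and masked patches are zeroed rather than dropped.

```python
import torch
import torch.nn as nn

patch, dim = 8, 64
imgs = torch.randn(8, 3, 32, 32)  # unlabeled batch

# Split each image into non-overlapping 8x8 patches: (B, N, patch_pixels).
patches = imgs.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(8, -1, 3 * patch * patch)
num_patches = patches.size(1)

# Mask a random 75% of the patches, as in the original MAE recipe.
mask = torch.rand(8, num_patches) < 0.75

encoder = nn.Linear(3 * patch * patch, dim)
decoder = nn.Linear(dim, 3 * patch * patch)

# Masked patches are zeroed here for simplicity; real MAE feeds the encoder
# only the visible patches, which is what makes it efficient.
latent = encoder(patches * (~mask).unsqueeze(-1).float())
recon = decoder(latent)

# Reconstruction loss is taken only on the patches the model never saw.
loss = ((recon - patches) ** 2)[mask].mean()
loss.backward()
```
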
Benefits and Use Cases#

Both Nv-DINOv2 and MAE are designed to produce general-purpose visual embeddings that can be fine-tuned or applied directly (for example, via linear probing, sketched after the list) to a variety of downstream tasks, such as:

  • Image classification

  • Object detection

  • Semantic segmentation
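
As one concrete example of reuse, a pretrained SSL backbone can be linearly probed for classification: freeze the encoder and train only a small head on labeled data. The backbone below is a hypothetical stand-in for a checkpoint produced by Nv-DINOv2 or MAE pretraining.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(  # stand-in for an SSL-pretrained encoder
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU()
)
for p in backbone.parameters():
    p.requires_grad_(False)  # freeze the backbone; only the head is trained

num_classes = 10
head = nn.Linear(256, num_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

images = torch.randn(8, 3, 32, 32)            # small labeled downstream batch
labels = torch.randint(0, num_classes, (8,))

with torch.no_grad():  # embeddings come from the frozen backbone
    feats = backbone(images)
loss = nn.functional.cross_entropy(head(feats), labels)
loss.backward()
optimizer.step()
```

Unfreezing the backbone and training it with a lower learning rate turns the same setup into full fine-tuning.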

The TAO implementation offers flexible APIs, modular training pipelines, and configuration options to help you quickly integrate SSL into your machine learning workflows.

Note

To get started with either method, see the following guides: