KNN

This example shows how to parallelize non-trivial operations, such as a k-nearest-neighbors (kNN) search.

The first example, the brute-force kNN, uses a single GPU:

python knn_brute_force_baseline.py
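
For readers unfamiliar with the term, "brute force" here means computing the distance from every query to every point and keeping the k smallest. The following is a minimal pure-Python, CPU-only sketch of that idea. It is not the contents of knn_brute_force_baseline.py, and the function name `knn_brute_force` and the sample data are made up for illustration.

```python
import heapq
import math


def knn_brute_force(queries, points, k):
    """Return, for each query, the indices of its k nearest points."""
    results = []
    for q in queries:
        # Distance from this query to every point: the "brute force" part.
        dists = [(math.dist(q, p), i) for i, p in enumerate(points)]
        # Keep only the k smallest distances, nearest first.
        nearest = heapq.nsmallest(k, dists)
        results.append([i for _, i in nearest])
    return results


# Hypothetical sample data, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
queries = [(0.1, 0.1), (4.0, 4.0)]
neighbors = knn_brute_force(queries, points, k=2)
print(neighbors)  # [[0, 1], [3, 2]]
```

The cost grows with the number of queries times the number of points, which is why distributing the work across devices becomes attractive for large inputs.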

The second example uses sharded tensors and relies on DTensor's automatic fallback paths, which are sub-optimal:

torchrun --nproc-per-node 8 knn_brute_force_sharded.py

However, the automatic path for parallelization is not the most efficient. The last script shows how to implement a better parallel operation manually:

torchrun --nproc-per-node 8 knn_brute_force_ring_sharded.py
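
The script name suggests a ring pattern. One common way to structure a ring kNN, sketched below, is for each rank to keep its own queries, compare them against the point shard it currently holds, and then pass that shard to its neighbor until every shard has visited every rank. This is a single-process simulation in pure Python under that assumption. It is not the implementation in knn_brute_force_ring_sharded.py, which may differ, and `ring_knn` and the sample shards are hypothetical. In a real multi-GPU run the rotation step would be a point-to-point send and receive between ranks.

```python
import heapq
import math


def ring_knn(query_shards, point_shards, k):
    """Simulate a ring kNN: ranks keep their queries and rotate point shards."""
    world_size = len(point_shards)

    # Tag every point with a global index so results agree across shards.
    offsets = [0]
    for shard in point_shards:
        offsets.append(offsets[-1] + len(shard))
    held = [
        [(offsets[r] + i, p) for i, p in enumerate(shard)]
        for r, shard in enumerate(point_shards)
    ]

    # best[r][j] holds the running (distance, global_index) pairs
    # for query j on rank r.
    best = [[[] for _ in shard] for shard in query_shards]

    for _ in range(world_size):
        for r in range(world_size):
            for j, q in enumerate(query_shards[r]):
                # Merge the running top-k with the shard currently held.
                candidates = best[r][j] + [
                    (math.dist(q, p), idx) for idx, p in held[r]
                ]
                best[r][j] = heapq.nsmallest(k, candidates)
        # Rotate: rank r receives the shard previously held by rank r - 1.
        held = [held[(r - 1) % world_size] for r in range(world_size)]

    return [[[idx for _, idx in row] for row in rank] for rank in best]


# Hypothetical data: two simulated ranks, each with one query and two points.
point_shards = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 2.0), (5.0, 5.0)]]
query_shards = [[(0.1, 0.1)], [(4.0, 4.0)]]
ring_result = ring_knn(query_shards, point_shards, k=2)
print(ring_result)  # [[[0, 1]], [[3, 2]]]
```

In this scheme each rank only ever holds one point shard and a top-k list per query, so no rank needs the full point set or the full distance matrix in memory.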