Vector Addition#
This example shows basic creation of ShardTensors. You can also see how distributed computation can acclerate basic operations.
Run the baseline example, which is not sharded:
python vector_add_baseline.py
Then, run the sharded example (using torchrun as a launcher, for example)
torchrun --nproc-per-node 8 vector_add_sharded.py
You should see an improved performance measurement for the sharded runs.