Quickstart#

Here’s the absolute minimal example to get started with Nsight Python. Just add a decorator to your function and wrap the kernel you want to profile with nsight.annotate():

import torch
import nsight

@nsight.analyze.kernel
def benchmark_matmul(n):
    """
    The simplest possible benchmark.
    We create two matrices and multiply them.
    """
    # Create two NxN matrices on GPU
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")

    # Mark the kernel we want to profile
    with nsight.annotate("matmul"):
        c = a @ b

    return c

if __name__ == "__main__":
    # Run the benchmark
    result = benchmark_matmul(1024)

That’s it! Nsight Python will automatically profile your kernel, collect metrics, and display the results.

For more advanced examples including parameter sweeps, custom metrics, and visualization, check out the examples directory.