v12.0

Contents

  • 1. NVIDIA Ampere GPU Architecture Tuning Guide
    • 1.1. NVIDIA Ampere GPU Architecture
    • 1.2. CUDA Best Practices
    • 1.3. Application Compatibility
    • 1.4. NVIDIA Ampere GPU Architecture Tuning
      • 1.4.1. Streaming Multiprocessor
        • 1.4.1.1. Occupancy
        • 1.4.1.2. Asynchronous Data Copy from Global Memory to Shared Memory
        • 1.4.1.3. Hardware Acceleration for Split Arrive/Wait Barrier
        • 1.4.1.4. Warp level support for Reduction Operations
        • 1.4.1.5. Improved Tensor Core Operations
        • 1.4.1.6. Improved FP32 throughput
      • 1.4.2. Memory System
        • 1.4.2.1. Increased Memory Capacity and High Bandwidth Memory
        • 1.4.2.2. Increased L2 capacity and L2 Residency Controls
        • 1.4.2.3. Unified Shared Memory/L1/Texture Cache
      • 1.4.3. Third Generation NVLink
  • 2. Revision History
  • 3. Notices
    • 3.1. Notice
    • 3.2. OpenCL
    • 3.3. Trademarks

© Copyright 2020-2022, NVIDIA Corporation & Affiliates. All rights reserved. Last updated on Dec 08, 2022.