External Hardware Documentation and Resources

Information about hardware behavior comes from a mix of official and reverse-engineered sources.

Command buffers

The NVIDIA open-gpu-doc repository contains official documentation that NVIDIA has released to the public. Most of it takes the form of class headers, which describe the class state registers.

NVIDIA open-gpu-kernel-modules repository is the open-source kernel mode driver that NVIDIA ships on Turing+ GPUs with GSP. The code here can provide examples of how to use some hardware features. If open-gpu-doc is missing a class header, sometimes there will be one here.

Reverse-engineered command names from envytools are available in Mesa under e.g. src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h. These are no longer updated. NVK instead uses the open-gpu-doc headers.

envyhooks is the modern way to dump command sequences from the proprietary driver.

nv_push_dump is part of Mesa and can disassemble command sequences (build with -D tools=nouveau, then run src/nouveau/headers/nv_push_dump from the build dir).

Shader ISA

NVIDIA PTX documentation is NVIDIA documentation for CUDA’s intermediate representation. We don’t use PTX directly, but this often has hints about how underlying hardware instructions work. For example, the PTX redux instruction is pretty much identical to the hardware instruction of the same name.
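As a rough illustration of the kind of semantics the PTX documentation spells out, the sketch below models a warp-wide reduction in the style of redux in Python. It is a simplified reference model, not NVIDIA's or NAK's implementation: the function name `redux_sync`, the 32-lane warp size, and the restriction to unsigned 32-bit values are assumptions made for this example.

```python
from functools import reduce
from operator import add, and_, or_, xor

WARP_SIZE = 32          # assumed warp width for this model
U32_MASK = 0xFFFFFFFF   # results are truncated to 32 bits

# Reduction operators covered by this sketch (unsigned interpretation only).
_OPS = {"add": add, "min": min, "max": max, "and": and_, "or": or_, "xor": xor}


def redux_sync(op, values, member_mask=0xFFFFFFFF):
    """Reduce the values of all lanes selected by member_mask.

    Every participating lane would receive the same result, so the model
    returns a single integer. `redux_sync` is a hypothetical helper name.
    """
    if len(values) != WARP_SIZE:
        raise ValueError("expected one value per lane")
    # Keep only lanes whose bit is set in the member mask.
    active = [v & U32_MASK for lane, v in enumerate(values)
              if (member_mask >> lane) & 1]
    if not active:
        raise ValueError("member_mask selects no lanes")
    return reduce(_OPS[op], active) & U32_MASK


if __name__ == "__main__":
    lanes = list(range(WARP_SIZE))
    print(redux_sync("add", lanes))
    print(redux_sync("max", lanes, member_mask=0xF0))
```

A model like this is mainly useful as a point of comparison when writing a hardware test: the expected value comes from the model, and the observed value comes from the GPU.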

CUDA Binary Utilities is documentation for CUDA’s disassembler, nvdisasm. It includes a brief description of most hardware instructions. There’s also an older version that has older architectures (Kepler through Volta).

Kuter Dinel has reverse-engineered instruction encodings for the Hopper ISA and Ada ISA which are autogenerated from his nv_isa_solver project.

nv-shader-tools has some additional tools for disassembling and fuzzing the hardware ISA.

Mel has dumped a list of available instructions and their opcodes on recent architectures by scraping nvdisasm error messages.

The Volta whitepaper section “Independent Thread Scheduling” has an overview of the control flow model used on Volta+ GPUs.

Dissecting the NVidia Turing T4 GPU via Microbenchmarking has reverse-engineered info about the Turing instruction encoding. See especially section “2.1 Control information” for an overview of compiler-inserted delays and waits on Maxwell and later.
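To make the idea of per-instruction control information concrete, here is a small decoder sketch. It assumes the 21-bit field layout described in the reverse-engineered maxas notes and the paper above (4 stall bits, 1 yield bit, 3 write-barrier bits, 3 read-barrier bits, 6 wait-mask bits, 4 reuse bits, from least to most significant). The names `ControlInfo` and `decode_control` are hypothetical, and the layout should be checked against those sources before relying on it.

```python
from typing import NamedTuple


class ControlInfo(NamedTuple):
    """Fields of one 21-bit control word (layout is an assumption)."""
    stall: int          # cycles to stall before issuing the next instruction
    yield_flag: int     # raw yield bit; polarity is not interpreted here
    write_barrier: int  # barrier index signalled when the result is written
    read_barrier: int   # barrier index signalled when sources are read
    wait_mask: int      # bitmask of barriers to wait on before issue
    reuse: int          # operand reuse-cache flags


def decode_control(bits: int) -> ControlInfo:
    """Split a 21-bit control value into its fields."""
    if not 0 <= bits < (1 << 21):
        raise ValueError("control word must fit in 21 bits")
    return ControlInfo(
        stall=bits & 0xF,
        yield_flag=(bits >> 4) & 0x1,
        write_barrier=(bits >> 5) & 0x7,
        read_barrier=(bits >> 8) & 0x7,
        wait_mask=(bits >> 11) & 0x3F,
        reuse=(bits >> 17) & 0xF,
    )


if __name__ == "__main__":
    # Build an example word field by field, then decode it again.
    word = 4 | (1 << 4) | (7 << 5) | (2 << 8) | (0b000101 << 11) | (1 << 17)
    print(decode_control(word))
```

The sketch only separates the fields; what each value means for scheduling (for example, which barrier index denotes "no barrier") is covered in the sources cited above.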

Analyzing Modern NVIDIA GPU cores has additional reverse-engineered info about the semantics of compiler-inserted delays and waits.

Control Flow Management in Modern GPUs has more detail about control flow reconvergence on Volta+.

maxas has some reverse-engineered info on the Maxwell ISA.

asfermi has some reverse-engineered info on the older Fermi ISA.

Red Hat has some NDA’d documentation on instruction latencies from NVIDIA. Bother karolherbst or airlied on IRC if you’re missing a latency class for an instruction on recent architectures.

Instruction behavior is tested using the hardware tests in src/nouveau/compiler/nak/hw_tests.rs and the corresponding Foldable implementations in src/nouveau/compiler/nak/ir.rs (build with -D build-tests=true and run src/nouveau/compiler/nak hw_tests from the build dir).

NAK’s instruction encodings are tested against nvdisasm using src/nouveau/compiler/nak/nvdisasm_tests.rs (build with -D build-tests=true and run src/nouveau/compiler/nak nvdisasm_tests from the build dir).

The old GL driver’s compiler, under src/gallium/drivers/nouveau/codegen, has some information. This is especially useful for graphics-only instructions, which are often not covered by other sources.

Compiler Explorer is a convenient tool to see what assembly NVIDIA generates for a given CUDA program.

Misc

envytools has reverse-engineered documentation for Maxwell and earlier hardware.

The NVIDIA architecture whitepapers give a basic overview of what has changed between hardware revisions. See e.g. the Blackwell whitepaper.

The NVIDIA architecture tuning guides often mention how details of a hardware generation have changed, typically with information about the memory subsystem or occupancy. See e.g. the Blackwell tuning guide.

The Nouveau wiki’s CodeNames page is useful for mapping NVIDIA marketing names to engineering names.

Matching CUDA arch and CUDA gencode for various NVIDIA architectures has a useful table comparing SM versions to engineering names.
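The kind of lookup these tables support can be sketched as a small dictionary. The entries below are a partial, illustrative subset written from memory, so treat the linked pages as authoritative. The names `SM_TO_CHIP` and `describe_sm` are hypothetical helpers for this example only.

```python
# Partial mapping from SM version to (architecture, chip family).
# Illustrative subset only; consult the linked tables for the full list.
SM_TO_CHIP = {
    70: ("Volta", "GV100"),
    75: ("Turing", "TU10x"),
    80: ("Ampere", "GA100"),
    86: ("Ampere", "GA10x"),
    89: ("Ada", "AD10x"),
    90: ("Hopper", "GH100"),
}


def describe_sm(sm: int) -> str:
    """Return a short description for an SM version such as 75."""
    try:
        arch, chip = SM_TO_CHIP[sm]
    except KeyError:
        return f"sm_{sm}: not in this table"
    return f"sm_{sm}: {arch} ({chip})"


if __name__ == "__main__":
    for sm in (75, 89, 42):
        print(describe_sm(sm))
```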