External Hardware Documentation and Resources
Information about hardware behavior comes from a mix of official and reverse-engineered sources.
Command buffers
The NVIDIA open-gpu-doc repository contains official documentation that NVIDIA has released to the public. Most of this documentation comes in the form of class headers, which describe the class state registers.
The NVIDIA open-gpu-kernel-modules repository contains the open-source kernel-mode driver that NVIDIA ships for Turing+ GPUs with GSP. Its code can provide examples of how to use some hardware features. If open-gpu-doc is missing a class header, it can sometimes be found here.
Reverse-engineered command names from envytools are available in Mesa under e.g. src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h. These are no longer updated; NVK uses the open-gpu-doc headers instead.
envyhooks is the modern way to dump command sequences from the proprietary driver.
nv_push_dump is part of Mesa and can disassemble command sequences (build with -D tools=nouveau, then run src/nouveau/headers/nv_push_dump from the build directory).
Shader ISA
The NVIDIA PTX documentation covers CUDA’s intermediate representation. We don’t use PTX directly, but it often has hints about how the underlying hardware instructions work. For example, the PTX redux instruction is pretty much identical to the hardware instruction of the same name.
CUDA Binary Utilities is the documentation for CUDA’s disassembler, nvdisasm. It includes a brief description of most hardware instructions. There’s also an older version that covers older architectures (Kepler through Volta).
Kuter Dinel has reverse-engineered instruction encodings for the Hopper ISA and Ada ISA, which are autogenerated by his nv_isa_solver project.
nv-shader-tools has some additional tools for disassembling and fuzzing the hardware ISA.
Mel has dumped a list of available instructions and their opcodes on recent architectures by scraping nvdisasm error messages.
The Volta whitepaper section “Independent Thread Scheduling” has an overview of the control flow model used on Volta+ GPUs.
Dissecting the NVidia Turing T4 GPU via Microbenchmarking has reverse-engineered info about the Turing instruction encoding. See especially section “2.1 Control information” for an overview of compiler-inserted delays and waits on Maxwell and later.
Analyzing Modern NVIDIA GPU cores has additional reverse-engineered info about the semantics of compiler-inserted delays and waits.
Control Flow Management in Modern GPUs has more detail about control flow reconvergence on Volta+.
maxas has some reverse-engineered info on the Maxwell ISA.
asfermi has some reverse-engineered info on the older Fermi ISA.
Red Hat has some NDA’d documentation from NVIDIA on instruction latencies. Bother karolherbst or airlied on IRC if you’re missing a latency class for an instruction on recent architectures.
Instruction behavior is tested using the hardware tests in src/nouveau/compiler/nak/hw_tests.rs and the corresponding Foldable implementations in src/nouveau/compiler/nak/ir.rs (build with -D build-tests=true and run src/nouveau/compiler/nak hw_tests from the build directory).
NAK’s instruction encodings are tested against nvdisasm using src/nouveau/compiler/nak/nvdisasm_tests.rs (build with -D build-tests=true and run src/nouveau/compiler/nak nvdisasm_tests from the build directory).
The old GL driver’s compiler, under src/gallium/drivers/nouveau/codegen, has some information. It is especially useful for graphics-only instructions, which are often not covered by other sources.
Compiler Explorer is a convenient tool for seeing what assembly NVIDIA generates for a given CUDA program.
Misc
envytools has reverse-engineered documentation for Maxwell and earlier hardware.
The NVIDIA architecture whitepapers give a basic overview of what has changed between hardware revisions. See e.g. the Blackwell whitepaper.
The NVIDIA architecture tuning guides often describe how details have changed in a hardware generation, often with information about the memory subsystem or occupancy. See e.g. the Blackwell tuning guide.
The Nouveau wiki’s CodeNames page is useful for mapping NVIDIA marketing names to engineering names.
Matching CUDA arch and CUDA gencode for various NVIDIA architectures has a useful table comparing SM versions to engineering names.