RADV

RADV is a Vulkan driver for AMD GCN/RDNA GPUs.

Introduction

RADV is a userspace driver that implements the Vulkan API on most modern AMD GPUs.

Many Linux distributions include RADV in their default installation as part of their Mesa packages. It is also the Vulkan driver on the Steam Deck, the handheld console developed by Valve.

Features

The easiest way to track the feature set of RADV (and other Vulkan drivers in Mesa) is to take a look at the Mesa matrix.

Contributing to RADV

RADV is part of Mesa, so we recommend reading Mesa’s guidelines for submitting patches.

Additionally, the RADV team agreed on the following workflow for contributions:

  • Do NOT merge any MRs without at least one approval from someone familiar with the code.

  • There are different types of approvals:

    • Reviewed-by tag

    • MR approval button

    • Acked-by tag

  • After receiving the first approval, please wait at least 24 hours (excluding weekends) before merging the MR. This makes sure that anyone who may be interested has had the opportunity and time to look at the MR (e.g. because of timezone differences). The wait is not required in the following cases:

    • The MR is trivial (e.g. very simple cleanups, typos, CI flakes updates)

    • The MR has been approved by two or more developers

  • Even if your MR has been approved and is ready to be merged, please make sure to resolve any open conversations about potentially controversial commits.

If your MR needs to be merged immediately (which should rarely be needed), feel free to ping the developers to get more approvals.

Supported hardware

All GCN and RDNA GPUs that are supported by the Linux kernel (and capable of graphics) are also supported by RADV, starting from GCN 1. We are always working on supporting the very latest GPUs too.

Vulkan API support:

  • GFX6-7 (GCN 1-2): Vulkan 1.3

  • GFX8 and newer (GCN 3-5 and RDNA): Vulkan 1.4
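The mapping above can be expressed as a small lookup. This is only an illustrative sketch: the function name max_vulkan_version is hypothetical and not part of RADV, and it encodes nothing beyond the two list entries above.

```python
def max_vulkan_version(gfx_level):
    """Return the Vulkan version listed above for a GFX hardware generation.

    gfx_level is the main version number of the GFX block (6, 7, 8, ...).
    """
    if gfx_level < 6:
        raise ValueError("RADV support starts at GFX6 (GCN 1)")
    # GFX6-7 (GCN 1-2) are listed with Vulkan 1.3, GFX8 and newer with 1.4.
    return "1.3" if gfx_level <= 7 else "1.4"


for level in (6, 7, 8, 11):
    print(f"GFX{level}: Vulkan {max_vulkan_version(level)}")
```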

The exact list of Vulkan conformant products can be seen here.

Each GPU chip can contain various hardware blocks (also known as IP blocks), and each of those is separately versioned. We usually refer to hardware generations by the main version number of the GFX (graphics) hardware block.

Each hardware generation has different chips which usually only have minor differences between each other, such as number of compute units and different minor versions of some IP blocks. We refer to each chip by the code name of its first release, and we don’t differentiate between refreshes of the same chip, because they are functionally exactly the same.

For more information about which GPU chip name corresponds to which GPU product, see the src/amd/common/amd_family.h file.

Note that for GFX6-7 (GCN 1-2) GPUs, the amdgpu kernel driver is currently not the default in Linux. By default, the old radeon KMD is used for these GPUs, and it is not supported by RADV. Users therefore need to enable amdgpu manually by adding the following to the kernel command line:

radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1
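To check whether a running system was booted with these parameters, one option is to inspect /proc/cmdline, which Linux exposes for the current boot. The following is a minimal sketch; the helper name missing_amdgpu_params is hypothetical, and the check only looks for the four exact parameter strings listed above.

```python
REQUIRED_PARAMS = (
    "radeon.si_support=0",
    "radeon.cik_support=0",
    "amdgpu.si_support=1",
    "amdgpu.cik_support=1",
)


def missing_amdgpu_params(cmdline):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in REQUIRED_PARAMS if param not in present]


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel, or '' if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


missing = missing_amdgpu_params(read_kernel_cmdline())
if missing:
    print("Not set on this boot:", " ".join(missing))
else:
    print("All four parameters are set.")
```

An empty result only means the parameters were passed to the kernel; it does not confirm that the GPU is a GFX6-7 part or that amdgpu actually bound to it.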

Basics

The RADV source code is located in src/amd/vulkan.

RADV is a userspace driver that is compiled to a shared library file. On Linux, this is typically libvulkan_radeon.so (the equivalent of a .dll on Windows).

When you start a Vulkan application, the Vulkan loader (in userspace) will find a set of Vulkan drivers (also known as Vulkan implementations), all of which are technically shared libraries. If you are running on a system with a supported AMD GPU and have RADV installed, the loader will find libvulkan_radeon.so and load that. The Vulkan application can then choose which available Vulkan implementation to use.
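On Linux, the Vulkan loader usually discovers drivers through small JSON manifest files (ICD manifests) that name the shared library to load. The text above does not describe this mechanism in detail, so treat the following as a hedged sketch: the manifest contents, including the library path and version strings, are illustrative sample values, and icd_library_path is a hypothetical helper.

```python
import json

# Illustrative manifest contents; real files are installed by the Mesa package
# and their exact values depend on the distribution.
SAMPLE_MANIFEST = """
{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "/usr/lib/libvulkan_radeon.so",
        "api_version": "1.4.0"
    }
}
"""


def icd_library_path(manifest_text):
    """Extract the driver library path from an ICD manifest."""
    manifest = json.loads(manifest_text)
    return manifest["ICD"]["library_path"]


print(icd_library_path(SAMPLE_MANIFEST))
```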

With RADV, when the application makes Vulkan API calls (a.k.a. entry points), the vk* functions will end up calling radv_* functions. For example, vkCmdDrawMeshTasksEXT will actually call radv_CmdDrawMeshTasksEXT.

Responsibilities of RADV vs. the kernel driver

Due to the complexity of how modern GPUs work, the graphics stack is split between kernel-mode drivers (KMD) and user-mode drivers (UMD). All graphics APIs such as Vulkan, OpenGL, etc. are implemented in userspace.

RADV is a UMD that currently works with the amdgpu KMD in the Linux kernel. Interacting with the KMD is done by RADV’s winsys code.

The KMD is responsible for:

  • Talking to the GPU through a PCIe port and power management (such as choosing voltage, frequency, sleep modes etc.)

  • Display functionality

  • Video memory management (and GTT)

  • Writing submitted commands to the GPU’s ring buffers

The UMD (in our case, RADV) is responsible for:

  • Recording commands in a binary format that the GPU can understand

  • Programming GPU registers for correct functionality

  • Compiling shaders to the GPU’s native ISA (instruction set architecture) and uploading them to memory accessible by the GPU

  • Submitting recorded commands to the kernel through a system call

To communicate with amdgpu, RADV relies on the DRM userspace API (uAPI) of the Linux kernel, which is a set of system calls. RADV depends on libdrm for some functionality and for others it uses the system calls directly.

Command submission and PM4 packets

Vulkan applications record a series of commands in command buffers and later submit these command buffers to one of the queues in the GPU.

Command buffer recording in RADV is implemented by emitting packets in a buffer, which is called a command stream (CS).

Command packets are more or less analogous to Vulkan API calls, which means that each Vulkan command (such as draw or dispatch) corresponds to one or more packets. However, depending on how closely the hardware follows the Vulkan spec, some commands will require a more complex implementation. For example, a simple vkCmdDraw call will result in just one packet, but emitting the draw state (before the draw) may take dozens of packets.

For the graphics queue (GFX) and async compute queue (ACE), the PM4 packet format is used; other queues such as SDMA and the various video queues have their own format. Commands typically vary between different GPU generations.
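To make the idea of "emitting packets in a buffer" concrete, here is a minimal sketch of building a command stream as a flat list of 32-bit dwords. It assumes the commonly documented PM4 type-3 header layout (type in bits 31:30, payload dword count minus one in bits 29:16, opcode in bits 15:8, predicate in bit 0) and an assumed NOP opcode value of 0x10. Both should be checked against the hardware documentation or the Mesa headers, and the helper names are hypothetical rather than RADV's own.

```python
# Assumed field layout of a PM4 type-3 packet header:
#   bits 31:30  packet type (3)
#   bits 29:16  count = number of payload dwords minus one
#   bits 15:8   opcode
#   bit  0      predicate
PKT_TYPE_3 = 3
OPCODE_NOP = 0x10  # assumed opcode value, used only for illustration


def pkt3_header(opcode, count, predicate=False):
    """Pack the header dword of a type-3 packet."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= count <= 0x3FFF:
        raise ValueError("count must fit in 14 bits")
    return (PKT_TYPE_3 << 30) | (count << 16) | (opcode << 8) | int(predicate)


def decode_header(header):
    """Unpack a header dword into its fields."""
    return {
        "type": (header >> 30) & 0x3,
        "count": (header >> 16) & 0x3FFF,
        "opcode": (header >> 8) & 0xFF,
        "predicate": bool(header & 0x1),
    }


def emit_packet(cs, opcode, payload):
    """Append one packet (header plus payload dwords) to a command stream."""
    if not payload:
        raise ValueError("this sketch requires at least one payload dword")
    cs.append(pkt3_header(opcode, len(payload) - 1))
    cs.extend(payload)


cs = []  # the command stream: a flat list of 32-bit dwords
emit_packet(cs, OPCODE_NOP, [0x0, 0x0])
print([hex(dw) for dw in cs])
print(decode_header(cs[0]))
```

A real driver writes these dwords into GPU-visible memory rather than a Python list, and other queue types use their own packet formats, as noted above.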

When submitting to a queue, several CSs are submitted to the kernel at once. In kernel terminology, these are called indirect buffers (IBs) because the kernel typically uses the INDIRECT_BUFFER PM4 command to execute them. Modern AMD GPUs have several queues, which more or less map to the Vulkan queue types, though sometimes we need to submit to more than one HW queue at the same time.

After command submission, the packets in the IBs will be executed by the command processor (CP) of the queue that the buffer was submitted to. The exact capabilities and supported commands of the CP depend on HW generation and queue type.

Each CP has a collection of registers that control how the CP behaves; these are not to be confused with registers in shaders. For example, the addresses of shader programs, some details of shader execution, and draw call information each have a corresponding register. These registers can typically be set by PM4 packets.

Shader compilation

One of the main responsibilities of RADV is compiling shaders for Vulkan applications.

RADV relies on the Mesa shader compiler stack. Here is a rough overview of how it works:

  • Vulkan applications pass shaders to RADV in the SPIR-V format

  • RADV calls spirv_to_nir which translates the shader into the NIR intermediate representation

  • We perform various optimizations and lowerings on the NIR shader

  • The shader is linked with the other shaders it is compiled with (if any)

  • Still using NIR, the shader is further lowered to be closer to the hardware

  • Finally, we pass the lowered NIR shader to ACO, our compiler backend, which compiles it into GPU specific ISA

ACO is the default shader-compiler used in RADV. Read its documentation here.

We still maintain an LLVM-based compiler backend too, which these days is used solely for testing and hardware bring-up. Users are advised NOT to use the LLVM backend.

Shader execution

Some commands (such as draws and dispatches) ask the CP to launch shaders. Shader launch is handled by the firmware, based on registers that control shader programs. Additionally, draw commands will also automatically use the appropriate fixed function units in the hardware.

Shaders are executed by a so-called compute unit on the GPU, which is a SIMD machine. A shader invocation is a single SIMD lane (AMD also calls it a thread, not to be confused with a CPU HW thread), and a subgroup is all 64 or 32 SIMD lanes together (also known as a wave). Each wave is a separate running instance of a shader program, but multiple waves can be grouped together into workgroups.

Registers in shaders (not to be confused with registers in the CP):

  • VGPR - vector general purpose register: each SIMD lane has a different value for this register

  • SGPR - scalar general purpose register: same value within a wave
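The relationship between invocations, waves and the two register kinds can be illustrated with a short model. This is a conceptual sketch only: the function and variable names are hypothetical, and it models a VGPR as one value per lane and an SGPR as a single value shared by the wave, without reflecting how the hardware actually stores them.

```python
def waves_per_workgroup(workgroup_size, wave_size):
    """Number of waves needed to cover all invocations of a workgroup."""
    if wave_size not in (32, 64):
        raise ValueError("wave size is 32 or 64 lanes")
    # Round up: a partially filled wave still occupies a whole wave.
    return (workgroup_size + wave_size - 1) // wave_size


wave_size = 32

# SGPR: one value shared by every lane of the wave (e.g. a buffer base address).
sgpr_base_address = 0x1000

# VGPR: each lane holds its own value (e.g. the lane index).
vgpr_lane_index = list(range(wave_size))

# A vector operation combines both: the scalar is broadcast to every lane.
vgpr_address = [sgpr_base_address + 4 * lane for lane in vgpr_lane_index]

print(waves_per_workgroup(256, 64), waves_per_workgroup(256, 32))
print([hex(a) for a in vgpr_address[:4]])
```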

For further reading, AMD has published whitepapers and documentation for the GCN and RDNA GPU architectures. These can be found on their GPUOpen site.

Debugging

For a list of environment variables to debug RADV, please see RADV driver environment variables.

Instructions for debugging GPU hangs can be found here.

DRI Configuration Options

RADV supports per-application option overrides via ~/.drirc, /etc/drirc, or a file in /etc/drirc.d/. Options can also be set as environment variables.

See the driconf documentation for the file format.
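As an illustration of what a per-application override can look like, the sketch below generates a small drirc-style XML document. It assumes the usual driconf layout of driconf, device, application and option elements. The application name and executable are hypothetical placeholders, build_drirc is not a Mesa tool, and the driconf documentation remains the authority on the format.

```python
import xml.etree.ElementTree as ET


def build_drirc(app_name, executable, options):
    """Build a drirc-style XML string with per-application option overrides."""
    root = ET.Element("driconf")
    device = ET.SubElement(root, "device", {"driver": "radv"})
    app = ET.SubElement(
        device, "application", {"name": app_name, "executable": executable}
    )
    for name, value in options.items():
        ET.SubElement(app, "option", {"name": name, "value": value})
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "Example App" and "example-app" are placeholders for a real application.
print(build_drirc("Example App", "example-app", {"vk_zero_vram": "true"}))
```

The printed document could be saved as ~/.drirc or as a file under /etc/drirc.d/ to apply the override to the matching executable.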

Debugging

| Option | Type | Default | Range / Values | Description |
|---|---|---|---|---|
| radv_disable_aniso_single_level | bool | false | | Disable anisotropic filtering for single level images |
| radv_disable_dcc | bool | false | | Disable DCC for color images on GFX8-GFX11.5 |
| radv_disable_dcc_mips | bool | false | | Disable DCC for color images with mips on GFX8-GFX11.5 |
| radv_disable_dcc_stores | bool | false | | Disable DCC for color storage images on GFX10-GFX11.5 |
| radv_disable_shrink_image_store | bool | false | | Disabling shrinking of image stores based on the format |
| radv_disable_sinking_load_input_fs | bool | false | | Disable sinking load inputs for fragment shaders |
| radv_disable_tc_compat_htile_general | bool | false | | Disable TC-compat HTILE in GENERAL layout |
| radv_disable_trunc_coord | bool | false | | Disable TRUNC_COORD to use D3D10/11/12 point sampling behaviour. This has special behaviour for DXVK. |
| radv_enable_mrt_output_nan_fixup | bool | false | | Replace NaN outputs from fragment shaders with zeroes for floating point render target |
| radv_flush_before_query_copy | bool | false | | Wait for timestamps to be written before a query copy command |
| radv_flush_before_timestamp_write | bool | false | | Wait for previous commands to finish before writing timestamps |
| radv_invariant_geom | bool | false | | Mark geometry-affecting outputs as invariant |
| radv_no_dynamic_bounds | bool | false | | Disabling bounds checking for dynamic buffer descriptors |
| radv_split_fma | bool | false | | Split application-provided fused multiply-add in geometry stages |
| radv_ssbo_non_uniform | bool | false | | Always mark SSBO operations as non-uniform. |
| radv_tex_non_uniform | bool | false | | Always mark texture sample operations as non-uniform. |
| radv_wait_for_vm_map_updates | bool | false | | Wait for VM MAP updates at allocation time to mitigate use-before-alloc |
| radv_no_implicit_varying_subgroup_size | bool | false | | Do not assume VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE for SPIR-V 1.6. |
| radv_rt_wave64 | bool | false | | Force wave64 in RT shaders |
| radv_hide_rebar_on_dgpu | bool | false | | Hide resizable bar on dGPUs by exposing a fake carveout of 256MiB. |
| radv_app_layer | string | (none) | | Select an application layer. |
| radv_override_uniform_offset_alignment | int | 0 | 0 to 128 | Override the minUniformBufferOffsetAlignment exposed to the application. (0 = default) |
| radv_force_64_byte_sampled_image | bool | false | | Force sampled images size to 64 bytes. |
| vk_lower_terminate_to_discard | bool | false | | Lower terminate to discard (which is implicitly demote) |
| vk_zero_vram | bool | false | | Initialize to zero all VRAM allocations |
| vk_wsi_force_bgra8_unorm_first | bool | false | | Force vkGetPhysicalDeviceSurfaceFormatsKHR to return VK_FORMAT_B8G8R8A8_UNORM as the first format |
| vk_wsi_force_swapchain_to_current_extent | bool | false | | Force VkSwapchainCreateInfoKHR::imageExtent to be VkSurfaceCapabilities2KHR::currentExtent |
| vk_wsi_disable_unordered_submits | bool | false | | Disable unordered WSI submits to workaround application synchronization bugs |
| vk_x11_ignore_suboptimal | bool | false | | Force the X11 WSI to never report VK_SUBOPTIMAL_KHR |

Performance

| Option | Type | Default | Range / Values | Description |
|---|---|---|---|---|
| radv_disable_ngg_gs | bool | false | | Disable NGG GS on GFX10/GFX10.3. |
| radv_enable_unified_heap_on_apu | bool | false | | Enable an unified heap with DEVICE_LOCAL on integrated GPUs |
| radv_report_llvm9_version_string | bool | false | | Report LLVM 9.0.1 for games that apply shader workarounds if missing (for ACO only) |
| radv_prefer_2d_swizzle_for_3d_storage | bool | false | | Prefer 2D swizzle mode for 3D storage images. |
| radv_gfx12_hiz_wa | string | (none) | | Choose the specific HiZ workaround to apply on GFX12 (RDNA4). Accepted values are: disabled, partial or full |
| adaptive_sync | bool | true | | Adapt the monitor sync to the application performance (when possible) |
| vk_x11_override_min_image_count | int | 0 | 0 to 999 | Override the VkSurfaceCapabilitiesKHR::minImageCount (0 = no override) |
| vk_x11_strict_image_count | bool | false | | Force the X11 WSI to create exactly the number of image specified by the application in VkSwapchainCreateInfoKHR::minImageCount |
| vk_x11_ensure_min_image_count | bool | false | | Force the X11 WSI to create at least the number of image specified by the driver in VkSurfaceCapabilitiesKHR::minImageCount |
| vk_xwayland_wait_ready | bool | false | | Wait for fences before submitting buffers to Xwayland |

Features

| Option | Type | Default | Range / Values | Description |
|---|---|---|---|---|
| radv_device_coherent_memory | bool | false | | Expose VK_AMD_device_coherent_memory on GFX12 (RDNA4). |
| radv_cooperative_matrix2_nv | bool | false | | Expose VK_NV_cooperative_matrix2 on supported hardware. |
| radv_emulate_rt | bool | false | | Expose RT extensions on GFX10 and below through software emulation. |
| radv_enable_float16_gfx8 | bool | false | | Expose float16 on GFX8, where it’s supported but usually not beneficial. |
| vk_require_etc2 | bool | false | | Implement emulated ETC2 on HW that does not support it |
| vk_require_astc | bool | false | | Implement emulated ASTC on HW that does not support it |

Miscellaneous

| Option | Type | Default | Range / Values | Description |
|---|---|---|---|---|
| radv_clear_lds | bool | false | | Clear LDS at the end of shaders. Might decrease performance. |
| override_vram_size | int | -1 | -1 to 2147483647 | Override the VRAM size advertised to the application in MiB (-1 = default) |
| radv_override_graphics_shader_version | int | 0 | 0 to 7 | Override the shader version of graphics pipelines to force re-compilation. (0 = default) |
| radv_override_compute_shader_version | int | 0 | 0 to 7 | Override the shader version of compute pipelines to force re-compilation. (0 = default) |
| radv_override_ray_tracing_shader_version | int | 0 | 0 to 7 | Override the shader version of ray tracing pipelines to force re-compilation. (0 = default) |

Hardware Documentation

You can find a list of documentation for the various generations of AMD hardware on the X.Org wiki.

Additional community-written documentation is also available in Mesa: