Auxiliary surface compression

Most lossless image compression on Intel hardware, be that CCS, MCS, or HiZ, works by way of some chunk of auxiliary data (often a surface) which is used together with the main surface to provide compression. Even though this means more memory is allocated, the scheme allows us to reduce our overall memory bandwidth since the auxiliary data is much smaller than the main surface.

The simplest example of this is single-sample fast clears (isl_aux_usage.ISL_AUX_USAGE_CCS_D), available on Ivy Bridge and later. In the Ivy Bridge through Broadwell form of this scheme, the auxiliary surface stores a single bit for each cache-line pair in the main surface. If that bit is set, then the entire cache-line pair contains only the clear color as provided in the RENDER_SURFACE_STATE for the image. If the bit is unset, then the pair is not clear and you should look at the main surface. Since a cache line is 64B, a cache-line pair is 128B (1024 bits), so one auxiliary bit per pair yields a scale-down factor of 1:1024.
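The scale-down arithmetic can be checked with a few lines of C. This is only a sketch: the names toy_aux_scale_down and toy_aux_bytes are made up for illustration and are not part of ISL. It assumes 64B cache lines tracked in pairs, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 64-byte cache lines, tracked in pairs. */
#define TOY_CACHE_LINE_BYTES 64u
#define TOY_PAIR_BITS (2u * TOY_CACHE_LINE_BYTES * 8u) /* 1024 bits */

/* Main-surface bits covered by one aux bit, i.e. N in "1:N". */
static uint32_t toy_aux_scale_down(uint32_t aux_bits_per_pair)
{
   assert(aux_bits_per_pair > 0 && TOY_PAIR_BITS % aux_bits_per_pair == 0);
   return TOY_PAIR_BITS / aux_bits_per_pair;
}

/* Aux bytes needed for a main surface of main_bytes, rounded up. */
static uint64_t toy_aux_bytes(uint64_t main_bytes, uint32_t aux_bits_per_pair)
{
   uint64_t n = toy_aux_scale_down(aux_bits_per_pair);
   return (main_bytes + n - 1) / n;
}
```

With one bit per pair, a 1 MiB main surface needs 1 KiB of auxiliary data. The same function gives 1:256 for the four-bit-per-pair layout mentioned later for Tigerlake.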

Even the simple fast-clear scheme saves us bandwidth in two places. The first is when we go to clear the surface. If we’re doing a full-surface clear or clearing to the same color that was used to clear before, we don’t have to touch the main surface at all. All we have to do is record the clear color and smash the aux data to 0xff. The hardware then knows to ignore whatever is in the main surface and look at the clear color instead. The second is when we go to render. Say we’re doing some color blending. Instead of the blend unit having to read back actual surface contents to blend with, it looks at the clear bit and blends with the clear color recorded with the surface state instead. Depending on the geometry and cache utilization, this can save as much as one whole read of the surface worth of bandwidth.

The difficulty with a scheme like this comes when we want to do something else with that surface. What happens if the sampler doesn’t support this fast-clear scheme (it doesn’t on IVB)? In that case, we have to do a resolve where we run a special pipeline that reads the auxiliary data and applies it to the main surface. In the case of fast clears, this means that, for every 1 bit in the auxiliary surface, the corresponding pair of cache lines in the main surface gets filled with the clear color. At the end of the resolve operation, the main surface contents are the actual contents of the surface.
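As a rough software model of the fast clear and resolve just described, consider the sketch below. Everything in it is hypothetical (struct toy_surf, the bit order inside the aux bytes, 32-bit pixels); real hardware performs these steps with dedicated operations, not CPU loops. The sketch also drops each aux bit after filling the pair, which is a choice made here so that the main surface alone is authoritative afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAIR_BYTES 128u   /* two 64B cache lines */

/* Hypothetical model: 32-bit pixels plus one aux bit per cache-line pair. */
struct toy_surf {
   uint32_t *main;        /* n_pairs * TOY_PAIR_BYTES / 4 pixels */
   uint8_t  *aux;         /* (n_pairs + 7) / 8 bytes, bit set = "clear" */
   uint32_t  n_pairs;
   uint32_t  clear_color;
};

/* Fast clear: record the color and set every aux bit; main is untouched. */
static void toy_fast_clear(struct toy_surf *s, uint32_t color)
{
   assert(s->aux != NULL);
   s->clear_color = color;
   memset(s->aux, 0xff, (s->n_pairs + 7) / 8);
}

/* Resolve: write the clear color into every pair whose aux bit is set. */
static void toy_resolve(struct toy_surf *s)
{
   const uint32_t px_per_pair = TOY_PAIR_BYTES / sizeof(uint32_t);

   for (uint32_t p = 0; p < s->n_pairs; p++) {
      const uint8_t bit = (uint8_t)(1u << (p % 8));
      if (!(s->aux[p / 8] & bit))
         continue;   /* pair already holds real data */
      for (uint32_t i = 0; i < px_per_pair; i++)
         s->main[p * px_per_pair + i] = s->clear_color;
      /* Sketch-only choice: drop the bit once main holds the color. */
      s->aux[p / 8] &= (uint8_t)~bit;
   }
}
```

Note that toy_fast_clear never touches s->main, which is where the bandwidth saving comes from; the cost is deferred to toy_resolve and is only paid if some consumer cannot read the aux data.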

Types of surface compression

Intel hardware has several different compression schemes that all work along similar lines:

enum isl_aux_usage

Enumerates the different forms of auxiliary surface compression

enumerator ISL_AUX_USAGE_NONE

No Auxiliary surface is used

enumerator ISL_AUX_USAGE_HIZ

Hierarchical depth compression

First introduced on Iron Lake, this compression scheme compresses depth surfaces by storing alternate forms of the depth value in a HiZ surface. Some (but not all) of the possible compressed forms are:

  • An uncompressed “look at the main surface” value

  • A special value indicating that the main surface data should be ignored and considered to contain the clear value.

  • The depth for the entire main-surface block as a plane equation

  • The minimum/maximum depth for the main-surface block

The last of these, the minimum/maximum range, isn’t helpful for getting exact depth values but can still substantially accelerate depth testing if the specified range is sufficiently small.
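To see why a minimum/maximum range is enough to accelerate depth testing, here is a minimal sketch for a LESS depth comparison. The names (toy_hiz_test_less and the result enum) are hypothetical, and this models only the idea, not how the hardware encodes or evaluates HiZ data.

```c
#include <assert.h>

enum toy_hiz_result { TOY_HIZ_PASS, TOY_HIZ_FAIL, TOY_HIZ_AMBIGUOUS };

/* Classify a LESS depth test against a block known only by its range. */
static enum toy_hiz_result
toy_hiz_test_less(float frag_z, float block_min, float block_max)
{
   assert(block_min <= block_max);
   if (frag_z < block_min)
      return TOY_HIZ_PASS;       /* nearer than every stored depth */
   if (frag_z >= block_max)
      return TOY_HIZ_FAIL;       /* not nearer than any stored depth */
   return TOY_HIZ_AMBIGUOUS;     /* exact depth values are required */
}
```

The narrower the range, the fewer fragments land in the ambiguous case, so fewer reads of exact depth values are needed.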

enumerator ISL_AUX_USAGE_MCS

Multisampled color compression

Introduced on Ivy Bridge, this compression scheme compresses multisampled color surfaces by storing a mapping from samples to planes in the MCS surface, allowing for de-duplication of identical samples. The MCS value of all 1’s is reserved to indicate that the pixel contains the clear color. Exact details about the data stored in the MCS and how it maps samples to slices is documented in the PRMs.

Invariant:

isl_surf.samples > 1
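The sample-to-plane mapping used by MCS can be illustrated with a toy model. The encoding below (4 samples, 2 bits per sample, sample 0 in the low bits) is an assumption made for this sketch only; the real layout is documented in the PRMs. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SAMPLES   4u
#define TOY_MCS_CLEAR 0xffu  /* all 1's: pixel holds the clear color */

/* Toy assumption: 2 bits per sample, sample 0 in the low bits. */
static uint32_t toy_mcs_plane(uint8_t mcs, uint32_t sample)
{
   assert(sample < TOY_SAMPLES);
   return (mcs >> (2u * sample)) & 0x3u;
}

/* Fetch one sample's color from the per-pixel planes via the MCS value. */
static uint32_t toy_mcs_fetch(uint8_t mcs, const uint32_t planes[TOY_SAMPLES],
                              uint32_t sample, uint32_t clear_color)
{
   if (mcs == TOY_MCS_CLEAR)
      return clear_color;   /* planes are ignored entirely */
   return planes[toy_mcs_plane(mcs, sample)];
}
```

When all samples of a pixel are identical, every sample maps to plane 0 and only that plane has to be read or written, which is the de-duplication described above.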

enumerator ISL_AUX_USAGE_CCS_D

Single-sampled fast-clear-only color compression

Introduced on Ivy Bridge, this compression scheme compresses single-sampled color surfaces by storing a bit for each cache line pair in the main surface in the CCS which indicates that the corresponding pair of cache lines in the main surface only contains the clear color. On Skylake, this is increased to two bits per cache line pair with 0x0 meaning resolved and 0x3 meaning clear.

Invariant:

The surface is a color surface

Invariant:

isl_surf.samples == 1

enumerator ISL_AUX_USAGE_CCS_E

Single-sample lossless color compression

Introduced on Skylake, this compression scheme compresses single-sampled color surfaces by storing a 2-bit value for each cache line pair in the main surface which says how the corresponding pair of cache lines in the main surface are to be interpreted. Valid CCS values include:

  • 0x0: Indicates that the corresponding pair of cache lines in the main surface contain valid color data

  • 0x1: Indicates that the corresponding pair of cache lines in the main surface contain compressed color data. Typically, the compressed data fits in one of the two cache lines.

  • 0x3: Indicates that the corresponding pair of cache lines in the main surface should be ignored. Those cache lines should be considered to contain the clear color.

Starting with Tigerlake, each CCS value is 4 bits per cache line pair in the main surface.
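For the two-bit form, the values listed above can be decoded as in the following sketch. The enum, the function names, and the packing of four values per byte (lowest pair in the low bits) are assumptions for illustration, not ISL or hardware definitions; the value 0x2 is not listed above and is treated as unknown here.

```c
#include <assert.h>
#include <stdint.h>

enum toy_ccs_block {
   TOY_CCS_UNCOMPRESSED, /* 0x0: main surface holds valid color data */
   TOY_CCS_COMPRESSED,   /* 0x1: main surface holds compressed color data */
   TOY_CCS_CLEAR,        /* 0x3: ignore main surface, use the clear color */
   TOY_CCS_UNKNOWN,      /* value not listed in the text above */
};

/* Map a 2-bit CCS value to its meaning. */
static enum toy_ccs_block toy_ccs_e_decode(uint8_t two_bits)
{
   assert(two_bits <= 0x3);
   switch (two_bits) {
   case 0x0: return TOY_CCS_UNCOMPRESSED;
   case 0x1: return TOY_CCS_COMPRESSED;
   case 0x3: return TOY_CCS_CLEAR;
   default:  return TOY_CCS_UNKNOWN;
   }
}

/* Extract the 2-bit value for cache-line pair `pair` (toy packing). */
static uint8_t toy_ccs_e_get(const uint8_t *ccs, uint32_t pair)
{
   return (uint8_t)((ccs[pair / 4] >> (2u * (pair % 4))) & 0x3u);
}
```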

Invariant:

The surface is a color surface

Invariant:

isl_surf.samples == 1

enumerator ISL_AUX_USAGE_FCV_CCS_E

Single-sample lossless color compression with fast clear optimization

Introduced on Tigerlake, this is identical to ISL_AUX_USAGE_CCS_E except it also encodes a feature about regular render writes possibly fast-clearing blocks in the surface. In the Alchemist docs, the name of the feature is easier to find. In the 3DSTATE_3D_MODE packet, it is referred to as “Fast Clear Optimization (FCV)”.

Invariant:

The surface is a color surface

Invariant:

isl_surf.samples == 1

enumerator ISL_AUX_USAGE_MC

Media color compression

Used by the media engine on Tigerlake and above. This compression form is typically not produced by 3D drivers but they need to be able to consume it in order to get end-to-end compression when the image comes from media decode.

Invariant:

The surface is a color surface

Invariant:

isl_surf.samples == 1

enumerator ISL_AUX_USAGE_HIZ_CCS_WT

Combined HiZ+CCS in write-through mode

In this mode, introduced on Tigerlake, the HiZ and CCS surfaces act as a single fused compression surface where resolves (but not ambiguates) operate on both surfaces at the same time. In this mode, the HiZ surface operates in write-through mode where it is only used for accelerating depth testing and not for actual compression. The CCS-compressed surface contains valid data at all times.

Invariant:

The surface is a depth surface

Invariant:

isl_surf.samples == 1

enumerator ISL_AUX_USAGE_HIZ_CCS

Combined HiZ+CCS without write-through

In this mode, introduced on Tigerlake, the HiZ and CCS surfaces act as a single fused compression surface where resolves (but not ambiguates) operate on both surfaces at the same time. In this mode, full HiZ compression is enabled and the CCS-compressed main surface may not contain valid data. The only way to read the surface outside of the depth hardware is to do a full resolve which resolves both HiZ and CCS so the surface is in the pass-through state.

Invariant:

The surface is a depth surface

enumerator ISL_AUX_USAGE_MCS_CCS

Combined MCS+CCS without write-through

In this mode, introduced on Tigerlake, we have fused MCS+CCS compression where the MCS is used for fast-clears and “identical samples” compression just like on Gfx7-11 but each plane is then CCS compressed.

Invariant:

The surface is a color surface

Invariant:

isl_surf.samples > 1

enumerator ISL_AUX_USAGE_STC_CCS

Stencil compression

Introduced on Tigerlake, this is similar to CCS_E but is only used to compress stencil surfaces.

Invariant:

The surface is a stencil surface

Invariant:

isl_surf.samples == 1

bool isl_aux_usage_has_fast_clears(enum isl_aux_usage usage)
bool isl_aux_usage_has_compression(enum isl_aux_usage usage)
static inline bool isl_aux_usage_has_hiz(enum isl_aux_usage usage)
static inline bool isl_aux_usage_has_mcs(enum isl_aux_usage usage)
static inline bool isl_aux_usage_has_ccs(enum isl_aux_usage usage)
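To give a feel for what the last three predicates answer, here is a toy classification that goes purely by the enumerator names above. It is not ISL's implementation: the enum is a hypothetical mirror of a subset of isl_aux_usage (ISL_AUX_USAGE_MC is left out because the text above does not say how it is classified), and the real helpers may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mirror of a subset of isl_aux_usage; not the real enum or values. */
enum toy_aux_usage {
   TOY_AUX_NONE,
   TOY_AUX_HIZ,
   TOY_AUX_MCS,
   TOY_AUX_CCS_D,
   TOY_AUX_CCS_E,
   TOY_AUX_FCV_CCS_E,
   TOY_AUX_HIZ_CCS_WT,
   TOY_AUX_HIZ_CCS,
   TOY_AUX_MCS_CCS,
   TOY_AUX_STC_CCS,
   TOY_AUX_COUNT,
};

/* True if the usage involves a HiZ surface, going by its name. */
static bool toy_aux_usage_has_hiz(enum toy_aux_usage u)
{
   assert((int)u < TOY_AUX_COUNT);
   return u == TOY_AUX_HIZ || u == TOY_AUX_HIZ_CCS_WT || u == TOY_AUX_HIZ_CCS;
}

/* True if the usage involves an MCS, going by its name. */
static bool toy_aux_usage_has_mcs(enum toy_aux_usage u)
{
   assert((int)u < TOY_AUX_COUNT);
   return u == TOY_AUX_MCS || u == TOY_AUX_MCS_CCS;
}

/* True if the usage involves a CCS, going by its name. */
static bool toy_aux_usage_has_ccs(enum toy_aux_usage u)
{
   assert((int)u < TOY_AUX_COUNT);
   return u == TOY_AUX_CCS_D || u == TOY_AUX_CCS_E ||
          u == TOY_AUX_FCV_CCS_E || u == TOY_AUX_HIZ_CCS_WT ||
          u == TOY_AUX_HIZ_CCS || u == TOY_AUX_MCS_CCS ||
          u == TOY_AUX_STC_CCS;
}
```

The fused usages are the ones for which two predicates hold at once, for example HiZ and CCS for the toy HIZ_CCS value.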

Creating auxiliary surfaces

Each type of data compression requires some type of auxiliary data on the side. For most, this involves a second auxiliary surface. ISL provides helpers for creating each of these types of surfaces:

bool isl_surf_get_hiz_surf(const struct isl_device *dev, const struct isl_surf *surf, struct isl_surf *hiz_surf)

Constructs a HiZ surface for the given main surface.

Parameters:
  • surf[in] The main surface

  • hiz_surf[out] The HiZ surface to populate on success

Returns:

false if the main surface cannot support HiZ.

bool isl_surf_get_mcs_surf(const struct isl_device *dev, const struct isl_surf *surf, struct isl_surf *mcs_surf)

Constructs a MCS for the given main surface.

Parameters:
  • surf[in] The main surface

  • mcs_surf[out] The MCS to populate on success

Returns:

false if the main surface cannot support MCS.

bool isl_surf_supports_ccs(const struct isl_device *dev, const struct isl_surf *surf, const struct isl_surf *hiz_or_mcs_surf)
Parameters:
  • surf[in] The main surface

  • hiz_or_mcs_surf[in] HiZ or MCS surface associated with the main surface

Returns:

true if the given surface supports CCS.

bool isl_surf_get_ccs_surf(const struct isl_device *dev, const struct isl_surf *surf, const struct isl_surf *hiz_or_mcs_surf, struct isl_surf *ccs_surf, uint32_t row_pitch_B)

Constructs a CCS for the given main surface.

Note

Starting with Tigerlake, the CCS is no longer really a surface. It’s not laid out as an independent surface and isn’t referenced by RENDER_SURFACE_STATE::“Auxiliary Surface Base Address” like other auxiliary compression surfaces. It’s a blob of memory that’s a 1:256 scale-down from the main surface that’s attached side-band via a second set of page tables.

In spite of this, it’s sometimes useful to think of it as being a linear buffer-like surface, at least for the purposes of allocation. When invoked on Tigerlake or later, this function still works and produces such a linear surface.

Parameters:
  • surf[in] The main surface

  • hiz_or_mcs_surf[in] HiZ or MCS surface associated with the main surface

  • ccs_surf[out] The CCS to populate on success

  • row_pitch_B[in] The row pitch for the CCS in bytes or 0 if ISL should calculate the row pitch.

Returns:

false if the main surface cannot support CCS.

Compression state tracking

All of the Intel auxiliary surface compression schemes share a common concept of a main surface which may or may not contain correct up-to-date data and some auxiliary data which says how to interpret it. The main surface is divided into blocks of some fixed size and some smaller block in the auxiliary data controls how that main surface block is to be interpreted. We then have to do resolves depending on the different HW units which need to interact with a given surface.

To help drivers keep track of what all is going on and when resolves need to be inserted, ISL provides a finite state machine which tracks the current state of the main surface and auxiliary data and their relationship to each other. The states are encoded with the isl_aux_state enum. ISL also provides helper functions for operating the state machine and determining what aux op (if any) is required to get to the right state for a given operation.

enum isl_aux_state

Enum for keeping track of the state of an auxiliary compressed surface.

For any given auxiliary surface compression format (HiZ, CCS, or MCS), any given slice (lod + array layer) can be in one of the seven states described by this enum. Drawing with or without aux enabled may implicitly cause the surface to transition between these states. There are also four types of auxiliary compression operations which cause an explicit transition which are described by the isl_aux_op enum below.

Not all operations are valid or useful in all states. The diagram below contains a complete description of the states and all valid and useful transitions except clear.

Draw w/ Aux
+----------+
|          |
|       +-------------+    Draw w/ Aux     +-------------+
+------>| Compressed  |<-------------------|    Clear    |
        |  w/ Clear   |----->----+         |             |
        +-------------+          |         +-------------+
               |  /|\            |            |   |
               |   |             |            |   |
               |   |             +------<-----+   |  Draw w/
               |   |             |                | Clear Only
               |   |      Full   |                |   +----------+
       Partial |   |     Resolve |               \|/  |          |
       Resolve |   |             |         +-------------+       |
               |   |             |         |   Partial   |<------+
               |   |             |         |    Clear    |<----------+
               |   |             |         +-------------+           |
               |   |             |                |                  |
               |   |             +------>---------+  Full            |
               |   |                              | Resolve          |
Draw w/ aux    |   |   Partial Fast Clear         |                  |
+----------+   |   +--------------------------+   |                  |
|          |  \|/                             |  \|/                 |
|       +-------------+    Full Resolve    +-------------+           |
+------>| Compressed  |------------------->|  Resolved   |           |
        |  w/o Clear  |<-------------------|             |           |
        +-------------+    Draw w/ Aux     +-------------+           |
              /|\                             |   |                  |
               |  Draw                        |   |  Draw            |
               | w/ Aux                       |   | w/o Aux          |
               |            Ambiguate         |   |                  |
               |   +--------------------------+   |                  |
Draw w/o Aux   |   |                              |   Draw w/o Aux   |
+----------+   |   |                              |   +----------+   |
|          |   |  \|/                            \|/  |          |   |
|       +-------------+     Ambiguate      +-------------+       |   |
+------>|    Pass-    |<-------------------|     Aux     |<------+   |
+------>|   through   |                    |   Invalid   |           |
|       +-------------+                    +-------------+           |
|          |   |                                                     |
+----------+   +-----------------------------------------------------+
  Draw w/                       Partial Fast Clear
 Clear Only
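The explicit resolve and ambiguate edges of the diagram can be written down as a small transition function. This is one reading of the diagram only, with hypothetical names; it is not isl_aux_state_transition_aux_op, it ignores fast clears and draws, and it does not account for per-usage restrictions.

```c
#include <assert.h>

enum toy_state {
   TOY_CLEAR,
   TOY_PARTIAL_CLEAR,
   TOY_COMPRESSED_CLEAR,
   TOY_COMPRESSED_NO_CLEAR,
   TOY_RESOLVED,
   TOY_PASS_THROUGH,
   TOY_AUX_INVALID,
};

enum toy_op { TOY_OP_FULL_RESOLVE, TOY_OP_PARTIAL_RESOLVE, TOY_OP_AMBIGUATE };

/* Returns the next state, or -1 if the diagram has no such edge. */
static int toy_apply_op(enum toy_state s, enum toy_op op)
{
   assert((int)s <= TOY_AUX_INVALID);

   switch (op) {
   case TOY_OP_FULL_RESOLVE:
      if (s == TOY_CLEAR || s == TOY_PARTIAL_CLEAR ||
          s == TOY_COMPRESSED_CLEAR || s == TOY_COMPRESSED_NO_CLEAR)
         return TOY_RESOLVED;
      break;
   case TOY_OP_PARTIAL_RESOLVE:
      if (s == TOY_COMPRESSED_CLEAR)
         return TOY_COMPRESSED_NO_CLEAR;
      break;
   case TOY_OP_AMBIGUATE:
      if (s == TOY_RESOLVED || s == TOY_AUX_INVALID)
         return TOY_PASS_THROUGH;
      break;
   }
   return -1;
}
```

For example, getting from the toy Compressed w/ Clear state to Pass-through takes a full resolve followed by an ambiguate.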

While the above general theory applies to all forms of auxiliary compression on Intel hardware, not all states and operations are available on all compression types. However, each of the auxiliary states and operations can be fairly easily mapped onto the above diagram:

HiZ: Hierarchical depth compression is capable of being in any of the states above. Hardware provides three HiZ operations: “Depth Clear”, “Depth Resolve”, and “HiZ Resolve” which map to “Fast Clear”, “Full Resolve”, and “Ambiguate” respectively. The hardware provides no HiZ partial resolve operation so the only way to get into the “Compressed w/o Clear” state is to render with HiZ when the surface is in the resolved or pass-through states.

MCS: Multisample compression is technically capable of being in any of the states above except that most of them aren’t useful. Both the render engine and the sampler support MCS compression and, apart from clear color, MCS is format-unaware so we leave the surface compressed 100% of the time. The hardware provides no MCS operations.

CCS_D: Single-sample fast-clears (also called CCS_D in ISL) are one of the simplest forms of compression since they don’t do anything beyond clear color tracking. They really only support three of the seven states: Clear, Partial Clear, and Pass-through. The only CCS_D operation is “Resolve” which maps to a full resolve followed by an ambiguate.

CCS_E: Single-sample render target compression (also called CCS_E in ISL) is capable of being in almost all of the above states. The only exception is that it does not have separate resolved and pass-through states. Instead, the CCS_E full resolve operation does both a resolve and an ambiguate so it goes directly into the pass-through state. CCS_E also provides fast clear and partial resolve operations which work as described above.

Note

The state machine above isn’t quite correct for CCS on TGL. There is a HW bug (or feature, depending on who you ask) which can cause blocks to enter the fast-clear state as a side-effect of a regular draw call. This means that a draw in the resolved or compressed without clear states takes you to the compressed with clear state, not the compressed without clear state.

static inline bool isl_aux_state_has_valid_primary(enum isl_aux_state state)
static inline bool isl_aux_state_has_valid_aux(enum isl_aux_state state)
enum isl_aux_op

Enum describing explicit aux transition operations

These operations are used to transition from one isl_aux_state to another. Even though a draw does transition the state machine, it’s not included in this enum as it’s something of a special case.

enum isl_aux_op isl_aux_prepare_access(enum isl_aux_state initial_state, enum isl_aux_usage usage, bool fast_clear_supported)

Return an isl_aux_op needed to enable an access to occur in an isl_aux_state suitable for the isl_aux_usage.

Note

If the access will invalidate the main surface, this function should not be called and the isl_aux_op of NONE should be used instead. Otherwise, an extra (but still lossless) ambiguate may occur.

Invariant:

initial_state is possible with an isl_aux_usage compatible with the given usage. Two usages are compatible if it’s possible to switch between them (e.g. CCS_E <-> CCS_D).

Invariant:

fast_clear_supported is false if the aux doesn’t support fast clears.
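The kind of decision this function makes can be sketched as follows. This is a deliberately conservative toy, not ISL's logic: the usage is replaced by a hypothetical struct toy_access that lists what the upcoming access can cope with, and the sketch only uses edges that appear in the state diagram (so Clear and Partial Clear always take a full resolve when fast clears are not supported).

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state {
   TOY_CLEAR,
   TOY_PARTIAL_CLEAR,
   TOY_COMPRESSED_CLEAR,
   TOY_COMPRESSED_NO_CLEAR,
   TOY_RESOLVED,
   TOY_PASS_THROUGH,
   TOY_AUX_INVALID,
};

enum toy_op {
   TOY_OP_NONE,
   TOY_OP_FULL_RESOLVE,
   TOY_OP_PARTIAL_RESOLVE,
   TOY_OP_AMBIGUATE,
};

/* Hypothetical description of what the upcoming access can cope with. */
struct toy_access {
   bool uses_aux;        /* false models an access with no aux at all */
   bool compression_ok;  /* can consume compressed blocks */
   bool fast_clear_ok;   /* can consume fast-cleared blocks */
};

static enum toy_op toy_prepare_access(enum toy_state s, struct toy_access a)
{
   /* Capabilities only make sense when aux is in use at all. */
   assert(a.uses_aux || (!a.compression_ok && !a.fast_clear_ok));

   switch (s) {
   case TOY_CLEAR:
   case TOY_PARTIAL_CLEAR:
      return a.fast_clear_ok ? TOY_OP_NONE : TOY_OP_FULL_RESOLVE;
   case TOY_COMPRESSED_CLEAR:
      if (!a.compression_ok)
         return TOY_OP_FULL_RESOLVE;
      return a.fast_clear_ok ? TOY_OP_NONE : TOY_OP_PARTIAL_RESOLVE;
   case TOY_COMPRESSED_NO_CLEAR:
      return a.compression_ok ? TOY_OP_NONE : TOY_OP_FULL_RESOLVE;
   case TOY_RESOLVED:
   case TOY_PASS_THROUGH:
      return TOY_OP_NONE;   /* main surface is valid as it stands */
   case TOY_AUX_INVALID:
      return a.uses_aux ? TOY_OP_AMBIGUATE : TOY_OP_NONE;
   }
   return TOY_OP_NONE;
}
```

A consumer that understands compression but not fast clears, such as a sampler without fast-clear support, would get a partial resolve for a Compressed w/ Clear slice, while a consumer with no aux support at all would get a full resolve.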

enum isl_aux_state isl_aux_state_transition_aux_op(enum isl_aux_state initial_state, enum isl_aux_usage usage, enum isl_aux_op op)

Return the isl_aux_state entered after performing an isl_aux_op.

Invariant:

initial_state is possible with the given usage.

Invariant:

op is possible with the given usage.

Invariant:

op must not cause HW to read from an invalid aux.

enum isl_aux_state isl_aux_state_transition_write(enum isl_aux_state initial_state, enum isl_aux_usage usage, bool full_surface)

Return the isl_aux_state entered after performing a write.

Note

full_surface should be true if the write covers the entire slice. Setting it to false in this case will still result in a correct (but imprecise) aux state.

Invariant:

if usage is not ISL_AUX_USAGE_NONE, then initial_state is possible with the given usage.

Invariant:

usage can be ISL_AUX_USAGE_NONE iff:

  • the main surface is valid, or

  • the main surface is being invalidated/replaced.