Why another software rasterizer?

Good question, given there are already three (swrast, softpipe, llvmpipe) in the Mesa tree. Two important reasons for this:

  • Architecture - given our focus on scientific visualization, our workloads are much different than the typical game; we have heavy vertex load and relatively simple shaders. In addition, the core counts of machines we run on are much higher. These parameters led to design decisions much different than llvmpipe.

  • Historical - Intel had developed a high performance software graphics stack for internal purposes. Later we adapted this graphics stack for use in visualization and decided to move forward with Mesa to provide a high quality API layer while at the same time benefiting from the excellent performance the software rasterizer gives us.

What’s the architecture?

SWR is a tile based immediate mode renderer with a sort-free threading model which is arranged as a ring of queues. Each entry in the ring represents a draw context that contains all of the draw state and work queues. An API thread sets up each draw context and worker threads will execute both the frontend (vertex/geometry processing) and backend (fragment) work as required. The ring allows for backend threads to pull work in order. Large draws are split into chunks to allow vertex processing to happen in parallel, with the backend work pickup preserving draw ordering.
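As a rough illustration of the ordering guarantee described above, here is a minimal single-threaded sketch of a ring of draw contexts. The names (DrawContext, DrawRing, enqueue, finishFrontend, retireBackend) and the ring size are hypothetical and are not taken from the SWR source. Treating "frontend done" as a single flag per draw is a simplification made for this example. The point is only that frontend work may complete out of order while the backend still picks draws up from the tail of the ring in submission order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not SWR's actual data layout.
struct DrawContext {
    uint32_t drawId = 0;        // draw number in submission order
    bool frontendDone = false;  // set once vertex/geometry work has finished
};

class DrawRing {
public:
    static constexpr std::size_t kSize = 4;  // arbitrary ring size for the sketch

    // API thread side: claim the next slot, or fail if the ring is full.
    bool enqueue(uint32_t drawId) {
        if (head_ - tail_ == kSize) return false;
        DrawContext& dc = slots_[head_ % kSize];
        dc.drawId = drawId;
        dc.frontendDone = false;
        ++head_;
        return true;
    }

    // Worker side: frontend work for in-flight draws may finish in any order.
    bool finishFrontend(uint32_t drawId) {
        for (std::size_t i = tail_; i != head_; ++i) {
            DrawContext& dc = slots_[i % kSize];
            if (dc.drawId == drawId) {
                dc.frontendDone = true;
                return true;
            }
        }
        return false;  // no such draw in flight
    }

    // Backend side: retire only from the tail, so draw order is preserved.
    std::vector<uint32_t> retireBackend() {
        assert(tail_ <= head_);
        std::vector<uint32_t> retired;
        while (tail_ != head_ && slots_[tail_ % kSize].frontendDone) {
            retired.push_back(slots_[tail_ % kSize].drawId);
            ++tail_;
        }
        return retired;
    }

private:
    std::array<DrawContext, kSize> slots_{};
    std::size_t head_ = 0;  // next slot to fill
    std::size_t tail_ = 0;  // oldest draw not yet retired
};
```

In this sketch, if draw 2 finishes its frontend before draws 0 and 1, retireBackend() returns nothing until draw 0 is done, and draws 1 and 2 are only retired, in that order, once draw 1 has finished as well.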

Our pipeline uses just-in-time compiled code for the fetch shader (which does vertex attribute gathering and AOS to SOA conversions), the vertex and fragment shaders, streamout, and fragment blending. SWR core also supports geometry and compute shaders but we haven’t exposed them through our driver yet. The fetch shader, streamout, and blend are built internally to SWR core using LLVM directly, while for the vertex and pixel shaders we reuse bits of llvmpipe from gallium/auxiliary/gallivm to build the kernels, which we wrap differently than llvmpipe’s auxiliary/draw code.
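For readers unfamiliar with the AOS to SOA conversion mentioned above, the following is a minimal scalar sketch of the idea. It is not the JIT-generated fetch shader. The type and function names (Vec4, Vec4Soa, gatherAosToSoa) are hypothetical, and the width of 8 is an assumption that matches the number of 32-bit floats in one 256-bit AVX register.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Array-of-structures layout: one vertex attribute stored as x,y,z,w together.
struct Vec4 {
    float x, y, z, w;
};

// Assumed SIMD width: 8 floats fit in one 256-bit AVX register.
constexpr std::size_t kSimdWidth = 8;

// Structure-of-arrays layout: one array per component, one lane per vertex.
struct Vec4Soa {
    std::array<float, kSimdWidth> x{};
    std::array<float, kSimdWidth> y{};
    std::array<float, kSimdWidth> z{};
    std::array<float, kSimdWidth> w{};
};

// Gather up to kSimdWidth vertices starting at index `first` and transpose
// them into SOA form. Lanes past the end of the input stay zeroed.
Vec4Soa gatherAosToSoa(const std::vector<Vec4>& verts, std::size_t first) {
    Vec4Soa out;
    for (std::size_t lane = 0; lane < kSimdWidth; ++lane) {
        const std::size_t i = first + lane;
        if (i >= verts.size()) break;
        out.x[lane] = verts[i].x;
        out.y[lane] = verts[i].y;
        out.z[lane] = verts[i].z;
        out.w[lane] = verts[i].w;
    }
    return out;
}
```

After the transpose, each component array can be loaded as one vector register, so a shader operation on "x" processes eight vertices at once.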

What’s the performance?

For the types of high-geometry workloads we’re interested in, we are significantly faster than llvmpipe. This is to be expected, as llvmpipe only threads the fragment processing and not the geometry frontend. The performance advantage over llvmpipe scales roughly linearly with the number of cores available.

While our current performance is quite good, we know there is more potential in this architecture. When we switched from a prototype OpenGL driver to Mesa we regressed performance severely: some of this is due to interface issues that need tuning, some to differences in shader code generation, and some to conformance and feature additions to the core SWR. We are looking to recover most of this performance.

What’s the conformance?

The major applications we are targeting are all based on the Visualization Toolkit (VTK), and as such our development efforts have been focused on making sure these work as well as possible. Our current code passes 99% of VTK’s rendering tests with their new “OpenGL2” (really OpenGL 3.2) backend.

piglit testing shows a much lower pass rate, roughly 80% at the time of writing. Core SWR undergoes rigorous unit testing and we are quite confident in the rasterizer, and understand the areas where it currently has issues (example: line rendering is done with triangles, so doesn’t match the strict line rendering rules). The majority of the piglit failures are errors in our driver layer interfacing Mesa and SWR. Fixing these issues is one of our major future development goals.

Why are you open sourcing this?

  • Our customers prefer open source, and allowing them to simply download the Mesa source and enable our driver makes life much easier for them.

  • The internal gallium APIs are not stable, so we’d like our driver to be visible for changes.

  • It’s easier to work with the Mesa community when the source we’re working with can be used as reference.

What are your development plans?

  • Performance - see the performance section earlier for details.

  • Conformance - see the conformance section earlier for details.

  • Features - core SWR has a lot of functionality we have yet to expose through our driver, such as MSAA, geometry shaders, compute shaders, and tessellation.

  • AVX512 support

What is the licensing of the code?

  • All code is under the normal Mesa MIT license.

Will this work on AMD?

  • If you are using an AMD processor with AVX or AVX2, it should work, though we don’t have that hardware around to test. Patches, if needed, would be welcome.

Will this work on ARM, MIPS, POWER, <other non-x86 architecture>?

  • Not without a lot of work. We make extensive use of AVX and AVX2 intrinsics in our code and the in-tree JIT creation. It is not the intention for this codebase to support non-x86 architectures.

What hardware do I need?

  • Any x86 processor with at least AVX (introduced in the Intel Sandy Bridge and AMD Bulldozer microarchitectures in 2011) will work.

  • You don’t need a fire-breathing Xeon machine to work on SWR - we do day-to-day development with laptops and desktop CPUs.

Does one build work on both AVX and AVX2?

Yes. The build system creates two shared libraries, libswrAVX.so and libswrAVX2.so, and swr_create_screen() loads the appropriate one at runtime.
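The selection step can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in swr_create_screen(): the names CpuFeatures, detectCpuFeatures, and pickSwrLibrary are hypothetical, and the detection relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports, which may differ from what the driver actually uses. Only the two library names come from the answer above.

```cpp
#include <string>

// Hypothetical helper type for illustration only.
struct CpuFeatures {
    bool avx = false;
    bool avx2 = false;
};

// Query the CPU using GCC/Clang builtins (an assumption of this sketch).
// On other compilers or non-x86 targets both flags simply stay false.
CpuFeatures detectCpuFeatures() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
    return f;
}

// Prefer the AVX2 build when available, fall back to AVX, and return an
// empty string when the CPU supports neither.
std::string pickSwrLibrary(const CpuFeatures& f) {
    if (f.avx2) return "libswrAVX2.so";
    if (f.avx) return "libswrAVX.so";
    return std::string();
}
```

A caller would pass the result of pickSwrLibrary(detectCpuFeatures()) to the platform's dynamic loader (for example dlopen on Linux) and report an error if the string is empty.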