LAVA CI

LAVA is a system for functional testing of boards including deploying custom bootloaders and kernels. This is particularly relevant to testing Mesa because we often need to change kernels for UAPI changes (and this lets us do full testing of a new kernel during development), and our workloads can easily take down boards when mistakes are made (kernel oopses, OOMs that take out critical system services).

Available LAVA labs

  • Collabora [dashboard] (without authentication, only health check jobs are displayed)

  • Lima [dashboard not available]

Mesa-LAVA software architecture

The gitlab-runner will run on some host that has access to the LAVA lab, with tags like “mesa-ci-x86-64-lava-$DEVICE_TYPE” so that it only takes jobs for hardware that the LAVA lab contains. The gitlab-runner spawns a Docker container with lavacli in it, and connects to the LAVA lab using a predefined token to submit jobs under a specific device type.

The LAVA instance manages scheduling those jobs to the boards present. For a job, it will deploy the kernel, device tree, and the ramdisk containing the CTS.
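
For reference, the deploy section of a LAVA job definition for such a board might look roughly like this sketch (the URLs are placeholders, and the exact fields depend on the board's boot method):

# Hypothetical deploy action for a TFTP-booted board; adjust for your lab.
actions:
- deploy:
    timeout:
      minutes: 10
    to: tftp
    kernel:
      url: https://example.com/artifacts/Image
    dtb:
      url: https://example.com/artifacts/board.dtb
    ramdisk:
      url: https://example.com/artifacts/rootfs.cpio.gz
      compression: gz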

Deploying a new Mesa-LAVA lab

You’ll want to start with setting up your LAVA instance and getting some boards booting using test jobs. Start with the stock QEMU examples to make sure your instance works at all. Then, you’ll need to define your actual boards.
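
With lavacli pointed at your instance, a job definition can be submitted and watched from the command line; for example (the file name is a placeholder):

# Submit a stock QEMU example job and note the job id it prints
lavacli jobs submit qemu-health-check.yaml
# Follow that job's log output
lavacli jobs logs <job id>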

The device type in lava-gitlab-ci.yml is the device type you create in your LAVA instance, which doesn’t have to match the board’s name in /etc/lava-dispatcher/device-types. You create your boards under that device type and the Mesa jobs will be scheduled to any of them. Instantiate your boards by creating them in the UI or at the command line attached to that device type, then populate their dictionary (using an “extends” line probably referencing the board’s template in /etc/lava-dispatcher/device-types). Now, go find a relevant health check job for your board as a test job definition, or cobble something together from a board that boots using the same boot_method and some public images, and figure out how to get your boards booting.
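
As a sketch, a board’s device dictionary under that device type often needs little more than the “extends” line plus the serial and power commands for your lab (the template name and commands below are placeholders):

{% extends 'rk3399-gru-kevin.jinja2' %}
{% set connection_command = 'telnet ser2net-host 4001' %}
{% set power_on_command = 'pduclient --hostname pdu --port 1 --command on' %}
{% set power_off_command = 'pduclient --hostname pdu --port 1 --command off' %}
{% set hard_reset_command = 'pduclient --hostname pdu --port 1 --command reboot' %}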

Once you can boot your board using a custom job definition, it’s time to connect Mesa CI to it. Install gitlab-runner and register as a shared runner (you’ll need a GitLab admin for help with this). The runner must have a tag (like “mesa-ci-x86-64-lava-rk3399-gru-kevin”) to restrict the jobs it takes or it will grab random jobs from tasks across gitlab.freedesktop.org, and your runner isn’t ready for that.
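
Registration can be done interactively, or with something along these lines (the URL, registration token, and tag are placeholders for your setup):

sudo gitlab-runner register \
  --non-interactive \
  --url https://gitlab.freedesktop.org/ \
  --registration-token <token from a GitLab admin> \
  --executor docker \
  --docker-image alpine:latest \
  --tag-list mesa-ci-x86-64-lava-rk3399-gru-kevin \
  --run-untagged=false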

The Docker image will need access to the LAVA instance. If it’s on a public network it should be fine. If you’re running the LAVA instance on localhost, you’ll need to set network_mode="host" in /etc/gitlab-runner/config.toml so it can access localhost. Create a gitlab-runner user in your LAVA instance, log in under that user on the web interface, and create an API token. Copy that into a lavacli.yaml:

default:
   token: <token contents>
   uri: <URL to the instance>
   username: gitlab-runner

Add a volume mount of that lavacli.yaml to /etc/gitlab-runner/config.toml so that the Docker container can access it. You probably have a volumes = ["/cache"] already, so now it would be:

volumes = ["/home/anholt/lava-config/lavacli.yaml:/root/.config/lavacli.yaml", "/cache"]
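
Putting the pieces together, the [runners.docker] section of /etc/gitlab-runner/config.toml might end up looking something like this (the lavacli.yaml path is an example):

[runners.docker]
  # Only needed when the LAVA instance runs on localhost
  network_mode = "host"
  volumes = ["/home/anholt/lava-config/lavacli.yaml:/root/.config/lavacli.yaml", "/cache"]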

Note that this token is visible to anybody that can submit MRs to Mesa! It is not an actual secret. We could just bake it into the GitLab CI YAML, but this way the current method of connecting to the LAVA instance is separated from the Mesa branches (particularly relevant as we have many stable branches all using CI).

Now it’s time to define your test jobs in the driver-specific gitlab-ci.yml file, using the device-specific tags.
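
The real Mesa jobs extend shared templates in the tree rather than being written from scratch, but as a rough sketch of how the device-specific tag ties a job to your runner (the job name, stage, and variables here are hypothetical):

my-driver-rk3399-gl:
  stage: test
  variables:
    DEVICE_TYPE: rk3399-gru-kevin
  tags:
    - mesa-ci-x86-64-lava-rk3399-gru-kevin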

Caching downloads

To cut down the time spent downloading traces during trace replay jobs, you will want a pass-through HTTP cache. On your runner box, install nginx:

sudo apt install nginx libnginx-mod-http-lua

Add the server setup files:

/etc/nginx/sites-available/fdo-cache
proxy_cache_path /var/cache/nginx/ levels=1:2 keys_zone=my_cache:10m max_size=50g inactive=2w use_temp_path=off;

server {
	listen 10.42.0.1:80 default_server;
	listen 127.0.0.1:80 default_server;
	listen [::]:80 default_server;
	resolver 8.8.8.8;

	root /var/www/html;

	# Add index.php to the list if you are using PHP
	index index.html index.htm index.nginx-debian.html;

	server_name _;

	location / {
		# First attempt to serve request as file, then
		# as directory, then fall back to displaying a 404.
		try_files $uri $uri/ =404;
	}

	location /tmp {
		# Lava server http artifacts to the clients; e.g. for the deploy action
		alias /var/lib/lava/dispatcher/tmp;
	}

	proxy_cache my_cache;

	# Wait for the cache entry to be created when multiple queries are made for the same file
	proxy_cache_lock on;
	proxy_cache_lock_age 30m;
	proxy_cache_lock_timeout 1h;

	location /force_cache {
		internal;
		# On some setups the cache headers will indicate to nginx that the
		# artifacts shouldn't be cached; however, when we know that isn't
		# valid for LAVA usage, this endpoint allows caching to be forced instead
		proxy_cache_valid 200 48h;
		proxy_ignore_headers Cache-Control Set-Cookie expires;
		include snippets/uri-caching.conf;
	}

	location /fdo_cache {
		internal;
		# As the auth information in the query is being dropped, use
		# the minimal possible cache validity, such that in practice
		# every request gets revalidated. This avoids
		# unauthenticated downloads from our cache, as the cache key doesn't
		# include auth info
		proxy_cache_valid 200 1s;
		proxy_cache_revalidate on;
		proxy_ignore_headers Cache-Control Set-Cookie expires;
		set_by_lua_block $cache_key {
			-- Set the cache key to the uri with the query stripped
			local unescaped =  ngx.unescape_uri(ngx.var.arg_uri);
			local it,err = ngx.re.match(unescaped, "([^?]*).*")
			if not it then
				-- Fallback on the full uri as key if the regexp fails
				return ngx.var.arg_uri;
			end
			return it[1]
		}
		proxy_cache_key $cache_key;
		include snippets/uri-caching.conf;
	}

	location /cache {
		# GitLab's HTTP server marks everything as no-cache even though
		# the artifact URLs don't change.
		if ($arg_uri ~*  /.*gitlab.*artifacts(\/|%2F)raw/ ) {
			rewrite ^ /force_cache;
		}

		# fd.o's object storage has an embedded signature for
		# authentication as part of its query. So use an adjusted cache key
		# without the query
		if ($arg_uri ~*  .*your-objectstorage.com(\/|%2F)fdo-opa(\/|%2F)) {
			rewrite ^ /fdo_cache;
		}

		# Set a really low validity together with cache revalidation; our goal
		# for caching isn't to lower the number of HTTP requests but to
		# lower the amount of data transferred. Also, for some test
		# scenarios (typically manual tests) the file at a given URL
		# might get modified, so avoid confusion by ensuring
		# revalidation happens often.
		proxy_cache_valid 200 10s;
		proxy_cache_revalidate on;
		include snippets/uri-caching.conf;
	}
}
/etc/nginx/snippets/uri-caching.conf
set $proxy_authorization '';

set_by_lua $proxyuri '
	local unescaped =  ngx.unescape_uri(ngx.var.arg_uri);
	local it, err = ngx.re.match(unescaped, "(https?://)(.*@)?([^/]*)(/.*)?");
	if not it then
		-- Hack to cause nginx to return 404
		return "http://localhost/404"
	end

	local scheme = it[1];
	local authstring = it[2];
	local host = it[3];
	local query = it[4];

	if ngx.var.http_authorization and ngx.var.http_authorization ~= "" then
		ngx.var.proxy_authorization = ngx.var.http_authorization;
	elseif authstring then
		local auth = string.sub(authstring, 1, -2);
		local auth64 = ngx.encode_base64(auth);
		ngx.var.proxy_authorization = "Basic " .. auth64;
	end

	-- Default to / if none is set to avoid using the request_uri query
	if not query then
		query = "/";
	end

	return scheme .. host .. query;
';

# Rewrite the location header to redirect back to this server. Do
# this using lua header filtering to allow for url encoding the original
# location header for use as a query parameter.
header_filter_by_lua_block {
	if ngx.header.location then
		ngx.header.location = "/cache?uri=" .. ngx.escape_uri(ngx.header.location);
	end
}

add_header X-GG-Cache-Status $upstream_cache_status;
proxy_set_header Authorization $proxy_authorization;

proxy_pass $proxyuri;

Edit the listener addresses in fdo-cache to suit the Ethernet interface that your devices are on.
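
For example, if your boards reach the runner box at 192.168.201.1 (an example address), the listen lines would become:

	listen 192.168.201.1:80 default_server;
	listen 127.0.0.1:80 default_server;
	listen [::]:80 default_server;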

Enable the site and restart nginx:

sudo rm /etc/nginx/sites-enabled/default
sudo ln -s /etc/nginx/sites-available/fdo-cache /etc/nginx/sites-enabled/fdo-cache
sudo systemctl restart nginx

# First download will hit the internet
wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo-v2.trace
# Second download should be cached.
wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo-v2.trace

Now, set download-url in your traces-*.yml entry to something like http://caching-proxy/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public and you should have cached downloads for traces. Also add it to FDO_HTTP_CACHE_URI= in your config.toml runner environment lines so that artifact downloads are cached as well, instead of going all the way to freedesktop.org on each job.
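
As a sketch, with the caching proxy reachable from the boards at 10.42.0.1 (an example address), the two settings might look something like this:

# traces-*.yml (hypothetical excerpt)
download-url: "http://10.42.0.1/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public"

# /etc/gitlab-runner/config.toml, inside the [[runners]] section (hypothetical excerpt)
environment = ["FDO_HTTP_CACHE_URI=http://10.42.0.1/cache/?uri="]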