Speeding up open-source GPU driver development with unit tests, drm-shim, and code reuse

Getting an Arm platform to work with mainline Linux can take several years, as the work is often done by third parties while the silicon vendor maintains its own Linux tree. In many cases, that means the software is ready only when the platform is obsolete or soon will be. It would be nice to start software development before the hardware is ready. It may seem like a crazy idea, but that's what the team at Collabora has done to add support for Arm Valhall GPUs (Mali-G57, Mali-G78) to the Panfrost open-source GPU driver.

Thanks to the work done over the last six months, it took the team only a few days after receiving the actual hardware to pass tests using data structures prepared by their Mesa driver and shaders compiled by their Valhall compiler. So how exactly did they achieve this feat?

We have to go back in time by a few months first. In July 2021, Collabora announced they had reverse-engineered the Valhall instruction set of the Mali-G78 GPU using a Samsung Galaxy S21 smartphone. Wait, didn't I just say they worked without Mali-G78 hardware? Correct, but they could not install mainline Linux and their GPU driver on the device as it was not rooted. They just used it to reverse-engineer the instructions and perform some testing by modifying compiled shaders and GPU data structures to experiment with individual bits. That step could have been avoided if Mali-G78 documentation had been available.

Alyssa Rosenzweig, a graphics software engineer at Collabora, continued her software development work, and by November 2021 she had written a Valhall compiler and reverse-engineered enough to write a driver, but still had no hardware running Linux to test the code on. So she wrote unit tests for everything from instruction packing to optimization, and managed to fix a few bugs in the process simply using her development machine running Linux.
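To make the idea of testing instruction packing without hardware concrete, here is a minimal sketch in Python. The 32-bit layout, the field names, and the functions `pack_instruction` and `unpack_instruction` are all invented for illustration; they are not the real Valhall encoding, and they are not taken from Mesa, whose own tests are not written in Python. The point is only that a packer is a pure function, so its output can be checked bit by bit on any development machine.

```python
# Toy 32-bit encoding invented for illustration (NOT the real Valhall format):
#   bits 24-31: opcode, bits 16-23: destination,
#   bits 8-15: source 1, bits 0-7: source 0


def pack_instruction(opcode, dest, src0, src1):
    """Pack four 8-bit fields into one 32-bit word."""
    for name, value in (("opcode", opcode), ("dest", dest),
                        ("src0", src0), ("src1", src1)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} does not fit in 8 bits: {value}")
    return (opcode << 24) | (dest << 16) | (src1 << 8) | src0


def unpack_instruction(word):
    """Inverse of pack_instruction; returns (opcode, dest, src0, src1)."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            word & 0xFF, (word >> 8) & 0xFF)


def test_known_encoding():
    # A hand-computed word catches swapped fields or wrong shifts.
    assert pack_instruction(0xA4, 0x02, 0x00, 0x01) == 0xA4020100


def test_round_trip():
    # Packing then unpacking must return the original fields.
    fields = (0x10, 0x3F, 0x07, 0xFF)
    assert unpack_instruction(pack_instruction(*fields)) == fields


def test_rejects_oversized_field():
    # A 9-bit opcode must be rejected instead of silently overflowing.
    try:
        pack_instruction(0x100, 0, 0, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a 9-bit opcode")


if __name__ == "__main__":
    test_known_encoding()
    test_round_trip()
    test_rejects_oversized_field()
    print("all packing tests passed")
```

Tests of this kind need no GPU at all: a known-good encoding, a round trip, and a rejected out-of-range value are enough to catch many packing mistakes before any hardware arrives.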

The next step was to use the drm-shim library, which provides fake GEM kernel drivers in userspace, for the CI (continuous integration) used in the Mesa project. A drm-shim driver makes the system think it features an actual GPU, but does nothing apart from receiving system calls from userspace graphics drivers. This is not an emulator and cannot be used to test functionality, but it can help find flaws in the program flow. After fixing a bug (hint: the page size is 16K instead of 4K), she was able to run a large number of tests on an Apple M1 running Linux, including compiling thousands of shaders per second with the Valhall compiler and running Khronos's OpenGL ES Conformance Test Suite to identify any issues.
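The following Python sketch illustrates the two ideas in this paragraph: a stand-in device that accepts requests without executing anything, and buffer sizes that depend on the page size. It does not use the real drm-shim, which is part of Mesa and is not written in Python. The class `FakeGpuDevice` and its methods are hypothetical, and the article does not say what the page-size bug actually was, so the rounding shown here is only a generic example of how a hard-coded 4K assumption can go wrong on a 16K system.

```python
import mmap


def align_up(size, alignment):
    """Round size up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (size + alignment - 1) & ~(alignment - 1)


class FakeGpuDevice:
    """Hypothetical stand-in for a kernel GPU driver.

    It tracks buffer handles and counts submissions but never runs a job,
    so it can exercise program flow, not rendering results.
    """

    def __init__(self, page_size=mmap.PAGESIZE):
        # Ask the system for its page size by default instead of assuming 4096.
        self.page_size = page_size
        self._buffers = {}
        self._next_handle = 1
        self.submitted_jobs = 0

    def create_buffer(self, size):
        if size <= 0:
            raise ValueError("buffer size must be positive")
        handle = self._next_handle
        self._next_handle += 1
        # Buffers are rounded up to whole pages of the host system.
        self._buffers[handle] = align_up(size, self.page_size)
        return handle

    def buffer_size(self, handle):
        return self._buffers[handle]

    def submit(self, handles):
        # Validate the request, then do nothing: there is no GPU behind it.
        for handle in handles:
            if handle not in self._buffers:
                raise KeyError(f"unknown buffer handle {handle}")
        self.submitted_jobs += 1


if __name__ == "__main__":
    for page_size in (4096, 16384):
        device = FakeGpuDevice(page_size)
        handle = device.create_buffer(5000)
        device.submit([handle])
        print(page_size, device.buffer_size(handle))
```

The same 5000-byte request occupies 8192 bytes with 4K pages and 16384 bytes with 16K pages, which is why code that hard-codes one page size can misbehave on a machine that uses the other.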

Collabora also worked to identify differences between Valhall and earlier Arm Mali GPUs such as Bifrost, so that they could reuse a large part of the Panfrost driver code and change only the parts where they detected differences. For instance, the Valhall instruction set is quite similar to the older Bifrost instruction set, so they embedded the Valhall compiler as an additional backend in the existing Bifrost compiler. Alyssa explains:

Shared compiler passes like instruction selection and register allocation just work on Valhall, even though they were developed and debugged for Bifrost.
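The arrangement described above, shared passes feeding a target-specific backend, could be sketched roughly as follows. This is a toy model in Python, not the structure of the actual Panfrost compiler: the pass `drop_redundant_moves`, the two packers, the target names, and both bit layouts are invented for illustration.

```python
# Toy instruction: (op, dest, src0, src1), with register numbers in 0..255.
OPCODES = {"mov": 1, "add": 2}


def drop_redundant_moves(program):
    """Shared pass: remove moves that copy a register onto itself."""
    return [ins for ins in program
            if not (ins[0] == "mov" and ins[1] == ins[2])]


def pack_toy_old(ins):
    """Hypothetical older layout: opcode in the top byte."""
    op, dest, src0, src1 = ins
    return (OPCODES[op] << 24) | (dest << 16) | (src0 << 8) | src1


def pack_toy_new(ins):
    """Hypothetical newer layout: opcode in the bottom byte."""
    op, dest, src0, src1 = ins
    return (src1 << 24) | (src0 << 16) | (dest << 8) | OPCODES[op]


# Only the final packing step differs between targets.
BACKENDS = {"toy-old": pack_toy_old, "toy-new": pack_toy_new}


def compile_program(program, target):
    """Run the shared passes, then hand off to the selected backend."""
    packer = BACKENDS[target]
    optimized = drop_redundant_moves(program)
    return [packer(ins) for ins in optimized]


if __name__ == "__main__":
    program = [("mov", 1, 1, 0), ("add", 2, 0, 1)]
    for target in BACKENDS:
        print(target, [hex(word) for word in compile_program(program, target)])
```

In a structure like this, a fix or test written for the shared pass benefits every target at once, and supporting a new target mostly means adding one more entry to the backend table.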

Earlier this month (January 2022), Collabora finally received a Chromebook with a MediaTek MT8192 (Kompanio 820) system-on-chip (with a Mali-G57 MC5 GPU) and a serial cable. They managed to run mainline Linux on the board after fixing USB, although the display is not working yet. The GPU is automatically disabled on MT8192, apparently due to a silicon bug, but it can be enabled after disabling the Accelerator Coherency Port (ACP). As discussed above, it then took only a few days to pass hundreds of tests on the actual hardware thanks to their preparation work. Collabora now expects Panfrost to support Valhall GPUs in time for end-users. You can read the full story on the Collabora blog.

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.

Here is the original post:

Speeding up open-source GPU driver development with unit tests, drm-shim, and code reuse - CNX Software