Project update 7 of 15
The NXP i.MX8MQ system-on-chip used in Reform has an external display output that supports HDMI 2.0a (if you can tolerate a piece of binary firmware loaded into the HDMI controller). The maximum resolution is 4096 x 2160 (4K) at 60 Hz. While we still lack a display capable of this resolution in the lab, I connected Reform to the Ultra HD TV at home to at least validate that it could output to the 3840 x 2160 @ 30 Hz maximum of that TV, and it was no problem at all:
MNT Reform connected to 4K TV
We’ll obtain a display that can accept 4K @ 60 Hz over HDMI and report back. Needless to say, this resolution is great for working with lots of text or terminals on a big screen. The Hantro H.264 hardware decoder (now supported by the Linux kernel) can decode 4K video in realtime, but we still have to validate this. While the built-in GC7000L GPU has significantly more work to do to render to 4K compared to 1080p, it is possible to clock it up to 1 GHz to squeeze out some more performance.
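To put "significantly more work" into numbers, here is a small back-of-the-envelope sketch that compares pixel counts per frame. It treats pixel count as a rough proxy for GPU load, which is a simplifying assumption; real rendering cost also depends on shaders, overdraw and memory bandwidth.

```python
# Pixel counts per frame for the resolutions mentioned above.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "UHD (TV at home)": (3840, 2160),
    "4K (i.MX8MQ maximum)": (4096, 2160),
}


def pixels(width, height):
    """Number of pixels that have to be filled for one full frame."""
    return width * height


def relative_load(name, baseline="1080p"):
    """How many times more pixels than the baseline resolution."""
    return pixels(*RESOLUTIONS[name]) / pixels(*RESOLUTIONS[baseline])


for name in RESOLUTIONS:
    print(f"{name}: {pixels(*RESOLUTIONS[name])} pixels, "
          f"{relative_load(name):.2f}x of 1080p")
```

By this measure, the GPU has to fill roughly four times as many pixels per frame at 4K as at 1080p.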
For at least the last half year, I’ve been unhappy about not being able to use KiCAD on Reform. KiCAD is the free and open source electronics design program that I used to create the circuit boards that make up Reform. While KiCAD compiles and runs on Reform, it wasn’t possible to enable the "Accelerated Graphics" mode that leverages the GPU for schematics and PCB rendering. With software-based rendering, KiCAD performs very slowly on Reform/ARM64 systems, especially for complex boards, so that wasn’t an option for me.
There were three roadblocks in the way of running KiCAD on Reform, two of which affected some other applications/games as well:
KiCAD implements an "overlay", which is a third OpenGL framebuffer (in addition to background and foreground graphics) for interactive operations like drawing new traces or artwork on top of existing, cached graphics. The etnaviv open source driver for GC7000L doesn’t support this feature, so I came up with a workaround patch for KiCAD that doesn’t use an overlay but renders to the foreground framebuffer instead. This cleared the way to being able to turn Accelerated Graphics mode on.
GC7000L has a new architecture internally called "HALTI 5" which introduced some differences from older generations supported by the etnaviv drivers, so some GPU features behave differently than expected or are activated by unknown registers or bit positions that have to be reverse engineered again. One such feature is disabling "Early-Z Reject". To figure out the correct 3D order of the pixels the GPU has to paint every frame, it uses a so-called Depth Buffer to record the Z (depth) position in 3D space of every pixel it has rendered so far. When another pixel is scheduled to be painted on top, its Z position is first compared to the value that is already in the Depth Buffer at that X/Y coordinate. If there is already something that is logically in front of what we want to paint, we don’t paint over it. This way, it doesn’t matter in which order the objects (triangles) are painted. The Depth Buffer will make sure that pixels closer to the camera obscure pixels that are further away.
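The depth test described above can be illustrated with a tiny software model. This is only a sketch on a one-dimensional "screen", assuming that a smaller Z value means closer to the camera; the function and variable names are made up for illustration and have nothing to do with the actual hardware or driver.

```python
import math


def render(fragments, width):
    """Paint 1D fragments (x, z, color) using a depth buffer.

    Assumption of this sketch: smaller z means closer to the camera.
    """
    depth = [math.inf] * width  # depth buffer, cleared to "infinitely far"
    color = [None] * width      # color buffer, cleared to "nothing"
    for x, z, c in fragments:
        if z < depth[x]:        # depth test: is the new fragment in front?
            depth[x] = z
            color[x] = c
    return color


near = [(0, 1.0, "red"), (1, 1.0, "red")]
far = [(1, 5.0, "blue"), (2, 5.0, "blue")]

# The resulting picture is the same regardless of the painting order.
print(render(near + far, 3))
print(render(far + near, 3))
```

Both calls produce the same picture: the red object wins at the overlapping position because it is closer to the camera.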
There is an optimization in modern GPUs called "Early-Z Reject" that checks pixels against the Depth Buffer before running all the expensive shaders that determine their actual texture and color. Triangles that are determined to be fully obscured can be skipped altogether, saving rendering time.
A problem with this approach appears when using a shader function called "discard". The discard (sometimes called TEXKILL) instruction can be used to poke transparent holes into the currently drawn texture/triangle, so that the background would shine through instead. But this can only work if objects behind the current object have been painted, or something wrong will show up instead. The GPU drivers have to detect that "discard" is being used and disable the Early-Z optimization for the scene. In the case of etnaviv driving the GC7000L, this did not work.
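To see why "discard" and Early-Z don't mix, here is a toy model of one possible failure mode. It assumes that with Early-Z the depth value is written before the shader runs, so a discarded pixel still leaves its depth behind. This is a simplification for illustration, not a description of what GC7000L does internally; all names are hypothetical.

```python
import math


def render(fragments, width, early_z):
    """Paint 1D fragments (x, z, color, discarded) with or without Early-Z."""
    depth = [math.inf] * width
    color = ["background"] * width
    for x, z, c, discarded in fragments:
        if early_z:
            # Depth test and depth write happen before the shader runs.
            if z >= depth[x]:
                continue
            depth[x] = z
            if discarded:   # the shader pokes a hole, but depth is already written
                continue
            color[x] = c
        else:
            if discarded:   # the shader runs first and drops the fragment
                continue
            if z >= depth[x]:
                continue
            depth[x] = z
            color[x] = c
    return color


# A near "leaf" with a transparent hole at x=1, painted before a far "wall".
leaf = [(0, 1.0, "leaf", False), (1, 1.0, "leaf", True), (2, 1.0, "leaf", False)]
wall = [(x, 5.0, "wall", False) for x in range(3)]

print(render(leaf + wall, 3, early_z=False))
print(render(leaf + wall, 3, early_z=True))
```

Without Early-Z the wall shows through the hole. With Early-Z, in this model, the hole has already claimed the depth value, so the wall is rejected there and the cleared background shows instead.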
This affected the rendering of KiCAD’s traces, text and zones, which is best explained with a picture:
Screenshot of KiCAD showing Depth Bug
After much frustration, I decided to dig into Mesa’s etnaviv driver source code to see if I could figure out how to disable Early-Z rejection in GC7000L myself. Christian Gmeiner and Marek Vasut, both etnaviv contributors, helped me, each providing puzzle pieces of the toolset required for reverse engineering the GPU. In the end, I was able to find the GPU bits that need to be toggled to disable Early-Z rejection and fix all KiCAD rendering problems.
Here’s a quick walkthrough for anyone wanting to do more etnaviv (very welcome!) reverse engineering.
The main strategy of figuring out the correct way of doing things with Vivante GPUs is to watch what the proprietary blob, "GALcore", would do, and compare that to the operations etnaviv does. The difference between these behaviours often contains the key to unknown bits in registers and their meanings.
First, try to isolate the behaviour that you want to analyze and boil it down to a minimal test case. I did this when originally reporting the bug. You can find the test case sources on GitHub.
To obtain the command stream trace of my test case from the blob, I did the following:
modprobe vivante to insert the proprietary kernel module.
imx_v6.2.4p1.8 by copying the header files from
gc_abi.h from the etnaviv project (it's in all the other
imx_... folders). I could then
export GCABI=imx_v6.2.4p1.8 and was good to build the library
LD_PRELOAD=/path/to/viv_interpose.so ETNAVIV_FDR=/tmp/trace.fdr ./test_case
/tmp/trace.fdr contains the command stream trace in binary form. Copy this over to your workstation.
./tools/dump_cmdstream.py trace.fdr ./data/gcs_hal_interface_imx_v6.2.4.p1.8.json >blob_dump.txt
./tools/build_json.sh in the etna_viv repository. Calling
./build_json.sh gcabi imx_v6.2.4p1.8 created the JSON files for me.
Quite a lot of work to set everything up, but once you have it, you can start feeding test cases to GALcore and analyze them.
With a similar, but slightly less complicated process, you can trace the command streams of etnaviv (the open source driver):
dump branch of mesa by Christian Gmeiner with the meson option
-Dtools=etnaviv and install it on your etnaviv-powered system, changing the hardcoded dump path before.
_cmdstream and can be converted to text form like this (with another tool from the etna_viv repository):
./tools/dump_separate_cmdbuf.py -b submit_00000003_cmdstream >decoded_cmdstream.txt
Armed with these text files, you can compare the commands and values that etnaviv sends to the GPU versus the ones that the GALcore blob sends. My breakthrough, however, came after comparing two traces from the blob, one with depth testing enabled and one with depth testing disabled. I noticed that the blob would toggle not one, but three bits to disable Early-Z rejection, spread across two registers. One function had to be turned off while another function had to be turned on. You can look at my work-in-progress patch to see which ones.
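Comparing two traces essentially means finding the bits that differ between the values written to each register. The sketch below shows the idea, assuming you have already reduced each decoded trace to a mapping from register name to the last 32-bit value written. The register names and values are made up for illustration and are not the actual GC7000L registers; registers missing from one trace are treated as 0, which is also an assumption.

```python
def toggled_bits(trace_a, trace_b):
    """Return {register: [bit positions]} for bits that differ between traces.

    Each trace maps a register name to the last 32-bit value written to it.
    Registers missing from one trace are treated as 0 (an assumption).
    """
    result = {}
    for reg in sorted(set(trace_a) | set(trace_b)):
        diff = trace_a.get(reg, 0) ^ trace_b.get(reg, 0)
        if diff:
            result[reg] = [bit for bit in range(32) if (diff >> bit) & 1]
    return result


# Hypothetical register names and values, for illustration only.
depth_on = {"REG_A": 0x00000001, "REG_B": 0x00010000, "REG_C": 0x12345678}
depth_off = {"REG_A": 0x00000000, "REG_B": 0x00010014, "REG_C": 0x12345678}

print(toggled_bits(depth_on, depth_off))
```

In this made-up example, three bits differ across two registers, which is the kind of pattern that is easy to miss when only looking for a single switch.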
There is still no automatic way in etnaviv to recognize "discard" in shaders to trigger turning off Early-Z rejection, but it can be set with an environment variable, ETNA_MESA_DEBUG. Normally, I set this to nir to enable the NIR shader compiler, but now I set it to nir,no_early_z to disable Early-Z rejection as well. I’ll continue work to make this switch happen automatically.
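If you launch programs from a script, the variable can be set per process instead of globally. This is a minimal sketch: the helper names are hypothetical, the no_early_z flag is only available with the work-in-progress patch described above, and the kicad command in the usage comment is an assumption about what is installed on your system.

```python
import os
import subprocess


def etnaviv_env(flags, base_env=None):
    """Return a copy of the environment with ETNA_MESA_DEBUG set to the flags."""
    env = dict(os.environ if base_env is None else base_env)
    # ETNA_MESA_DEBUG takes a comma-separated list of flags.
    env["ETNA_MESA_DEBUG"] = ",".join(flags)
    return env


def launch(command, flags):
    """Run a program with the given etnaviv debug flags and wait for it to exit."""
    return subprocess.run(command, env=etnaviv_env(flags), check=False)


# Example usage (assumes "kicad" is on the PATH):
# launch(["kicad"], ["nir", "no_early_z"])
```

Setting the variable per process keeps other applications on the default code path, so you can compare behaviour with and without Early-Z rejection side by side.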
For now, this patch completely fixes KiCAD accelerated rendering on MNT Reform/i.MX8MQ:
Screenshot of KiCAD on MNT Reform, fixed
As a bonus, this fixes some games and emulators as well, including (with a tiny shader patch) the rendering of plants and transparent objects in one of my favorites, Minetest, an open source voxel game engine with some fantastic mods such as a multiplayer Minecraft-like world.
Screenshot of Minetest on MNT Reform, fixed
There was one more rendering issue plaguing the toolbars in KiCAD and GUI elements of applications using legacy X11 toolkits through Xwayland (the X compatibility layer for Wayland compositors). This resulted in elements sometimes not being fully drawn and flickering in and out of existence. After some discussion with Daniel Stone, I decided to hunt for a patch for this problem as well. After a few days, I was lucky: placing a glFinish() in a strategic location at the end of the glamor_composite_clipped_region() function in the X server code mitigates the problem in almost all cases. This also fixes the pre-GTK3 version of the GIMP running on Xwayland. GTK3, Qt and SDL applications are immune because they bring their own rendering and don’t rely on the X server’s drawing functions.
In my opinion, shipping hardware is also about shipping working software. That’s why I try to catch as many problems in important applications as I can before shipping Reform. I also wanted to detail my approach to fixing certain problems directly on the system, because this has been a great and rewarding learning experience for me, and it can be for you, too. This can be intimidating and frustrating at first, but with every subsystem that you manage to take apart and solve a problem in, you gain a more intimate understanding of the hardware and software you rely on every day. And you can learn valuable programming and engineering skills on the way. This is a big part of what the MNT Reform project is all about.