Hal of Horrors

In the previous post we found the GK7101 SoC’s UART base address and a few registers by decompiling their version of Linux’s decompressor:

  • UART base address is 0xa0005000
  • The input/output register is at offset 0x04
  • Offset 0x14 holds flags:
    • Bit 6 needs to be high before sending.
    • To drain the input buffer, read from offset 0x04 until bit 2 goes low.

This all looks suspiciously like a standard 16550 UART, but not quite: offset 0x04 corresponds to the 16550’s RHR/THR register (Receive/Transmit Holding Register), except the 16550 has it at offset 0x00. Offset 0x14 matches the LSR (Line Status Register), with bit 6 matching TEMT (Transmitter Empty), but bit 2 doesn’t quite match – it’s the equivalent of bit 0 (Data Ready) in the 16550’s LSR.

So it’s not a 16550-compatible UART, just a vague attempt at one. Here’s my version of the four macros needed to work this thing:

#define UART0_TH		0x04
#define UART0_LS		0x14
#define UART0_LS_DR		(1 << 2)
#define UART0_LS_TEMT	(1 << 6)

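        .macro	addruart, rp, rv, tmp
        ldr	\rp, =CONFIG_DEBUG_UART_PHYS	@ physical base address
        ldr	\rv, =CONFIG_DEBUG_UART_VIRT	@ virtual base address
        .endm

        .macro	waituart,rd,rx
1001:	ldr	\rd, [\rx, #UART0_LS]
        tst	\rd, #UART0_LS_TEMT		@ loop until the transmitter is empty
        beq	1001b
        .endm

        .macro	senduart,rd,rx
        str	\rd, [\rx, #UART0_TH]		@ write the character
        .endm

        .macro	busyuart,rd,rx
1001:	ldr	\rd, [\rx, #UART0_LS]
        tst	\rd, #UART0_LS_DR		@ loop while bit 2 is still set
        bne	1001b
        .endm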
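For readers who find assembler macros hard to follow, here is roughly the same polling logic as a C sketch. It is only an illustration of what waituart and senduart do together: the helper names (uart_reg_read, uart_reg_write, uart_putc) are made up for this example, and the register block is passed in as a pointer so the code can be tried against a plain array.

```c
#include <stdint.h>

/* Register offsets and flag bits, same values as the macros above. */
#define UART0_TH        0x04
#define UART0_LS        0x14
#define UART0_LS_DR     (1u << 2)
#define UART0_LS_TEMT   (1u << 6)

/* Hypothetical helpers: 'base' points at the start of the UART block. */
static inline uint32_t uart_reg_read(volatile uint32_t *base, uint32_t off)
{
    return base[off / 4];
}

static inline void uart_reg_write(volatile uint32_t *base, uint32_t off,
                                  uint32_t val)
{
    base[off / 4] = val;
}

/* Spin until the transmitter reports empty, then write one character. */
static void uart_putc(volatile uint32_t *base, char c)
{
    while (!(uart_reg_read(base, UART0_LS) & UART0_LS_TEMT))
        ;
    uart_reg_write(base, UART0_TH, (uint8_t)c);
}
```

Pointing base at an ordinary array with the TEMT bit pre-set in the status word is enough to see the character land at offset 0x04.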

To use this low-level console facility, you need to define the following in your kernel’s configuration:
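Going by the description that follows, the fragment would look roughly like this. Treat it as a reconstruction rather than a verbatim listing: the include path is a placeholder name for whatever file holds the macros, and the two addresses are the ones worked out in this post.

```
# Low-level debug output, also in the decompressor
CONFIG_DEBUG_LL=y
CONFIG_DEBUG_UNCOMPRESS=y
# Placeholder file name (assumption); this is where the macros live
CONFIG_DEBUG_LL_INCLUDE="debug/gk7101.S"
# UART base address, physical and virtual
CONFIG_DEBUG_UART_PHYS=0xa0005000
CONFIG_DEBUG_UART_VIRT=0xf3005000
```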


That turns on the facility, enables it in the decompressor, and tells the build system where to find the code. You’re supposed to create that file and define the macros in there. CONFIG_EARLY_PRINTK creates a function called early_print(), which uses these macros as well. Handy to have while you’re debugging your real UART driver.

The last two definitions are used in the addruart macro above, defining the UART base address in physical and virtual (post-MMU) memory. How did I come up with the virtual address? Remember this output from the vendor kernel bootup?

[    0.000000] AHB: 0x90000000  0xf2000000  -- 0x1000000
[    0.000000] APB: 0xa0000000  0xf3000000  -- 0x1000000
[    0.000000] PPM: 0xc0000000  0xc0000000  -- 0x200000
[    0.000000] BSB: 0xc4800000  0xf5000000  -- 0x200000
[    0.000000] DSP: 0xc4a00000  0xf6000000  -- 0x3600000
[    0.000000] hal version = 20151223 

That looks to me like the MMIO mapping: first the physical address, then the virtual address it gets mapped to, followed by the length of the block. APB refers to the Advanced Peripheral Bus, a standard facility on ARM chips. It’s a memory-mapped I/O bus for SoC designers to hook low-speed peripherals to, exactly where you’d expect to find a UART. The UART sits at offset 0x5000 from the APB’s physical base of 0xa0000000, so its virtual address is 0xf3000000 + 0x5000 = 0xf3005000.
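The arithmetic can be spelled out as a small lookup. This is a sketch that assumes the boot log really lists physical base, virtual base and length in that order; the struct and function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of the vendor kernel's boot output. */
struct io_window {
    const char *name;
    uint32_t phys;   /* physical base */
    uint32_t virt;   /* virtual base  */
    uint32_t size;   /* length in bytes */
};

static const struct io_window windows[] = {
    { "AHB", 0x90000000u, 0xf2000000u, 0x1000000u },
    { "APB", 0xa0000000u, 0xf3000000u, 0x1000000u },
    { "PPM", 0xc0000000u, 0xc0000000u, 0x200000u  },
    { "BSB", 0xc4800000u, 0xf5000000u, 0x200000u  },
    { "DSP", 0xc4a00000u, 0xf6000000u, 0x3600000u },
};

/* Translate a physical address; returns 0 if it is in none of the windows. */
static uint32_t phys_to_virt_io(uint32_t phys)
{
    for (size_t i = 0; i < sizeof(windows) / sizeof(windows[0]); i++) {
        const struct io_window *w = &windows[i];
        /* Unsigned subtraction wraps, so this is a full range check. */
        if (phys - w->phys < w->size)
            return w->virt + (phys - w->phys);
    }
    return 0;
}
```

Feeding it 0xa0005000 lands in the APB window at offset 0x5000, which gives the 0xf3005000 used for CONFIG_DEBUG_UART_VIRT.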

That should be all that’s needed to get a mainline kernel’s decompressor output working on this chip.

And yet… it doesn’t work. The decompressor outputs nothing. Diddling those UART registers from the U-Boot shell (which has basic memory monitor commands) also does nothing.

But the vendor kernel didn’t really write to those addresses directly, did it? It used that weird struct with the read and write function pointers, feeding it the addresses we found – and that somehow did work.

This is some sort of abstraction between I/O reads/writes and the actual hardware. In other words, it’s a HAL (Hardware Abstraction Layer). Hardware companies often stick these between their drivers and the operating system they have to run on, in the hope it will save them time and effort. The idea is they can write one driver, and just have a HAL abstract out whether that driver needs to talk to Linux or Windows, for example.

The Linux kernel never accepts drivers that come with a HAL. It makes for an unnecessary internal API – the HAL layer – that nothing else will use, and it creates a lot of unnecessary code. Anyone who wants to mainline code based on a vendor HAL typically gets to rewrite the whole thing, using the vendor code as nothing more than a reference to registers and such. Of course, vendor code even without a HAL also tends to get rewritten from scratch before it goes into mainline, because it’s a pile of dung. But I digress.

Incidentally, not every operating system refuses HAL-based code: the Zephyr project happily accepts such code, for all the wrong reasons. That’s because it’s a project run by a consortium of competitors, managed by the Linux Foundation. In other words there isn’t a clue to be found. This is why we need Linus, folks.

Still, this particular HAL seems unusual – it doesn’t sit between a driver and the OS, but between I/O reads/writes and the registers!

Ghidra didn’t quite manage to extract the address of that struct, and it was quite the struggle to find, but it turns out to live at 0xc0015ad0. U-Boot has a severe misfeature in that asking it to dump memory in 32-bit-sized chunks will make it flip those chunks from little-endian to big-endian. However this comes in handy if you’re on a wrong-endian system and you happen to be human. Here’s the struct:

GK7101 # md.l c0015ad0 20
[PROCESS_SEPARATORS] md.l c0015ad0 20
c0015ad0: c00121a8 c0012904 c0012b80 c0012b0c    .!...)...+...+..
c0015ae0: c00127e8 c0012a90 c0012a14 c001295c    .'...*...*..\)..
c0015af0: c0012250 c0012268 c0012280 c00122a0    P"..h"..."..."..
c0015b00: c00122c0 c00122e0 c0012300 c0012320    ."..."...#.. #..
c0015b10: c0012340 c001235c c0015ad0 00042400    @#..\#...Z...$..

Those are clearly all pointers to the same region, roughly 0xc0012000 - 0xc0015000. We saw the read function at offset 0x10, and write at 0x1c. Let’s take a look at read, at 0xc00127e8. Remember it takes a single 32-bit unsigned integer argument (the address) and returns the value at that address, also uint32_t.
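As a quick sanity check of those two offsets against the dump, the table can be indexed by byte offset. This is just a sketch: the array holds the first eight words copied from the md.l output, and the names hal_table and hal_entry are made up here.

```c
#include <stdint.h>

/* First eight words of the struct at 0xc0015ad0, copied from the dump. */
static const uint32_t hal_table[] = {
    0xc00121a8u, 0xc0012904u, 0xc0012b80u, 0xc0012b0cu,   /* 0x00 .. 0x0c */
    0xc00127e8u, 0xc0012a90u, 0xc0012a14u, 0xc001295cu,   /* 0x10 .. 0x1c */
};

#define HAL_READ_OFFSET   0x10   /* read function pointer  */
#define HAL_WRITE_OFFSET  0x1c   /* write function pointer */

/* Return the function address stored at a given byte offset. */
static uint32_t hal_entry(uint32_t byte_offset)
{
    return hal_table[byte_offset / sizeof(uint32_t)];
}
```

Offset 0x10 yields 0xc00127e8 and offset 0x1c yields 0xc001295c, both inside the region the other pointers fall in. The decompiler’s view of the function at 0xc00127e8 follows.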

uint hw_readl(uint *param_1)    /* param_1 is the address */
{
    undefined *puVar1;
    uint *puVar2;
    uint uVar3;
    code *pcVar4;
    uint *puVar5;
    undefined4 local_14;

    if (param_1 == DAT_c00128ec) {
        uVar3 = *(uint *)(*DAT_c0012900 + 0x5014);
        return uVar3 & 0xffffffc0 | 8 | (uVar3 & 6) >> 1 | (uVar3 & 1) << 2 | (uVar3 & 0x18) << 1;
    }
    if (param_1 < DAT_c00128ec || param_1 == DAT_c00128ec) {
        if (param_1 != DAT_c00128f0) {
            local_14 = 0;
            puVar5 = param_1;
            puVar1 = FUN_c00127b8((uint)param_1);
            if (puVar1 == NULL) {
                if (((uint)param_1 & 0xff000000) == 0x60000000 ||
                    ((uint)param_1 & 0xff000000) == 0x70000000) {
                    return 0;
                }
                return *param_1;
            }
            puVar2 = (uint *)(**(code **)(puVar1 + 4))(param_1,&local_14);
            if (puVar2 == NULL) {
                return 0;
            }
            pcVar4 = *(code **)(puVar1 + 0x10);
            if (pcVar4 == NULL) {
                return *puVar2;
            }
            uVar3 = (*pcVar4)(local_14,*puVar2,0,pcVar4,puVar5);
            return uVar3;
        }
    }
    else {
        if (param_1 != DAT_c00128f4) {
            if (param_1 == DAT_c00128f8) goto LAB_c0012840;
            goto LAB_c0012874;
        }
    }
    return *(uint *)(*DAT_c00128fc + 0x16000);
}

This is very much spaghetti code, no doubt at least partly due to the decompiler’s distorted view: the machine code it translates is always spaghetti-shaped. Translated into proper C and then into pseudo-code it does this:

  • If the address is 0xA016018C or 0xF316018C, return 0x00474b37. Read lowest byte first, that’s actually the string “7KG”. Similarly, if the address is 0xA0160188 or 0xF3160188, return 0x30313031 – the string “1010” in the same byte order. Note the 0xa0000000 and 0xf3000000 address ranges here – so these are definitely related, and this code is apparently meant to work on both physical and virtual addresses.

  • If the address is 0xA0005014 or 0xF3005014 – hey, that’s the UART LSR! – then read from an unknown prefix + 0x5014 and shift the various bits in there around to match the almost-not-quite-16550 we found earlier. The unknown prefix turns out to be set by an earlier call to 0xc0012000, some sort of initialization of this HAL.

  • For any other address, a function at 0xc00127b8 is called with the address, and a number of some sort is returned. That number is used as an index into a jumptable of functions; the indexed function is then called with the address and returns the “real” address. The read is then done on that address, and the result returned.
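Two of those steps are easy to check in isolation. The sketch below unpacks 0x00474b37 into characters, lowest byte first, and reproduces the LSR bit shuffle from the return statement in the decompiled function. The function names are invented for this example, and the per-bit comments are simply my reading of that one expression.

```c
#include <stdint.h>

/* Unpack a 32-bit word into four characters, lowest byte first. */
static void word_to_chars(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}

/* The two addresses the HAL treats as the UART status register. */
static int is_uart_lsr(uint32_t addr)
{
    return addr == 0xa0005014u || addr == 0xf3005014u;
}

/* Bit shuffle applied to the raw status word before it is returned. */
static uint32_t remap_lsr(uint32_t raw)
{
    return (raw & 0xffffffc0u)      /* bits 6 and up pass through   */
         | 8u                       /* bit 3 is always reported set */
         | ((raw & 0x06u) >> 1)     /* raw bits 1-2 -> bits 0-1     */
         | ((raw & 0x01u) << 2)     /* raw bit 0    -> bit 2        */
         | ((raw & 0x18u) << 1);    /* raw bits 3-4 -> bits 4-5     */
}
```

Note what the shuffle implies: the bit the vendor code polls as “data ready” (bit 2) comes from raw bit 0, and bit 6 passes through untouched.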

So all we have to do is figure out which index we get for our 0xa0005000 registers, decompile the function that handles this block, and see what it returns. The real physical address for the UART block turns out to be 0x70005000. The flags of the LSR register at 0x70005014 are totally different, but the code up there shows the mapping. Oh, and the real RHR/THR register is actually at offset 0x00 – so this HAL doesn’t just translate peripheral base addresses, but also the ordering of registers within their blocks.
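For the UART alone, the translation boils down to something like the sketch below. It only covers the two registers identified in this post; every other offset is reported as unknown rather than guessed. The function name and the out-parameter convention are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define HAL_UART_PHYS   0xa0005000u   /* address the vendor code uses    */
#define HAL_UART_VIRT   0xf3005000u   /* same block, after the MMU is on */
#define REAL_UART_PHYS  0x70005000u   /* where the hardware really lives */

/*
 * Translate a HAL-side UART register address to the real one.
 * Returns false for anything outside the two registers covered here.
 */
static bool hal_uart_to_real(uint32_t hal_addr, uint32_t *real_addr)
{
    uint32_t off;

    /* Unsigned subtraction wraps, so each test is a range check. */
    if (hal_addr - HAL_UART_PHYS <= 0x14u)
        off = hal_addr - HAL_UART_PHYS;
    else if (hal_addr - HAL_UART_VIRT <= 0x14u)
        off = hal_addr - HAL_UART_VIRT;
    else
        return false;

    switch (off) {
    case 0x04:                        /* RHR/THR moves to offset 0x00 */
        *real_addr = REAL_UART_PHYS + 0x00;
        return true;
    case 0x14:                        /* LSR stays put, bits are laid out differently */
        *real_addr = REAL_UART_PHYS + 0x14;
        return true;
    default:                          /* not worked out in this post */
        return false;
    }
}
```

The real thing does this for every peripheral block on the chip, which is where the trouble starts.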

The problem is that this is a big jumptable, with a large number of functions. They’re all constructed as a series of if-then-else statements on the address, no doubt in an effort to make it all go fast. Clearly, this was written before tree traversal algorithms were invented.

The upshot is that it’s 16K of really dense machine code, a serious chore to decompile, parse and grok – it would just take ages, and mistakes would be easy to make. Yet I have to do all of them, or I won’t have the real addresses of the SoC’s various features.

So with the physical address of 0x70005000 the UART works. That’s one problem solved, but now I’ve got hundreds of problems just like it.

This is horrible! What to do?