Hal of Horrors
In the previous post we found the GK7101 SoC’s UART base address and a few registers by decompiling their version of Linux’s decompressor:
- UART base address is 0xa0005000
- The input/output register is at offset 0x04
- Offset 0x14 holds flags:
  - Bit 6 needs to be high before sending.
  - To drain the input buffer, read from offset 0x04 until bit 2 goes low.
This all looks suspiciously like a standard 16550 UART, but not quite: offset
0x04 corresponds to the 16550’s RHR/THR register (Receive/Transmit Holding
Register), except the 16550 has it at offset 0x00. Offset 0x14 matches the LSR
(Line Status Register), with bit 6 matching TEMT (Transmitter Empty), but bit 2
doesn’t quite match – it’s the equivalent of bit 0 (Data Ready) in the 16550’s LSR.
So it’s not a 16550-compatible UART, just a vague attempt at one. Here’s my version of the 4 macros to work this thing:
#define UART0_TH 0x04
#define UART0_LS 0x14
#define UART0_LS_DR (1 << 2)
#define UART0_LS_TEMT (1 << 6)
#ifdef CONFIG_DEBUG_UART_PHYS
.macro addruart, rp, rv, tmp
ldr \rp, =CONFIG_DEBUG_UART_PHYS
ldr \rv, =CONFIG_DEBUG_UART_VIRT
.endm
#endif
.macro waituart,rd,rx
1001: ldr \rd, [\rx, #UART0_LS]
tst \rd, #UART0_LS_TEMT
beq 1001b
.endm
.macro senduart,rd,rx
str \rd, [\rx, #UART0_TH]
.endm
.macro busyuart,rd,rx
1001: ldr \rd, [\rx, #UART0_LS]
tst \rd, #UART0_LS_DR
bne 1001b
.endm
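For readers who don't read ARM assembly every day, here is a rough C rendering of what those macros do. It's only a sketch: the function names and the uart_reg() helper are made up for illustration, and it assumes the registers are 32 bits wide and accessed through a plain pointer to the block's base.

```c
#include <stdint.h>

/* Register offsets and flag bits, copied from the macros above. */
#define UART0_TH      0x04
#define UART0_LS      0x14
#define UART0_LS_DR   (1u << 2)
#define UART0_LS_TEMT (1u << 6)

/* Hypothetical helper: the 32-bit register at a byte offset from base. */
static volatile uint32_t *uart_reg(volatile uint32_t *base, uint32_t off)
{
    return base + off / 4;
}

/* waituart + senduart: spin until TEMT is set, then write the character. */
static void uart_putc(volatile uint32_t *base, char c)
{
    while ((*uart_reg(base, UART0_LS) & UART0_LS_TEMT) == 0)
        ;
    *uart_reg(base, UART0_TH) = (uint32_t)(unsigned char)c;
}

/* busyuart: spin for as long as the DR bit stays set, like the macro. */
static void uart_busy_wait(volatile uint32_t *base)
{
    while (*uart_reg(base, UART0_LS) & UART0_LS_DR)
        ;
}
```

Pointing base at an ordinary array instead of real hardware is enough to check the logic on a desktop machine.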
To use this low-level console facility, you need to define the following in your kernel’s configuration:
CONFIG_DEBUG_LL=y
CONFIG_DEBUG_UNCOMPRESS=y
CONFIG_DEBUG_LL_INCLUDE="debug/gk710x.S"
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_UART_PHYS=0xA0005000
CONFIG_DEBUG_UART_VIRT=0xf3005000
That turns on the facility, enables it in the decompressor, and tells the
build system where to find the code. You’re supposed to create that file
and define the macros in there. CONFIG_EARLY_PRINTK creates a function called
early_print(), which uses these macros as well. Handy to have while you’re
debugging your real UART driver.
The last two definitions are used in the addruart macro above, defining
the UART base address in physical and virtual (post-MMU) memory. How did I
come up with the virtual address? Remember this output from the vendor kernel
bootup?
[ 0.000000] AHB: 0x90000000 0xf2000000 -- 0x1000000
[ 0.000000] APB: 0xa0000000 0xf3000000 -- 0x1000000
[ 0.000000] PPM: 0xc0000000 0xc0000000 -- 0x200000
[ 0.000000] BSB: 0xc4800000 0xf5000000 -- 0x200000
[ 0.000000] DSP: 0xc4a00000 0xf6000000 -- 0x3600000
[ 0.000000] hal version = 20151223
That looks to me like the MMIO mapping: first the physical address, then the
virtual address it gets mapped to, followed by the length of the block. APB
refers to the Advanced Peripheral Bus, a standard facility on ARM chips. It’s a
memory-mapped I/O bus for SoC designers to hook low-speed peripherals to,
exactly where you’d expect to find a UART. The UART block sits at offset 0x5000
from the APB’s physical base, so its virtual address is 0xf3000000 + 0x5000 =
0xf3005000.
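The same arithmetic works for any address in those ranges. Here is a small sketch, assuming my reading of the boot log (physical base, virtual base, length) is right; the struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of the vendor boot log: physical base, virtual base, length. */
struct io_map {
    const char *name;
    uint32_t phys;
    uint32_t virt;
    uint32_t len;
};

static const struct io_map io_maps[] = {
    { "AHB", 0x90000000u, 0xf2000000u, 0x1000000u },
    { "APB", 0xa0000000u, 0xf3000000u, 0x1000000u },
    { "PPM", 0xc0000000u, 0xc0000000u, 0x200000u },
    { "BSB", 0xc4800000u, 0xf5000000u, 0x200000u },
    { "DSP", 0xc4a00000u, 0xf6000000u, 0x3600000u },
};

/* Returns the virtual address for phys, or 0 if no block covers it. */
static uint32_t io_phys_to_virt(uint32_t phys)
{
    for (size_t i = 0; i < sizeof(io_maps) / sizeof(io_maps[0]); i++) {
        const struct io_map *m = &io_maps[i];
        if (phys >= m->phys && phys - m->phys < m->len)
            return m->virt + (phys - m->phys);
    }
    return 0;
}
```

Feeding it 0xa0005000 lands in the APB entry and gives 0xf3005000, the value used for CONFIG_DEBUG_UART_VIRT.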
That should be all that’s needed to get a mainline kernel’s decompressor output working on this chip.
And yet… it doesn’t work. The decompressor outputs nothing. Diddling those UART registers from the U-Boot shell (which has basic memory monitor commands) also does nothing.
But the vendor kernel didn’t really write to those addresses directly, did it? It used that weird struct with the read and write function pointers, feeding it the addresses we found – and that somehow did work.
This is some sort of abstraction between I/O reads/writes and the actual hardware. In other words, it’s a HAL (Hardware Abstraction Layer). Hardware companies often stick these between their drivers and the operating system they have to run on, in the hope it will save them time and effort. The idea is they can write one driver, and just have a HAL abstract out whether that driver needs to talk to Linux or Windows, for example.
The Linux kernel never accepts drivers that come with a HAL. It makes for an unnecessary internal API – the HAL layer – that nothing else will use, and it creates a lot of unnecessary code. Anyone that wants to mainline code based on a vendor HAL typically gets to rewrite the whole thing, using the vendor code as nothing more than a reference to registers and such. Of course, vendor code even without a HAL also tends to get rewritten from scratch before it goes into mainline, because it’s a pile of dung. But I digress.
Incidentally, not every operating system refuses HAL-based code: the Zephyr project happily accepts such code, for all the wrong reasons. That’s because it’s a project run by a consortium of competitors, managed by the Linux Foundation. In other words there isn’t a clue to be found. This is why we need Linus, folks.
Still, this particular HAL seems unusual – it doesn’t sit between a driver and the OS, but between I/O reads/writes and the registers!
Ghidra didn’t quite manage to extract the address of that struct, and it was
quite the struggle to find, but it turns out to live at 0xc0015ad0.
U-Boot has a severe misfeature in that asking it to dump memory in 32-bit-sized
chunks will make it flip those chunks from little-endian to big-endian. However,
this comes in handy if you’re on a wrong-endian system and you happen to be
human. Here’s the struct:
GK7101 # md.l c0015ad0 20
[PROCESS_SEPARATORS] md.l c0015ad0 20
c0015ad0: c00121a8 c0012904 c0012b80 c0012b0c .!...)...+...+..
c0015ae0: c00127e8 c0012a90 c0012a14 c001295c .'...*...*..\)..
c0015af0: c0012250 c0012268 c0012280 c00122a0 P"..h"..."..."..
c0015b00: c00122c0 c00122e0 c0012300 c0012320 ."..."...#.. #..
c0015b10: c0012340 c001235c c0015ad0 00042400 @#..\#...Z...$..
Those are clearly all pointers to the same region, roughly 0xc0012000 - 0xc0015000.
We saw the read function at offset 0x10, and write at 0x1c. Let’s take
a look at read, at 0xc00127e8. Remember it takes a single 32-bit unsigned
integer argument (the address) and returns the value at that address, also
uint32_t.
uint hw_readl(uint *param_1)
{
undefined *puVar1;
uint *puVar2;
uint uVar3;
code *pcVar4;
uint *puVar5;
undefined4 local_14;
if (param_1 == DAT_c00128ec) {
LAB_c0012840:
uVar3 = *(uint *)(*DAT_c0012900 + 0x5014);
return uVar3 & 0xffffffc0 | 8 | (uVar3 & 6) >> 1 | (uVar3 & 1) << 2 | (uVar3 & 0x18) << 1;
}
if (param_1 < DAT_c00128ec || param_1 == DAT_c00128ec) {
if (param_1 != DAT_c00128f0) {
LAB_c0012874:
local_14 = 0;
puVar5 = param_1;
puVar1 = FUN_c00127b8((uint)param_1);
if (puVar1 == NULL) {
if (((uint)param_1 & 0xff000000) == 0x60000000 ||
((uint)param_1 & 0xff000000) == 0x70000000) {
return 0;
}
return *param_1;
}
puVar2 = (uint *)(**(code **)(puVar1 + 4))(param_1,&local_14);
if (puVar2 == NULL) {
return 0;
}
pcVar4 = *(code **)(puVar1 + 0x10);
if (pcVar4 == NULL) {
return *puVar2;
}
uVar3 = (*pcVar4)(local_14,*puVar2,0,pcVar4,puVar5);
return uVar3;
}
}
else {
if (param_1 != DAT_c00128f4) {
if (param_1 == DAT_c00128f8) goto LAB_c0012840;
goto LAB_c0012874;
}
}
return *(uint *)(*DAT_c00128fc + 0x16000);
}
This is very much spaghetti code, no doubt at least partly due to the decompiler’s distorted view: the machine code it translates is always spaghetti-shaped. Translated into proper C and then into pseudo-code it does this:
- If the address is 0xA016018C or 0xF316018C, return 0x00474b37. That’s actually the string “7KG”. Similarly, if the address is 0xA0160188 or 0xF3160188, return a value that spells the string “1010”. Note the 0xa0000000 and 0xf3000000 address ranges here – so these are definitely related, and this code is apparently meant to work on both physical and virtual addresses.
- If the address is 0xA0005014 or 0xF3005014 – hey, that’s the UART LSR! – then read from an unknown prefix + 0x5014 and shift the various bits in there around to match the almost-not-quite-16550 we found earlier. The unknown prefix turns out to be set by an earlier call to 0xc0012000, some sort of initialization of this HAL.
- For any other address, a function at 0xc00127b8 is called with the address, and a number of some sort is returned. That number is used as an index into a jumptable of functions; the indexed function is then called with the address and returns the “real” address. The read is then done on that address, and the result returned.
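The bit shuffling in the LSR case is easier to follow when the one-line expression from the decompiled function is split into steps. This sketch is the same arithmetic, nothing more; the function name is made up, and the comments describe only where each bit ends up, not what the hardware means by it.

```c
#include <stdint.h>

/* Same expression as in the decompiled hw_readl(), one term per line. */
static uint32_t hal_remap_lsr(uint32_t raw)
{
    uint32_t out = raw & 0xffffffc0u;   /* bits 6 and up pass through   */
    out |= 8u;                          /* bit 3 is always reported set */
    out |= (raw & 0x06u) >> 1;          /* raw bits 1-2 -> bits 0-1     */
    out |= (raw & 0x01u) << 2;          /* raw bit 0    -> bit 2        */
    out |= (raw & 0x18u) << 1;          /* raw bits 3-4 -> bits 4-5     */
    return out;                         /* raw bit 5 is dropped         */
}
```

So raw bit 0 is what shows up as bit 2, the bit the vendor code polls while draining the input buffer, and bit 6 comes through untouched.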
So all we have to do is figure out which index we get for our 0xa0005000
registers, decompile the function that handles this block, and see what
it returns. The real physical address for the UART block turns out to be
0x70005000. The flags of the LSR register at 0x70005014 are totally
different, but the code up there shows the mapping. Oh, and the real
RHR/THR register is actually at offset 0x00 – so this HAL doesn’t just
translate peripheral base addresses, but also the ordering of registers
within their blocks.
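Written out for just the two UART registers identified so far, the translation looks something like this. It is a sketch of the end result, not the vendor's code: the names are made up, the 4K block size implied by the masks is my assumption, and every other offset in the block is left unhandled because I don't know where it maps.

```c
#include <stdbool.h>
#include <stdint.h>

#define HAL_UART_PHYS  0xa0005000u  /* address the vendor code uses    */
#define HAL_UART_VIRT  0xf3005000u  /* same block once the MMU is on   */
#define REAL_UART_PHYS 0x70005000u  /* where the UART really lives     */

/*
 * Translates a HAL-facing UART register address to the real one.
 * Only the two registers identified in the text are handled; returns
 * false for anything else.
 */
static bool hal_uart_translate(uint32_t addr, uint32_t *real)
{
    uint32_t base = addr & 0xfffff000u;  /* assumed 4K register block */
    uint32_t off = addr & 0x00000fffu;

    if (base != HAL_UART_PHYS && base != HAL_UART_VIRT)
        return false;

    switch (off) {
    case 0x04: *real = REAL_UART_PHYS + 0x00; return true; /* RHR/THR */
    case 0x14: *real = REAL_UART_PHYS + 0x14; return true; /* LSR     */
    default:   return false;
    }
}
```

Note that a read of the translated LSR still returns the hardware's own bit layout; the shuffling shown in the decompiled function happens on top of the address translation.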
The problem is that this is a big jumptable, with a large number of functions. They’re all constructed as a series of if-then-else statements on the address, no doubt in an effort to make it all go fast. Clearly, this was written before tree traversal algorithms were invented.
The upshot is that it’s 16K of really dense machine code, a serious chore to decompile, parse and grok – it would just take ages, and mistakes would be easy to make. Yet I have to do all of them, or I won’t have the real addresses of the SoC’s various features.
So with the physical address of 0x70005000 the UART works. That’s one problem
solved, but now I’ve got hundreds of problems just like it.
This is horrible! What to do?