Emulate Your Way to Success
In the last episode of unlocking the Goke GK7101 SoC, we found ourselves faced with a big obstacle: a HAL in the form of I/O read/write calls that translated on-board peripherals’ register locations to their real addresses. The HAL’s underlying code is convoluted and much too hard to parse – it’s a large maze of twisty little if-then-elses, all alike. And since this SoC has tons of functionality, there are hundreds of register addresses to find.
But then, a surprise: a wild SDK appears! Somebody uploaded an official Goke SDK tarball to a certain open source repository site, and it has tons of code: the full Linux kernel as hacked up by Goke, their U-Boot source, a build system to make a full working system including root filesystem, and even some example applications that use their kernel drivers.
Weirdly, all of it is marked either GPL or, in a very few cases, public domain. You have to wonder why they’re not just putting this thing up for download; it’s literally all they have to do to abide by the GPL.
Sure enough, lots of information about the HAL is to be found. The struct with the read and write calls is called hw_ops:
struct hw_ops
{
int (*get_version)(void);
unsigned int (*reserved)(unsigned int );
unsigned char (*hw_readb)(unsigned int );
unsigned short (*hw_readw)(unsigned int );
unsigned int (*hw_readl)(unsigned int );
void (*hw_writeb)(unsigned char , unsigned int );
void (*hw_writew)(unsigned short , unsigned int );
void (*hw_writel)(unsigned int , unsigned int );
unsigned int (*flash_read)(void);
void (*flash_write)(unsigned int);
unsigned char (*usb_readb)(unsigned int ptr, unsigned int offset);
unsigned short (*usb_readw)(unsigned int ptr, unsigned int offset);
unsigned int (*usb_readl)(unsigned int ptr, unsigned int offset);
void (*usb_writeb)(unsigned int ptr, unsigned int offset, unsigned char value);
void (*usb_writew)(unsigned int ptr, unsigned int offset, unsigned short value);
void (*usb_writel)(unsigned int ptr, unsigned int offset, unsigned int value);
unsigned int (*dma_readl)(unsigned int ptr);
void (*dma_writel)(unsigned int ptr, unsigned int value);
#if SPI_API_MODE
unsigned char (*spi_readb)(unsigned int ptr);
unsigned short (*spi_readw)(unsigned int ptr);
unsigned int (*spi_readl)(unsigned int ptr);
void (*spi_writeb)(unsigned int ptr, unsigned char value);
void (*spi_writew)(unsigned int ptr, unsigned short value);
void (*spi_writel)(unsigned int ptr, unsigned int value);
#endif
};
Keeping in mind that all these are 32-bit function pointers, our 0x10 and 0x1c offsets correspond to hw_readl and hw_writel, respectively.
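To double-check that arithmetic, here is a minimal sketch that computes the offsets by hand. It assumes 4-byte pointers with no padding, which is what a 32-bit ARM build would give; the field list is just the first few members of hw_ops copied from the struct above.

```python
# Leading members of struct hw_ops, in declaration order.
HW_OPS_FIELDS = [
    "get_version", "reserved",
    "hw_readb", "hw_readw", "hw_readl",
    "hw_writeb", "hw_writew", "hw_writel",
    "flash_read", "flash_write",
]

POINTER_SIZE = 4  # assumption: 32-bit ARM, every member is a function pointer


def field_offsets(fields, pointer_size=POINTER_SIZE):
    """Map each member name to its byte offset within the struct."""
    return {name: index * pointer_size for index, name in enumerate(fields)}


offsets = field_offsets(HW_OPS_FIELDS)
for name in ("hw_readl", "hw_writel"):
    print(f"{name}: {offsets[name]:#x}")
```

Running offsetof() on a 64-bit host would give different numbers, which is why the sketch does the multiplication itself.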
The first function, get_version, looks like this in Ghidra:
********************************************
* FUNCTION *
********************************************
uint32_t __stdcall get_version(void)
uint32_t r0:4 <RETURN>
get_version XREF[2]: hal_init:c00120b0(*),
c0012160(*)
c00121a8 ldr r0, [DAT_c00121b0] = 20151223h
c00121ac bx lr
DAT_c00121b0 XREF[1]: get_version:c00121a8(R
c00121b0 unde 20151223h
That simply returns the number 0x20151223. We’ve seen that before, in the vendor kernel’s boot log:
[ 0.000000] hal version = 20151223
What about the source to the actual HAL code? That gets put into memory by U-Boot, where the kernel then fences it off with the MMU, and uses it in-place. The U-Boot source has this:
static inline unsigned int gk_hal_init (int cp_flag)
{
    unsigned int rval = 0;
    unsigned int *haladdress;
    haladdress = (unsigned int *)CONFIG_GK_HAL_ADDR;
    *(volatile u32 *) (CONFIG_U2K_HAL_ADDR) = (u32)haladdress;
    if(1==cp_flag)
        memcpy(haladdress,hal_data,sizeof(hal_data));
    hal_function_t hal_init = (hal_function_t) (haladdress) ;
    g_hw = (struct hw_ops *)hal_init (0, 0, 0x90000000, 0xA0000000, 0) ;
    return rval ;
}
It’s copied from hal_data to CONFIG_GK_HAL_ADDR, which is defined to 0xc0012000. So what’s hal_data?
int hal_data[]={
0xe92d45f8,0xe1a04000,0xe1a05001,0xe1a06002,0xe1a0a003,0xe59d7020,0xe3570000,0x0a00000e,
0xe59f8148,0xe58802f0,0xe5881320,0xe59832f8,0xe3530000,0x0a000002,0xe2880e2f,0xe3a01001,
...
For whatever reason, the HAL source is missing. Nice to have their kernel source, but we’re no closer to solving this massive HAL problem.
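Since U-Boot copies that array into place and jumps to it, it is simply ARM machine code stored as 32-bit words. A minimal sketch of turning such words back into a raw blob for a disassembler or emulator follows; it assumes a little-endian target, and only includes the first four words shown above, not the full array.

```python
import struct

# First words of hal_data as they appear in the U-Boot source.
# The real array is much longer; this is only the visible start.
HAL_WORDS = [0xe92d45f8, 0xe1a04000, 0xe1a05001, 0xe1a06002]


def words_to_blob(words):
    """Pack 32-bit words into bytes, assuming little-endian byte order."""
    return struct.pack("<%dI" % len(words), *words)


blob = words_to_blob(HAL_WORDS)
print(blob[:4].hex())  # f8452de9
```

The resulting bytes can be written to a file and loaded at 0xc0012000 in a disassembler.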
Breaking it down, getting the real register locations involves:
- identify the “fake” register address.
- call the HAL’s hw_readl() function.
- find out what that function translated the address to.
Now that we have source code, it’s at least gotten a lot easier to get a list of fake registers – and names to describe them, which is very nice to have. Here’s the UART definition:
#define UART_RB_OFFSET 0x04
#define UART_TH_OFFSET 0x04
#define UART_DLL_OFFSET 0x04
#define UART_IE_OFFSET 0x00
#define UART_DLH_OFFSET 0x00
#define UART_II_OFFSET 0x08
#define UART_FC_OFFSET 0x08
#define UART_LC_OFFSET 0x18
#define UART_MC_OFFSET 0x0c
#define UART_LS_OFFSET 0x14
#define UART_MS_OFFSET 0x10
#define UART_SC_OFFSET 0x1c /* Byte */
#define UART_SRR_OFFSET 0x88
[...]
/* UART[x]_LS_REG */
#define UART_LS_FERR 0x80
#define UART_LS_TEMT 0x40
#define UART_LS_BI 0x20
#define UART_LS_FE 0x10
#define UART_LS_THRE 0x08
#define UART_LS_DR 0x04
#define UART_LS_PE 0x02
#define UART_LS_OE 0x01
That matches the code in the vendor kernel’s decompressor exactly, including the LSR’s DR bit being at bit 2.
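For illustration, here is a small sketch that decodes a line status value using the bit definitions above. The flag table is copied from the header; decode_lsr() and the sample value 0x44 are made up for this example.

```python
# UART[x]_LS_REG bit masks, copied from the vendor header in source order.
UART_LS_FLAGS = {
    "FERR": 0x80, "TEMT": 0x40, "BI": 0x20, "FE": 0x10,
    "THRE": 0x08, "DR": 0x04, "PE": 0x02, "OE": 0x01,
}


def decode_lsr(value):
    """Return the names of all flags set in a line status register value."""
    return [name for name, mask in UART_LS_FLAGS.items() if value & mask]


# Hypothetical sample: transmitter empty and data ready.
print(decode_lsr(0x44))
# DR really is bit 2 here.
print(UART_LS_FLAGS["DR"].bit_length() - 1)
```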
But how can we call hw_readl() for all of those registers, and get the results back in a way we can use in our code? We have the HAL code, but it’s not like we can run the HAL without the GK7101… or can we?
It would actually be possible to run the code on any other ARM core sharing that ISA, if not for the fact that it interacts with actual memory locations: these HAL functions don’t just return the locations – they perform the actual reads and writes. So what we need is an ARM platform that will let us call the function but abort before doing the actual read operation.
Of course an architecture with an MMU makes that easy: if we simply don’t map any memory except the part that holds the code, any access to an unmapped location generates an exception – that is, the CPU jumps to the ARM vector called Data Abort.
This would be even easier to do without having to mess with actual exception vectors on raw hardware, with custom handlers that somehow log what happened. What we need is something like QEMU – which supports ARM – to start up a little VM that only loads the HAL code, calls one function in it, and reports in some detail about a Data Abort exception. That’s actually a bit messy for QEMU; it’s intended for a much higher level of control. We need something lower-level than QEMU, but that’s somehow still easy to use and control. That seems like a big ask – would something like that even exist?
Enter the magnificent Unicorn. The authors took the basic low-level VM code in QEMU, and built a very different thing around it. Unicorn takes the form of a library, with an API that lets you create a VM, map memory into it, set callbacks to your code for exceptions, run code with a granularity down to one instruction, and read/write registers. It’s utterly easy to use, well documented, and exactly what we need. The API is in C, but has bindings for tons of languages, notably Python.
What we need to do, in order:
- Create an ARM VM.
- Map some memory into it, obviously not in the region where I/O might be. We’ll use 0xc0000000, since the HAL lives in that region.
- Assign some of that memory to the stack, and set the virtual CPU’s stack pointer (the sp register).
- Drop the HAL code in there, at address 0xc0012000 – we have no guarantees that all that code is relocatable, so this will avoid problems.
- Set up a callback for the Data Abort exception: the exception called when code tries to access memory the MMU doesn’t have a mapping for. We’ll just print the “bad” address from there, and that’ll give us the final address as determined by the HAL.
- Call the hal_init() function – recall it sets up some variables used by the address translation stuff later.
We’ll also need a little bit of code to call hw_readl() with a supplied address we want translated. Something like this should do it:
ldr r4, [pc, #4] // load hw_readl() pointer
ldr r0, [pc, #4] // load argument to hw_readl()
blx r4 // call hw_readl()
// --- code ends here ---
.long 0 // pointer to hw_readl() goes here
.long 0 // test address goes here
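On the host side, the two .long slots have to be filled in before the stub is written into the VM. A minimal sketch follows, assuming ARM (A32) mode and little-endian byte order. The three instruction words are hand-assembled encodings of the lines above, and the hw_readl pointer value is a placeholder; the real one comes from the hw_ops struct returned by hal_init().

```python
import struct

# Hand-assembled A32 encodings of the three instructions in the stub.
LDR_R4_PC_4 = 0xe59f4004  # ldr r4, [pc, #4]
LDR_R0_PC_4 = 0xe59f0004  # ldr r0, [pc, #4]
BLX_R4 = 0xe12fff34       # blx r4


def build_stub(hw_readl_ptr, test_addr):
    """Return the 20-byte stub with both literal slots filled in."""
    return struct.pack("<5I", LDR_R4_PC_4, LDR_R0_PC_4, BLX_R4,
                       hw_readl_ptr, test_addr)


# In ARM mode, pc reads as the current instruction's address plus 8,
# so the load at offset 0 fetches from 0 + 8 + 4 = 12 (the first .long)
# and the load at offset 4 fetches from 4 + 8 + 4 = 16 (the second).
PLACEHOLDER_HW_READL = 0xc0012200  # hypothetical, for illustration only
stub = build_stub(PLACEHOLDER_HW_READL, 0xa0005014)
print(len(stub), stub.hex())
```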
The Unicorn application is here. It’s a bit messy, with lots of debugging code, but it works. You give it a HAL code blob as extracted from the device and a test address:
sumner: ./emu-hal halcode-fromdevice 0xa0005014
0xa0005014 -> 0x70005014
Using the UART register offsets we found in the source code, we can thus map the whole UART block:
sumner: for reg in 04 00 08 18 0c 14 10 1c 88; do ./emu-hal halcode-fromdevice 0xa00050$reg; done
0xa0005004 -> 0x70005000
0xa0005000 -> 0x70005004
0xa0005008 -> 0x70005008
0xa0005018 -> 0x7000500c
0xa000500c -> 0x70005010
0xa0005014 -> 0x70005014
0xa0005010 -> 0x70005018
0xa000501c -> 0x7000501c
0xa0005088 -> 0x70005088
Success! We’ll need to extract register offsets for all subsystems from the source code, but that’s a luxury problem: a bit of parsing with a Python script is peanuts compared to having to extract them from decompiled drivers.
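As a rough idea of what that parsing could look like, here is a sketch that pulls the *_OFFSET defines out of header text and turns them into fake addresses to feed to the emulator. The regular expression and function names are assumptions for this example, HEADER_TEXT is a short excerpt of the UART header, and the 0xa0005000 base is taken from the emu-hal invocations above.

```python
import re

HEADER_TEXT = """\
#define UART_RB_OFFSET 0x04
#define UART_IE_OFFSET 0x00
#define UART_LS_OFFSET 0x14
#define UART_SC_OFFSET 0x1c /* Byte */
#define UART_LS_DR 0x04
"""

# Match only register offset defines, not bit masks like UART_LS_DR.
DEFINE_RE = re.compile(r"^\s*#define\s+(\w+)_OFFSET\s+(0x[0-9a-fA-F]+)",
                       re.MULTILINE)

UART_FAKE_BASE = 0xa0005000  # fake base used in the emu-hal runs above


def parse_offsets(text):
    """Return {register name: offset} for every *_OFFSET define."""
    return {name: int(value, 16) for name, value in DEFINE_RE.findall(text)}


def fake_addresses(offsets, base):
    """Return the sorted set of fake addresses, since some registers share one."""
    return sorted({base + offset for offset in offsets.values()})


uart = parse_offsets(HEADER_TEXT)
for address in fake_addresses(uart, UART_FAKE_BASE):
    print(f"{address:#x}")
```

Each printed address can then be passed to emu-hal in a loop, much like the shell one-liner above.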
Next up, we’ll figure out the registers for the interrupt and timer subsystems, so we can get a kernel to boot. We can use the early_print() facility to debug that, since a proper UART driver won’t work until we have interrupts.