Emulate Your Way to Success

In the last episode of unlocking the Goke GK7101 SoC, we found ourselves faced with a big obstacle: a HAL layer in the form of I/O read/write calls that translated on-board peripherals’ register locations to their real addresses. The HAL’s underlying code is convoluted and much too hard to parse – it’s a large maze of twisty little if-then-elses, all alike. And since this SoC has tons of functionality, there are hundreds of register addresses to find.

But then, a surprise: a wild SDK appears! Somebody uploaded an official Goke SDK tarball to a certain open source repository site, and it has tons of code: the full Linux kernel as hacked up by Goke, their U-Boot source, a build system to make a full working system including root filesystem, and even some example applications that use their kernel drivers.

Weirdly, all of it is marked either GPL or, in a very few cases, public domain. You have to wonder why they’re not just putting this thing up for download; it’s literally all they have to do to abide by the GPL.

Sure enough, lots of information about the HAL is to be found. The struct with the read and write calls is called hw_ops:

struct hw_ops {
    int (*get_version)(void);
    unsigned int (*reserved)(unsigned int );

    unsigned char (*hw_readb)(unsigned int );
    unsigned short (*hw_readw)(unsigned int );
    unsigned int (*hw_readl)(unsigned int );

    void (*hw_writeb)(unsigned char , unsigned int );
    void (*hw_writew)(unsigned short , unsigned int );
    void (*hw_writel)(unsigned int , unsigned int );

    unsigned int (*flash_read)(void);
    void (*flash_write)(unsigned int);

    unsigned char (*usb_readb)(unsigned int ptr, unsigned int offset);
    unsigned short (*usb_readw)(unsigned int ptr, unsigned int offset);
    unsigned int (*usb_readl)(unsigned int ptr, unsigned int offset);
    void (*usb_writeb)(unsigned int ptr, unsigned int offset, unsigned char value);
    void (*usb_writew)(unsigned int ptr, unsigned int offset, unsigned short value);
    void (*usb_writel)(unsigned int ptr, unsigned int offset, unsigned int value);

    unsigned int (*dma_readl)(unsigned int ptr);
    void (*dma_writel)(unsigned int ptr, unsigned int value);

    unsigned char (*spi_readb)(unsigned int ptr);
    unsigned short (*spi_readw)(unsigned int ptr);
    unsigned int (*spi_readl)(unsigned int ptr);
    void (*spi_writeb)(unsigned int ptr, unsigned char value);
    void (*spi_writew)(unsigned int ptr, unsigned short value);
    void (*spi_writel)(unsigned int ptr, unsigned int value);
};

Keeping in mind that all these are 32-bit function pointers, our 0x10 and 0x1c offsets correspond to hw_readl and hw_writel, respectively.
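A quick way to double-check that arithmetic is to mirror the start of the table with 4-byte slots, since that is how the 32-bit ARM HAL sees it. The struct name hw_ops_arm32 is made up for this sketch. On a 64-bit host the real struct hw_ops would have 8-byte function pointers, which is why each slot is modelled as a uint32_t here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first eight hw_ops entries as laid out on
 * 32-bit ARM: each function pointer occupies one 4-byte word. */
struct hw_ops_arm32 {
    uint32_t get_version;   /* 0x00 */
    uint32_t reserved;      /* 0x04 */
    uint32_t hw_readb;      /* 0x08 */
    uint32_t hw_readw;      /* 0x0c */
    uint32_t hw_readl;      /* 0x10 */
    uint32_t hw_writeb;     /* 0x14 */
    uint32_t hw_writew;     /* 0x18 */
    uint32_t hw_writel;     /* 0x1c */
};

/* Compile-time checks: the build fails if the layout is not as expected. */
_Static_assert(offsetof(struct hw_ops_arm32, hw_readl) == 0x10, "hw_readl offset");
_Static_assert(offsetof(struct hw_ops_arm32, hw_writel) == 0x1c, "hw_writel offset");
```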

The first function, get_version, looks like this in Ghidra:

                *                 FUNCTION                 *
                uint32_t __stdcall get_version(void)
     uint32_t     r0:4      <RETURN>
                get_version                       XREF[2]: hal_init:c00120b0(*), 
c00121a8  ldr   r0, [DAT_c00121b0]                                       = 20151223h
c00121ac  bx    lr
                DAT_c00121b0                      XREF[1]: get_version:c00121a8(R
c00121b0  unde  20151223h

That simply returns the number 0x20151223. We’ve seen that before, in the vendor kernel’s boot log:

[    0.000000] hal version = 20151223 

What about the source of the actual HAL code? The HAL is put into memory by U-Boot; the kernel then fences that region off with the MMU and uses the code in place. The U-Boot source has this:

static inline unsigned int gk_hal_init (int cp_flag)
{
    unsigned int rval = 0;
    unsigned int *haladdress;

    haladdress = (unsigned int *)CONFIG_GK_HAL_ADDR;
    *(volatile u32 *) (CONFIG_U2K_HAL_ADDR) = (u32)haladdress;

    /* The HAL's entry point is its base address; calling it returns the hw_ops table. */
    hal_function_t hal_init = (hal_function_t) (haladdress) ;

    g_hw = (struct hw_ops *)hal_init (0, 0, 0x90000000, 0xA0000000, 0) ;

    return rval ;
}

U-Boot copies the HAL from hal_data to CONFIG_GK_HAL_ADDR, which is defined as 0xc0012000. So what’s hal_data?

int hal_data[]={

It’s just an array of integers: the HAL ships as a compiled blob, and for whatever reason its source is missing. It’s nice to have their kernel source, but we’re no closer to solving this massive HAL problem.

Breaking it down, getting the real register locations involves three steps:

  • identify a “fake” register address.
  • call the HAL’s hw_readl() function with it.
  • find out what address that function translated it to.

Now that we have source code, it has at least become a lot easier to get a list of fake registers, along with names to describe them, which is very nice to have. Here are the UART definitions:

#define UART_RB_OFFSET          0x04
#define UART_TH_OFFSET          0x04
#define UART_DLL_OFFSET         0x04
#define UART_IE_OFFSET          0x00
#define UART_DLH_OFFSET         0x00
#define UART_II_OFFSET          0x08
#define UART_FC_OFFSET          0x08
#define UART_LC_OFFSET          0x18
#define UART_MC_OFFSET          0x0c
#define UART_LS_OFFSET          0x14
#define UART_MS_OFFSET          0x10
#define UART_SC_OFFSET          0x1c    /* Byte */
#define UART_SRR_OFFSET         0x88
/* UART[x]_LS_REG */
#define UART_LS_FERR            0x80
#define UART_LS_TEMT            0x40
#define UART_LS_BI              0x20
#define UART_LS_FE              0x10
#define UART_LS_THRE            0x08
#define UART_LS_DR              0x04
#define UART_LS_PE              0x02
#define UART_LS_OE              0x01

That matches the code in the vendor kernel’s decompressor exactly, including the LSR’s DR bit being at bit 2.
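As a small illustration of how those bit definitions get used, here is a sketch of the status checks a polled UART routine would perform on a value read from the LS register. The helper names are hypothetical; how the LS value is actually read (directly or through the HAL) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Line status bits, copied from the SDK header above. */
#define UART_LS_TEMT  0x40
#define UART_LS_THRE  0x08
#define UART_LS_DR    0x04

/* Hypothetical helpers: 'ls' is a value read from the LS register. */
static bool uart_rx_ready(uint32_t ls)
{
    return (ls & UART_LS_DR) != 0;    /* data ready: bit 2 in this header */
}

static bool uart_tx_ready(uint32_t ls)
{
    return (ls & UART_LS_THRE) != 0;  /* transmit holding register empty */
}

static bool uart_tx_idle(uint32_t ls)
{
    return (ls & UART_LS_TEMT) != 0;  /* transmitter completely empty */
}
```

A console putc() would spin until uart_tx_ready() is true before writing to TH; a getc() would spin on uart_rx_ready() before reading RB.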

But how can we call hw_readl() for all of those registers, and get the results back in a way we can use in our code? We have the HAL code, but it’s not like we can run the HAL without the GK7101… or can we?

It would actually be possible to run the code on any other ARM core sharing that ISA, if not for the fact that it interacts with actual memory locations: these HAL functions don’t just return the locations – they perform the actual reads and writes. So what we need is an ARM platform that will let us call the function but abort before doing the actual read operation.

Of course, an architecture with an MMU makes that easy: if we map no memory except the part holding the code, any access to an unmapped location generates an exception, and the CPU jumps to the instruction at the ARM exception vector called Data Abort.

This would be even easier if we didn’t have to mess with actual exception vectors on raw hardware, with custom handlers that somehow log what happened. What we need is something like QEMU, which supports ARM, to start up a little VM that loads only the HAL code, calls one function in it, and reports in some detail on a Data Abort exception. That’s actually a bit messy with QEMU, which is geared toward a much higher level of control, such as running entire machines. We need something lower-level than QEMU that is still easy to use and control. That seems like a big ask: would something like that even exist?

Enter the magnificent Unicorn. The authors took the basic low-level VM code in QEMU, and built a very different thing around it. Unicorn takes the form of a library, with an API that lets you create a VM, map memory into it, set callbacks to your code for exceptions, run code with a granularity down to one instruction, and read/write registers. It’s utterly easy to use, well documented, and exactly what we need. The API is in C, but has bindings for tons of languages, notably Python.

What we need to do, in order:

  • Create an ARM VM.
  • Map some memory into it, obviously not in the region where I/O might be. We’ll use 0xc0000000, since the HAL lives in that region.
  • Assign some of that memory to the stack, and set the virtual CPU’s stack pointer (the sp register).
  • Drop the HAL code in there, at address 0xc0012000 – we have no guarantees that all that code is relocatable, so this will avoid problems.
  • Set up a callback for the Data Abort exception: the exception raised when code tries to access memory the MMU has no mapping for (in Unicorn terms, an unmapped memory access). We’ll just print the “bad” address from there, and that’ll give us the final address as determined by the HAL.
  • Call the hal_init() function: recall that it sets up some variables the address translation code uses later.
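Unicorn does the actual fault detection (the region would be created with uc_mem_map(), and the catch-all would be a hook of type UC_HOOK_MEM_UNMAPPED), but the memory plan itself is easy to get wrong. Here is a plain-C model of it. The 2 MiB region size and putting the stack at the top of the region are assumptions for this sketch, not values from the real tool. The point is that only memory from 0xc0000000 upward is mapped, so both the fake and the real register addresses fall outside it and fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed emulator layout (sizes are illustrative, not from the article). */
#define MAP_BASE   0xc0000000u            /* the only region we map */
#define MAP_SIZE   (2u * 1024 * 1024)     /* 2 MiB, a multiple of 4 KiB */
#define HAL_BASE   0xc0012000u            /* where U-Boot puts the HAL */
#define STACK_TOP  (MAP_BASE + MAP_SIZE)  /* initial sp; ARM stacks grow down */

/* True if an access to 'addr' hits mapped memory; anything else is what
 * the unmapped-access callback would catch and report. */
static bool addr_is_mapped(uint32_t addr)
{
    return addr >= MAP_BASE && addr - MAP_BASE < MAP_SIZE;
}
```

With sp set to STACK_TOP, the first push lands at STACK_TOP - 4, which is still inside the mapped region.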

We’ll also need a little bit of code to call hw_readl() with a supplied address we want translated. Something like this should do it:

ldr r4, [pc, #4]    @ load hw_readl() pointer (pc reads as this insn + 8)
ldr r0, [pc, #4]    @ load argument to hw_readl()
blx r4              @ call hw_readl()
@ --- code ends here ---
.long 0             @ pointer to hw_readl() goes here
.long 0             @ test address goes here
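Since ARM reads pc as the current instruction’s address plus 8, each [pc, #4] lands 12 bytes ahead of its own ldr: the first load picks up the word at offset 12, the second the word at offset 16. Here is a sketch of assembling that 20-byte stub in C so it can be written into the emulator’s memory. The three instruction words are the standard A32 encodings for these instructions; the helper names and the little-endian target are assumptions of this sketch.

```c
#include <stdint.h>

#define STUB_SIZE 20u

/* Store a 32-bit word little-endian (assumes a little-endian target). */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: fill 'out' with the call stub. 'readl_ptr' is the
 * word at offset 0x10 of the hw_ops table that hal_init() returned. */
static void build_readl_stub(uint8_t out[STUB_SIZE],
                             uint32_t readl_ptr, uint32_t test_addr)
{
    put_le32(out + 0,  0xe59f4004u);  /* ldr r4, [pc, #4] -> word at +12 */
    put_le32(out + 4,  0xe59f0004u);  /* ldr r0, [pc, #4] -> word at +16 */
    put_le32(out + 8,  0xe12fff34u);  /* blx r4                          */
    put_le32(out + 12, readl_ptr);    /* hw_readl() pointer              */
    put_le32(out + 16, test_addr);    /* fake address to translate       */
}
```

With Unicorn, one option is to run the stub with uc_emu_start(uc, stub, stub + 12, 0, 0): offset 12 is the return address blx leaves in lr, so emulation also stops cleanly if hw_readl() ever returns instead of faulting.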

The Unicorn application is here. It’s a bit messy, with lots of debugging code, but it works. You give it a HAL code blob as extracted from the device and a test address:

sumner: ./emu-hal halcode-fromdevice 0xa0005014
0xa0005014 -> 0x70005014

Using the UART register offsets we found in the source code, we can thus map the whole UART block:

sumner: for reg in 04 00 08 18 0c 14 10 1c 88; do ./emu-hal halcode-fromdevice 0xa00050$reg; done
0xa0005004 -> 0x70005000
0xa0005000 -> 0x70005004
0xa0005008 -> 0x70005008
0xa0005018 -> 0x7000500c
0xa000500c -> 0x70005010
0xa0005014 -> 0x70005014
0xa0005010 -> 0x70005018
0xa000501c -> 0x7000501c
0xa0005088 -> 0x70005088
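Put into a table, those nine results show what the HAL is hiding: the real registers sit in the classic 16550 order (RB/TH/DLL, IE/DLH, II/FC, LC, MC, LS, MS, SC), one per 4-byte word, while the fake offsets shuffle several of them around. Here is a sketch of a fake-to-real lookup built directly from the output above; the struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fake -> real UART offsets, taken from the emu-hal output above
 * (fake base 0xa0005000, real base 0x70005000). */
struct reg_map {
    const char *name;
    uint32_t fake;
    uint32_t real;
};

static const struct reg_map uart_map[] = {
    { "RB/TH/DLL", 0x04, 0x00 },
    { "IE/DLH",    0x00, 0x04 },
    { "II/FC",     0x08, 0x08 },
    { "LC",        0x18, 0x0c },
    { "MC",        0x0c, 0x10 },
    { "LS",        0x14, 0x14 },
    { "MS",        0x10, 0x18 },
    { "SC",        0x1c, 0x1c },
    { "SRR",       0x88, 0x88 },
};

/* Translate a fake UART offset to the real one; returns -1 if unknown. */
static int64_t uart_real_offset(uint32_t fake)
{
    for (size_t i = 0; i < sizeof uart_map / sizeof uart_map[0]; i++)
        if (uart_map[i].fake == fake)
            return uart_map[i].real;
    return -1;
}
```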

Success! We’ll need to extract register offsets for all subsystems from the source code, but that’s a luxury problem: a bit of parsing with a Python script is peanuts compared to having to extract them from decompiled drivers.
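The author’s script is Python; to keep to one language here, this is a rough C equivalent of its core step: pulling the name and value out of a #define …_OFFSET line with sscanf(). The function name is hypothetical, and it only handles the simple single-token form used in the SDK headers shown above.

```c
#include <stdio.h>
#include <string.h>

/* Parse a line like "#define UART_LS_OFFSET  0x14" into name and value.
 * Returns 1 only for macros whose name ends in "_OFFSET", 0 otherwise. */
static int parse_offset_define(const char *line, char name[64], unsigned *value)
{
    static const char suffix[] = "_OFFSET";
    size_t len;

    /* %x accepts an optional 0x prefix; trailing comments are ignored. */
    if (sscanf(line, " #define %63s %x", name, value) != 2)
        return 0;
    len = strlen(name);
    return len >= sizeof suffix - 1 &&
           strcmp(name + len - (sizeof suffix - 1), suffix) == 0;
}
```

Each match then becomes a fake address (the block’s fake base plus the parsed value) to hand to emu-hal, just as the shell loop does for the UART.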

Next up, we’ll figure out the registers for the interrupt and timer subsystems, so we can get a kernel to boot. We can use the early_print() facility to debug that, since a proper UART driver won’t work until we have interrupts.