<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="https://biot.com/blog/xml/base.min.xml" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Research, reverse engineering and rants</title>
    <link>https://biot.com/blog/</link>
    <description>Recent content on Research, reverse engineering and rants</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 08 Nov 2020 22:08:03 +0100</lastBuildDate>
    <atom:link href="https://biot.com/blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Tackling Power Over Ethernet</title>
      <link>https://biot.com/blog/posts/tackling-power-over-ethernet/</link>
      <pubDate>Sun, 08 Nov 2020 22:08:03 +0100</pubDate>
      <guid>https://biot.com/blog/posts/tackling-power-over-ethernet/</guid>
      <description>&lt;p&gt;Linux has no support for &lt;a href=&#34;https://en.wikipedia.org/wiki/Power_over_Ethernet&#34;&gt;Power over
Ethernet&lt;/a&gt; (PoE) of any
kind. There are many Ethernet switches on the market that support PoE, and
many of those switches run Linux, but none of those are supported in the
mainline kernel, and their PoE support has not been upstreamed.&lt;/p&gt;
&lt;p&gt;This situation is changing with the &lt;a href=&#34;https://biot.com/switches/&#34;&gt;Realtek RTL83xx
project&lt;/a&gt;. This series of
&lt;a href=&#34;https://en.wikipedia.org/wiki/System_on_a_chip&#34;&gt;SoCs&lt;/a&gt; is very popular, found
in &lt;a href=&#34;https://biot.com/switches/models&#34;&gt;dozens&lt;/a&gt; of distinct models from many
different vendors. The kernel code is coming together nicely. Many systems
built on these chips support PoE, and this will need to be handled like any
other feature on those chips. And there is much to support about PoE: number of
ports, power modes, power budget, priority, and much more.&lt;/p&gt;
&lt;p&gt;An opportunity thus presents itself: to create a new kernel subsystem and
accompanying userspace tools, and have it automatically become the standard way
of supporting PoE on Linux.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s take a look at what&amp;rsquo;s needed.&lt;/p&gt;
&lt;h3 id=&#34;hardware&#34;&gt;Hardware&lt;/h3&gt;
&lt;p&gt;The voltage on current-carrying lines of PoE ports is in a fundamentally
different range, anywhere from 44V to 57V, than what a switch uses internally.
The chip on the switch&amp;rsquo;s motherboard that provides this voltage is called the
Power Sourcing Equipment (PSE) controller. It lives in a different power domain
than the rest of the switch, typically with its own hookup to the system&amp;rsquo;s
power supply. That means communication with these PSEs has to go via an
electrically isolated path.&lt;/p&gt;
&lt;p&gt;Generally the SoC doesn&amp;rsquo;t talk to the PSE directly; there&amp;rsquo;s a dedicated
microcontroller of some sort that sits between them, translating between what the SoC can
drive (often only UART) and what the PSE needs, such as I2C. In some cases
these two form a product pair, such as the Microsemi
&lt;a href=&#34;https://web.archive.org/web/20201108215004/https://www.microsemi.com/document-portal/doc_view/129275-pd69100-datasheet&#34;&gt;PD69100&lt;/a&gt;/&lt;a href=&#34;https://web.archive.org/web/20190221163738/https://www.microsemi.com/document-portal/doc_download/123525-pd69108-pb-r-pdf&#34;&gt;PD69108&lt;/a&gt;
management/PSE controllers, respectively. But usually the management
microcontroller is something more generic, such as an
&lt;a href=&#34;https://www.st.com/en/microcontrollers-microprocessors/stm32f100-value-line.html&#34;&gt;STM32F100&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll need to know the protocols the SoC has to speak to these different
microcontrollers if we&amp;rsquo;re going to control PoE. It&amp;rsquo;s not always documented, but
we&amp;rsquo;ve got &lt;a href=&#34;https://biot.com/switches/software/poe_management&#34;&gt;a good handle&lt;/a&gt; on
it. Indeed, the firmware running on these microcontrollers may turn out to be
an interesting target for replacement with an open source version;
&lt;a href=&#34;https://github.com/joric/es120tris&#34;&gt;crazier&lt;/a&gt;
&lt;a href=&#34;https://magiclantern.fm/&#34;&gt;things&lt;/a&gt; have been done.&lt;/p&gt;
&lt;h3 id=&#34;kernel&#34;&gt;Kernel&lt;/h3&gt;
&lt;p&gt;A PoE kernel subsystem should do the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Provide a framework for drivers that know about the protocol which the PSE,
or its intermediate microcontroller, needs. Since the other endpoint of that
protocol is dependent on the switch model, this information needs to live in
the device tree. The drivers will also need to be aware of the firmware
version on the other end, at least until we can replace it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The device tree should also provide information about the model&amp;rsquo;s PoE
capabilities, in order to mediate requests from userspace.
Managing the power budget, for example, should be done
here &amp;ndash; no need to force every userspace client to keep track of this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A userspace API, either via an ioctl on a device special file &lt;code&gt;/dev/poe&lt;/code&gt;
or perhaps a more modern sysfs interface.
It&amp;rsquo;s easy enough to work out what this API should provide: whatever the
PSE/MCU protocols provide must have an equivalent in this API.
Additionally we can add discoverability to that: the userspace application
should be able to find out which PoE capabilities the system has, on which ports.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
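&lt;p&gt;To make the power budget point concrete, here is a minimal sketch of the kind
of bookkeeping the subsystem could do on behalf of userspace. Everything in it is
an assumption for illustration: the request tuples, the strict priority policy and
the numbers are made up, and none of it is an existing kernel or PSE interface.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from bisect import bisect_right
from itertools import accumulate

# Hypothetical requests: (port, priority, milliwatts). Lower priority value wins.
requests = [(1, 2, 15400), (2, 1, 30000), (3, 3, 30000)]


def allocate(budget_mw, requests):
    # Grant ports in strict priority order until the budget is exhausted.
    ordered = sorted(requests, key=lambda r: (r[1], r[0]))
    running = list(accumulate(mw for _port, _prio, mw in ordered))
    count = bisect_right(running, budget_mw)   # how many requests fit in total
    granted = [port for port, _prio, _mw in ordered[:count]]
    used = running[count - 1] if count else 0
    return granted, budget_mw - used


granted, remaining = allocate(65000, requests)
print(granted, remaining)   # [2, 1] 19600
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Doing this once, in one place, is the point: a userspace client only asks for
power on a port and gets a yes or no.&lt;/p&gt;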
&lt;h3 id=&#34;userspace&#34;&gt;Userspace&lt;/h3&gt;
&lt;p&gt;While a userspace API is all good and well, using it involves awkward calls to
ioctl, with lots of symbols pulled in from kernel headers, or lots of
error-prone messing with sysfs files. That&amp;rsquo;s pretty laborious, and extra hard
for scripting languages to tackle. A better way to handle this is to write a
small library to sit in front of the awkward operations and symbol soup. A good
example of this is libgpiod, which sits in front of the userspace interface to
the kernel GPIO subsystem. A library like this is also pretty easy to write
language bindings for.&lt;/p&gt;
&lt;p&gt;A quick and easy way to use PoE functionality from the shell is also a good
idea. You need such a shell tool to test the library as it&amp;rsquo;s developed, anyway.&lt;/p&gt;
&lt;p&gt;Writing a web interface that can work the system&amp;rsquo;s PoE functionality is then a
matter of using the language bindings of the library as well. Easy!&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m working on lots of different aspects of the Realtek RTL83xx kernel code
at the moment, but this is an interesting enough opportunity that I think
I need to get started on it &amp;ndash; before somebody else does!&lt;/p&gt;
</description>
    </item>
    <item>
      <title>In Through the Out Door</title>
      <link>https://biot.com/blog/posts/in-through-the-out-door/</link>
      <pubDate>Sun, 17 Nov 2019 20:29:24 +0100</pubDate>
      <guid>https://biot.com/blog/posts/in-through-the-out-door/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://biot.com/blog/posts/emulate-your-way-to-success/&#34;&gt;previous episode&lt;/a&gt; of this
reverse engineering effort, we finally found a good way to get hold of
the real hardware register addresses, and we extracted the UART registers
to begin with.&lt;/p&gt;
&lt;p&gt;What&amp;rsquo;s needed for a minimal booting kernel is first interrupts, meaning
information on how to drive the SoC&amp;rsquo;s interrupt controller, and timers, also
a facility supplied by the SoC. We really only need one timer: the system
timer, which supplies Linux&amp;rsquo;s &amp;ldquo;ticks&amp;rdquo; &amp;ndash; it&amp;rsquo;s what drives the scheduler.
When you enable the system timer, you set it to fire at a set interval; firing
means generating an interrupt which you&amp;rsquo;ve linked it to.&lt;/p&gt;
&lt;p&gt;The interrupt controller in the GK7101 is called the VIC (Vectored Interrupt
Controller); there are two of them, each handling up to 32 interrupts. ARM
provides a standard interrupt controller facility with its
CPU cores, called the PrimeCell VIC. As it turns out Goke&amp;rsquo;s VIC is almost, but
not quite, compatible with it. This is turning into a theme; we saw the same
oddity with its not-quite-16550 UART.&lt;/p&gt;
&lt;p&gt;No matter, we have the SDK source code to figure out how to work it. First
let&amp;rsquo;s get the registers from the SDK header, and run them through our HAL
emulator:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/****************************************************/
/* Controller registers definitions                 */
/****************************************************/
#define VIC_IRQ_STA_OFFSET      0x30
#define VIC_FIQ_STA_OFFSET      0x34
#define VIC_RAW_STA_OFFSET      0x18
#define VIC_INT_SEL_OFFSET      0x0c
#define VIC_INTEN_OFFSET        0x10
#define VIC_INTEN_CLR_OFFSET    0x14
#define VIC_SOFTEN_OFFSET       0x1c
#define VIC_SOFTEN_CLR_OFFSET   0x20
#define VIC_PROTEN_OFFSET       0x24
#define VIC_SENSE_OFFSET        0x00
#define VIC_BOTHEDGE_OFFSET     0x08
#define VIC_EVENT_OFFSET        0x04
#define VIC_EDGE_CLR_OFFSET     0x38
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The vendor driver thinks the VIC lives on the AHB at &lt;code&gt;0xf2000000&lt;/code&gt;, with the
two VICs at offsets &lt;code&gt;0x8000&lt;/code&gt; and &lt;code&gt;0x9000&lt;/code&gt; respectively. So the VIC1 IRQ status
register would be at &lt;code&gt;0xf2008030&lt;/code&gt;. Running these through the HAL for
translation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sumner: for reg in 30 34 18 0c 10 14 1c 20 24 00 08 04 38; do ./emu-hal halcode-fromdevice 0xf20080$reg; done
0xf2008030 -&amp;gt; 0xf0003000
0xf2008034 -&amp;gt; 0xf0003004
0xf2008018 -&amp;gt; 0xf0003008
0xf200800c -&amp;gt; 0xf000300c
0xf2008010 -&amp;gt; 0xf0003010
0xf2008014 -&amp;gt; 0xf0003014
0xf200801c -&amp;gt; 0xf0003018
0xf2008020 -&amp;gt; 0xf000301c
0xf2008024 -&amp;gt; 0xf0003020
0xf2008000 -&amp;gt; 0xf0003024
0xf2008008 -&amp;gt; 0xf0003028
0xf2008004 -&amp;gt; 0xf000302c
0xf2008038 -&amp;gt; 0xf0003038
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Hey, look at how these are suddenly sorted! And how they suddenly match ARM&amp;rsquo;s
PrimeCell VIC layout! It makes me wonder whether there&amp;rsquo;s an extra license charge
for using ARM&amp;rsquo;s UART, VIC and so on.&lt;/p&gt;
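&lt;p&gt;The reordering is easier to see with the two lists side by side. This is a small
sketch that pairs the SDK names and offsets from the header with the translated
addresses from the emulator output above, then sorts by the real address. The
variable names are mine; the numbers are copied from the listings.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# (SDK name, SDK offset, translated address from the emu-hal output)
VIC_REGS = [
    ('VIC_IRQ_STA',    0x30, 0xf0003000),
    ('VIC_FIQ_STA',    0x34, 0xf0003004),
    ('VIC_RAW_STA',    0x18, 0xf0003008),
    ('VIC_INT_SEL',    0x0c, 0xf000300c),
    ('VIC_INTEN',      0x10, 0xf0003010),
    ('VIC_INTEN_CLR',  0x14, 0xf0003014),
    ('VIC_SOFTEN',     0x1c, 0xf0003018),
    ('VIC_SOFTEN_CLR', 0x20, 0xf000301c),
    ('VIC_PROTEN',     0x24, 0xf0003020),
    ('VIC_SENSE',      0x00, 0xf0003024),
    ('VIC_BOTHEDGE',   0x08, 0xf0003028),
    ('VIC_EVENT',      0x04, 0xf000302c),
    ('VIC_EDGE_CLR',   0x38, 0xf0003038),
]
REAL_BASE = 0xf0003000   # real base address of the first VIC

# Sort by real offset to see the layout the hardware actually uses.
layout = sorted((real - REAL_BASE, name) for name, _sdk, real in VIC_REGS)
for offset, name in layout:
    print(f'0x{offset:02x} {name}')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output starts with the IRQ status register at 0x00 and walks up in steps of
four through the protection enable register at 0x20.&lt;/p&gt;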
&lt;p&gt;The timer subsystem officially lives at the APB base &lt;code&gt;0xf3000000&lt;/code&gt; + timer
offset (which is actually 0), and translates to post-MMU addresses like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sumner: for reg in 0c 00 04 08 14 18 1c 20 24 28 30 34 38; do ./emu-hal halcode-fromdevice 0xf30000$reg; done
0xf300000c -&amp;gt; 0xf100b030
0xf3000000 -&amp;gt; 0xf100b000
0xf3000004 -&amp;gt; 0xf100b008
0xf3000008 -&amp;gt; 0xf100b00c
0xf3000014 -&amp;gt; 0xf100b010
0xf3000018 -&amp;gt; 0xf100b018
0xf300001c -&amp;gt; 0xf100b01c
0xf3000020 -&amp;gt; 0xf100b020
0xf3000024 -&amp;gt; 0xf100b028
0xf3000028 -&amp;gt; 0xf100b02c
0xf3000030 -&amp;gt; 0xf100b004
0xf3000034 -&amp;gt; 0xf100b014
0xf3000038 -&amp;gt; 0xf100b024
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Making a long story short: I implemented interrupt controller and timer
drivers in a mainline kernel, and it didn&amp;rsquo;t work. Fortunately we&amp;rsquo;ve got
that &lt;code&gt;early_print()&lt;/code&gt; facility. By inserting those liberally, I figured out
the kernel hung in &lt;code&gt;init/calibrate.c:calibrate_delay_converge()&lt;/code&gt;. That&amp;rsquo;s
the first kernel function to use the timer: it makes some precise
measurements for delay loops and such, and does this by busy-looping
waiting for &lt;em&gt;jiffies&lt;/em&gt; to change. Jiffies is updated only by the system
timer; if that never fires, you&amp;rsquo;ve got a hang.&lt;/p&gt;
&lt;p&gt;The big question here is what&amp;rsquo;s broken: a failure in either the interrupts or the
timer would hang this function.&lt;/p&gt;
&lt;p&gt;On the assumption I&amp;rsquo;d overlooked some hardware initialization somewhere,
I dug deeper into the SDK source and decompiled vendor kernel&amp;hellip;
and found exactly that. There is some hardware setup in, of all
places, the pre-kernel decompressor. From the SDK source:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;gk_gpio_clrbitsl(GK_PA_GPIO0 + 0x3c, 0x00000001);
#ifdef CONFIG_PHY_USE_AO_MCLK
gk_rct_writel(GK_PA_RCT + 0x024, 0x00124021);
gk_rct_writel(GK_PA_RCT + 0x078, 0x00555555);
gk_rct_writel(GK_PA_RCT + 0x084, 0x00000004);
gk_rct_writel(GK_PA_RCT + 0x080, 0x00000001);
#endif

//misc clock configure
// SFLASH ioctrl
gk_rct_writel(GK_PA_RCT + 0x0198, 0x00000011);

// Sensor ioctrl
gk_rct_writel(GK_PA_RCT + 0x019C, 0x00000012);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The write calls in the &lt;code&gt;CONFIG_PHY_USE_AO_MCLK&lt;/code&gt; block (which is defined)
certainly look like they might be related to a system clock. Alas, this
made no difference: still a hang at the calibration function.&lt;/p&gt;
&lt;p&gt;This is rather hard to debug. I don&amp;rsquo;t have JTAG; the GK7101 may well have
it, but it&amp;rsquo;s certainly not broken out on this camera board. The SoC is a
BGA package, so there aren&amp;rsquo;t going to be any pins to probe for JTAG. The Linux
kernel certainly supports this class of CPU, so interrupt vectors and handlers
should just work here. I&amp;rsquo;ve gone over my code lots of times, and can&amp;rsquo;t
find anything wrong.&lt;/p&gt;
&lt;p&gt;On the assumption I must therefore still be driving either the interrupt
controller or timer facility wrong, the way forward is clear: go over
the running vendor kernel again until I find what else it&amp;rsquo;s doing.
You can only stare at the same bits of code so many times though.&lt;/p&gt;
&lt;p&gt;What I&amp;rsquo;d really like to do is get a log of all the hardware register
accesses the vendor kernel does, so I can compare that to my code. But of
course the Linux kernel doesn&amp;rsquo;t have anything like a register-level tracepoint;
these are just memory reads and writes, after all.&lt;/p&gt;
&lt;p&gt;But&amp;hellip; the vendor kernel &lt;strong&gt;does&lt;/strong&gt; have such a facility: that horrible HAL! It
sits exactly where we need a trace facility: between drivers and their
register-level access. What if we could get the HAL to not just do its
read/write thing, but also log what it did, and which function asked it to?&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;d have to intercept calls to the HAL. That&amp;rsquo;s not too hard: we know
where it lives, where its jumptable is, and we have access to it at U-Boot
time. After all, U-Boot puts it in memory before the kernel is called. It
turns out that the HAL jumptable &amp;ndash; the &lt;code&gt;hw_ops&lt;/code&gt; structure &amp;ndash; is actually
populated by the call to &lt;code&gt;hal_init()&lt;/code&gt;. So we need to intercept the call
to &lt;code&gt;hal_init()&lt;/code&gt; by the vendor kernel&amp;rsquo;s decompressor as well. Why not do
both at once: change &lt;code&gt;hal_init()&lt;/code&gt; to call some code of ours &lt;em&gt;after&lt;/em&gt; it&amp;rsquo;s
populated the jumptable, so we can then:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Save a copy of that jumptable somewhere.&lt;/li&gt;
&lt;li&gt;Populate the jumptable with our own functions, which log the call and
then call the original function from the saved jumptable.&lt;/li&gt;
&lt;li&gt;Boot the vendor kernel, and grab the log from memory.&lt;/li&gt;
&lt;/ul&gt;
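&lt;p&gt;The idea is ordinary function interposition. Here is a toy model of it in Python,
purely to illustrate the three steps: the real thing is ARM code patched into
&lt;code&gt;hal_init()&lt;/code&gt;, the two-entry table stands in for the much larger
&lt;code&gt;hw_ops&lt;/code&gt;, and the caller addresses are passed in by hand because Python
has no &lt;code&gt;lr&lt;/code&gt; register to read. All names and values below are made up.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;log = []


def install_logging(jumptable):
    saved = list(jumptable)                      # keep a copy of the originals

    def make_wrapper(index):
        def wrapper(caller, *args):
            log.append((caller, index) + args)   # record who called what
            return saved[index](*args)           # then defer to the original
        return wrapper

    # Overwrite the table in place, so existing users pick up the wrappers.
    for index in range(len(jumptable)):
        jumptable[index] = make_wrapper(index)
    return saved


# Stand-in HAL: a fake register file with one read and one write function.
registers = {}


def hal_read(addr):
    return registers.get(addr, 0)


def hal_write(value, addr):
    registers[addr] = value


hal = [hal_read, hal_write]
install_logging(hal)
hal[1](0x1000, 0x11, 0xa0170198)        # hypothetical caller at 0x1000
value = hal[0](0x1004, 0xa0170198)      # hypothetical caller at 0x1004
print(hex(value), len(log))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After the two calls the fake register holds the written value, and the log holds
one tuple per call with the caller, the table index and the arguments.&lt;/p&gt;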
&lt;p&gt;Of course &lt;em&gt;memory&lt;/em&gt; is rather a difficult facility at this level. It turns
out that &lt;code&gt;hal_init()&lt;/code&gt; is called twice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Once by the decompressor, with arguments that make the HAL translate
from physical addresses, since the MMU is off at decompression time.&lt;/li&gt;
&lt;li&gt;A second time during proper kernel startup, this time with arguments
that are virtual addresses.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That means our logging code will need to figure out whether it&amp;rsquo;s running
with or without MMU, and adjust its log memory pointer accordingly. This
is normally done by querying the system control coprocessor (CP15), but there&amp;rsquo;s
an easier way: the decompressor runs from the &lt;code&gt;0xc1000000&lt;/code&gt; block (where
U-Boot put it), but it decompresses the kernel into the &lt;code&gt;0x80000000&lt;/code&gt; block,
from where the kernel then runs. So checking the caller function&amp;rsquo;s high
byte will tell us where main memory lives. This is especially easy on ARM:
the caller&amp;rsquo;s address is in the &lt;code&gt;lr&lt;/code&gt; register. You don&amp;rsquo;t even have to go
find it on the stack!&lt;/p&gt;
&lt;p&gt;As to memory location regardless of MMU prefix: we&amp;rsquo;ll just drop it in the
middle of the RAM range. All things being equal, a kernel boot shouldn&amp;rsquo;t
use so much memory that it would reach halfway into its RAM, even on this board&amp;rsquo;s
measly 16MB.&lt;/p&gt;
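&lt;p&gt;In Python terms the decision looks roughly like this. It is a sketch of the
logic only, not a transcription of the wedge: the two RAM base addresses and the
8MB offset are my assumptions for illustration, derived from the address blocks
mentioned above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PHYS_RAM_BASE = 0xc0000000   # assumption: RAM as seen with the MMU off
VIRT_RAM_BASE = 0x80000000   # assumption: the same RAM as mapped by the kernel
LOG_OFFSET = 0x00800000      # assumption: 8MB in, the middle of 16MB


def log_base(lr):
    # The top byte of the return address says who called us.
    top_byte = lr // 0x1000000
    if top_byte == 0xc1:      # decompressor: MMU off, physical addresses
        return PHYS_RAM_BASE + LOG_OFFSET
    if top_byte == 0x80:      # kernel proper: MMU on, virtual addresses
        return VIRT_RAM_BASE + LOG_OFFSET
    raise ValueError(f'unexpected caller {lr:#x}')


print(hex(log_base(0xc1000a08)), hex(log_base(0x804ea6e0)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both results point at the same physical memory under these assumptions, which
is what lets a single log survive the switch from decompressor to kernel.&lt;/p&gt;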
&lt;p&gt;This is all low-level enough that it needs to be done in machine code.
The code is &lt;a href=&#34;https://github.com/biot/gk710x-tools/blob/master/hal-init-wedge.S&#34;&gt;here&lt;/a&gt;.
You&amp;rsquo;ll have to excuse the crudity of the code: it&amp;rsquo;s the first time I&amp;rsquo;ve
played with ARM machine code. Incidentally, this technique of inserting
a jump right into some other function, and having that function end with
running the instruction that was lost and then jumping back, is called
a &lt;em&gt;wedge&lt;/em&gt;. At least that&amp;rsquo;s what it was called in the Commodore 64 days;
over in serious business PC land they called it a TSR (Terminate and
Stay Resident). TLAs make everything better, don&amp;rsquo;t they?&lt;/p&gt;
&lt;p&gt;From the U-Boot shell we can use the &lt;code&gt;tftpboot&lt;/code&gt; command to load the compiled
blob of our wedge at &lt;code&gt;0xc0016000&lt;/code&gt;, and call it with &amp;ldquo;&lt;code&gt;go c0016000&lt;/code&gt;&amp;rdquo;. The wedge
is now inserted at the end of &lt;code&gt;hal_init()&lt;/code&gt;, where it will switch the
jumptable around as explained above. Booting the vendor kernel will
populate the log. I&amp;rsquo;ve made a custom root filesystem with the awesome
&lt;a href=&#34;https://buildroot.org/&#34;&gt;buildroot&lt;/a&gt;, and made the vendor kernel load that
off a microSD card. We can then use
&lt;a href=&#34;https://github.com/pengutronix/memtool/blob/master/memtool.c&#34;&gt;memtool&lt;/a&gt; to
grab the log from memory. Piping it to gzip and scp&amp;rsquo;ing it over the network
for analysis was the quickest way to get the log.&lt;/p&gt;
&lt;p&gt;Each call to a HAL function saves four 32-bit items:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The caller&amp;rsquo;s address.&lt;/li&gt;
&lt;li&gt;An integer denoting which function in the jumptable was called, i.e.
a number from 0 to 17.&lt;/li&gt;
&lt;li&gt;The first argument to the call: for read functions this is the address to
read, for write functions it is the value to be written.&lt;/li&gt;
&lt;li&gt;The second argument, used only by write calls: the address to be written to.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&amp;rsquo;s a sample:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0000000: 080a 00c1 0400 0000 3c90 00a0 0000 0000  ........&amp;lt;.......
0000010: 140a 00c1 0700 0000 0000 0000 3c90 00a0  ............&amp;lt;...
0000020: 280a 00c1 0700 0000 2040 1200 2400 17a0  (....... @..$...
0000030: 3c0a 00c1 0700 0000 0000 0000 7800 17a0  &amp;lt;...........x...
[...]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first entry shows an instruction at &lt;code&gt;0xc1000a08&lt;/code&gt; called
&lt;code&gt;hw_ops-&amp;gt;hw_readl(0xa000903c)&lt;/code&gt;. The second one translates to
&lt;code&gt;hw_ops-&amp;gt;hw_writel(0, 0xa000903c)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A little Python makes quick work of this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0xc1000a08 hw_readl(0xa000903c)             
0xc1000a14 hw_writel(0x0, 0xa000903c)       
0xc1000a28 hw_writel(0x124020, 0xa0170024)  
0xc1000a3c hw_writel(0x0, 0xa0170078)       
0xc1000a50 hw_writel(0x4, 0xa0170084)       
0xc1000a64 hw_writel(0x1, 0xa0170080)       
0xc1000a78 hw_writel(0x11, 0xa0170198)      
0xc1000a90 hw_writel(0x112032, 0xa017008c)  
0xc1000868 hw_readl(0xa0005014)             
0xc1000884 hw_writel(0x55, 0xa0005004)      
0xc1000868 hw_readl(0xa0005014)             
0xc1000868 hw_readl(0xa0005014)             
0xc1000884 hw_writel(0x6e, 0xa0005004)      
0xc1000868 hw_readl(0xa0005014)             
0xc1000868 hw_readl(0xa0005014)             
0xc1000884 hw_writel(0x63, 0xa0005004)      
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;See those &lt;code&gt;hw_writel()&lt;/code&gt; calls with &lt;code&gt;0xa0005004&lt;/code&gt; as the address? That&amp;rsquo;s the
UART input/output register, and it&amp;rsquo;s writing 0x55, 0x6e and 0x63 &amp;ndash; that&amp;rsquo;s
ASCII for &amp;ldquo;Unc&amp;rdquo;, the start of &amp;ldquo;Uncompressing Linux&amp;hellip;&amp;rdquo;.&lt;/p&gt;
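&lt;p&gt;For reference, a decoder along these lines needs very little. The following is
a minimal sketch rather than the script used here; it assumes the record layout
listed above (four little-endian 32-bit words) and the jumptable order from the
SDK&amp;rsquo;s &lt;code&gt;hw_ops&lt;/code&gt; struct without the SPI entries. The input is
the four-record sample from the hex dump.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Jumptable order, index 0 to 17.
HW_OPS = [
    'get_version', 'reserved',
    'hw_readb', 'hw_readw', 'hw_readl',
    'hw_writeb', 'hw_writew', 'hw_writel',
    'flash_read', 'flash_write',
    'usb_readb', 'usb_readw', 'usb_readl',
    'usb_writeb', 'usb_writew', 'usb_writel',
    'dma_readl', 'dma_writel',
]
WRITES = {'hw_writeb', 'hw_writew', 'hw_writel'}


def decode(blob):
    # Split the blob into little-endian 32-bit words, four per record.
    words = [int.from_bytes(blob[i:i + 4], 'little')
             for i in range(0, len(blob), 4)]
    for i in range(0, len(words), 4):
        caller, index, arg1, arg2 = words[i:i + 4]
        name = HW_OPS[index]
        # Only the write calls carry a meaningful second argument.
        args = f'{arg1:#x}, {arg2:#x}' if name in WRITES else f'{arg1:#x}'
        yield f'{caller:#x} {name}({args})'


sample = bytes.fromhex(
    '080a00c1 04000000 3c9000a0 00000000'
    '140a00c1 07000000 00000000 3c9000a0'
    '280a00c1 07000000 20401200 240017a0'
    '3c0a00c1 07000000 00000000 780017a0'
)
lines = list(decode(sample))
print('\n'.join(lines))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run on the sample, this reproduces the first four lines of the decoded log.&lt;/p&gt;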
&lt;p&gt;The caller address is interesting to see, but looking up which function
that&amp;rsquo;s in is going to be a bit of work. Fortunately we have a shell on this
running kernel, and Linux publishes its function symbol table
in &lt;code&gt;/proc/kallsyms&lt;/code&gt; &amp;ndash; not for the decompressor, but for the kernel proper.
If we feed those symbols into a Ghidra disassembly of the decompressed kernel,
it can give all those functions names in its disassembly.&lt;/p&gt;
&lt;p&gt;Ghidra also has a very, &lt;em&gt;very&lt;/em&gt; elaborate API (it&amp;rsquo;s in Java, so overengineered
to the gills). We can use that to feed the caller address to an API function
that determines the function that address is in, and show the function
name next to the caller.&lt;/p&gt;
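&lt;p&gt;The lookup itself is simple enough to sketch without Ghidra: sort the symbols by
address and find the last one that starts at or before the caller. The excerpt
below is in &lt;code&gt;/proc/kallsyms&lt;/code&gt; format, but the start addresses are made
up for illustration; only the function names come from the log.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import bisect

# Made-up excerpt in /proc/kallsyms format: address, type, name.
KALLSYMS = '''\
800175d0 t gk7101_timer_offset
80017b30 t gk7101_ack_irq
80017bd0 t gk7101_enable_irq
804ea8b0 T gk7101_init_irq
'''


def load_symbols(text):
    symbols = []
    for line in text.splitlines():
        address, _kind, name = line.split()[:3]
        symbols.append((int(address, 16), name))
    return sorted(symbols)


def function_for(symbols, address):
    # Find the last symbol whose start address is at or before the address.
    starts = [start for start, _name in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index == -1:
        return None        # address lies before the first symbol
    return symbols[index][1]


symbols = load_symbols(KALLSYMS)
print(function_for(symbols, 0x80017b4c), function_for(symbols, 0x804ea8d0))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The symbol table carries no sizes, so an address past the end of the last
function is still attributed to it. A disassembler that knows where each function
ends doesn&amp;rsquo;t have that problem.&lt;/p&gt;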
&lt;p&gt;Also, the log output shows the addresses the vendor drivers &lt;em&gt;think&lt;/em&gt; they&amp;rsquo;re
writing to, but those are of course pre-HAL. But since we have our handy
HAL emulator, we could enrich our log output still further by adding the
&lt;em&gt;real&lt;/em&gt; address it ends up using &amp;ndash; very handy if you&amp;rsquo;re writing drivers
that don&amp;rsquo;t use this HAL nonsense. Here&amp;rsquo;s what the enriched log looks like,
starting right after the decompressor handed over to the kernel proper:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0x804ea6e0 gk7101_map_io                  get_version(0xc0015ad0)           -&amp;gt; 0x0
0x804ea8d0 gk7101_init_irq                hw_writel(0x0, 0xf2008000)        -&amp;gt; 0xf0003024
0x804ea8e4 gk7101_init_irq                hw_writel(0x0, 0xf2008008)        -&amp;gt; 0xf0003028
0x804ea8f8 gk7101_init_irq                hw_writel(0x0, 0xf2008004)        -&amp;gt; 0xf000302c
0x804ea914 gk7101_init_irq                hw_writel(0x0, 0xf2009000)        -&amp;gt; 0xf0010024
0x804ea928 gk7101_init_irq                hw_writel(0x0, 0xf2009008)        -&amp;gt; 0xf0010028
0x804ea93c gk7101_init_irq                hw_writel(0x0, 0xf2009004)        -&amp;gt; 0xf001002c
0x804ea950 gk7101_init_irq                hw_writel(0x0, 0xf200800c)        -&amp;gt; 0xf000300c
0x804ea964 gk7101_init_irq                hw_writel(0x0, 0xf2008010)        -&amp;gt; 0xf0003010
0x804ea978 gk7101_init_irq                hw_writel(0xffffffff, 0xf2008014) -&amp;gt; 0xf0003014
0x804ea98c gk7101_init_irq                hw_writel(0xffffffff, 0xf2008038) -&amp;gt; 0xf0003038
0x804ea9a0 gk7101_init_irq                hw_writel(0x0, 0xf200900c)        -&amp;gt; 0xf001000c
0x804ea9b4 gk7101_init_irq                hw_writel(0x0, 0xf2009010)        -&amp;gt; 0xf0010010
0x804ea9c8 gk7101_init_irq                hw_writel(0xffffffff, 0xf2009014) -&amp;gt; 0xf0010014
0x804ea9dc gk7101_init_irq                hw_writel(0xffffffff, 0xf2009038) -&amp;gt; 0xf0010038
0x80017d44 gk7101_irq_set_type            hw_readl(0xf2008000)              -&amp;gt; 0xf0003024
0x80017d64 gk7101_irq_set_type            hw_readl(0xf2008008)              -&amp;gt; 0xf0003028
0x80017d80 gk7101_irq_set_type            hw_readl(0xf2008004)              -&amp;gt; 0xf000302c
0x80017e20 gk7101_irq_set_type            hw_writel(0x0, 0xf2008000)        -&amp;gt; 0xf0003024
0x80017e34 gk7101_irq_set_type            hw_writel(0x0, 0xf2008008)        -&amp;gt; 0xf0003028
0x80017e48 gk7101_irq_set_type            hw_writel(0x100000, 0xf2008004)   -&amp;gt; 0xf000302c
0x80017b4c gk7101_ack_irq                 hw_writel(0x100000, 0xf2008038)   -&amp;gt; 0xf0003038
0x80017be4 gk7101_enable_irq              hw_writel(0x100000, 0xf2008010)   -&amp;gt; 0xf0003010
0x80019fa0 get_apb_bus_freq_hz            hw_readl(0xf3170014)              -&amp;gt; 0xf1170000
0x80019fb4 get_apb_bus_freq_hz            hw_readl(0xf3170118)              -&amp;gt; 0xf1170118
0x800177ac gk7101_ce_timer_set_mode       hw_readl(0xf300000c)              -&amp;gt; 0xf100b030
0x800177b8 gk7101_ce_timer_set_mode       hw_writel(0x400, 0xf300000c)      -&amp;gt; 0xf100b030
0x800175e8 gk7101_timer_offset            hw_readl(0xf3000020)              -&amp;gt; 0xf100b020
0x8001764c gk7101_ce_timer_set_mode       hw_readl(0xf300000c)              -&amp;gt; 0xf100b030
0x80017658 gk7101_ce_timer_set_mode       hw_writel(0x400, 0xf300000c)      -&amp;gt; 0xf100b030
0x80019fa0 get_apb_bus_freq_hz            hw_readl(0xf3170014)              -&amp;gt; 0xf1170000
0x80019fb4 get_apb_bus_freq_hz            hw_readl(0xf3170118)              -&amp;gt; 0xf1170118
0x80017678 gk7101_ce_timer_set_mode       hw_writel(0xa8750, 0xf3000020)    -&amp;gt; 0xf100b020
0x8001768c gk7101_ce_timer_set_mode       hw_writel(0xa8750, 0xf3000038)    -&amp;gt; 0xf100b024
0x80019fa0 get_apb_bus_freq_hz            hw_readl(0xf3170014)              -&amp;gt; 0xf1170000
0x80019fb4 get_apb_bus_freq_hz            hw_readl(0xf3170118)              -&amp;gt; 0xf1170118
0x8001770c gk7101_ce_timer_set_mode       hw_writel(0x0, 0xf3000024)        -&amp;gt; 0xf100b028
0x80017720 gk7101_ce_timer_set_mode       hw_writel(0x0, 0xf3000028)        -&amp;gt; 0xf100b02c
0x80017734 gk7101_ce_timer_set_mode       hw_readl(0xf300000c)              -&amp;gt; 0xf100b030
0x80017740 gk7101_ce_timer_set_mode       hw_writel(0x411, 0xf300000c)      -&amp;gt; 0xf100b030
0x80017754 gk7101_ce_timer_set_mode       hw_readl(0xf300000c)              -&amp;gt; 0xf100b030
0x80017760 gk7101_ce_timer_set_mode       hw_writel(0x401, 0xf300000c)      -&amp;gt; 0xf100b030
0x80017774 gk7101_ce_timer_set_mode       hw_readl(0xf300000c)              -&amp;gt; 0xf100b030
0x80017780 gk7101_ce_timer_set_mode       hw_writel(0x501, 0xf300000c)      -&amp;gt; 0xf100b030
0x801dce78 serial_gk7101_set_termios      hw_writel(0x10, 0xf3005018)       -&amp;gt; 0xf100500c
0x801dced8 serial_gk7101_set_termios      hw_writel(0xd, 0xf3005004)        -&amp;gt; 0xf1005000
0x801dceec serial_gk7101_set_termios      hw_writel(0x0, 0xf3005000)        -&amp;gt; 0xf1005004
0x801dcf04 serial_gk7101_set_termios      hw_writel(0x3, 0xf3005018)        -&amp;gt; 0xf100500c
0x801dcf48 serial_gk7101_set_termios      hw_readl(0xf3005000)              -&amp;gt; 0xf1005004
0x801dcf54 serial_gk7101_set_termios      hw_writel(0x5, 0xf3005000)        -&amp;gt; 0xf1005004
0x801dc004 serial_gk7101_set_mctrl        hw_readl(0xf300500c)              -&amp;gt; 0xf1005010
0x801dc07c serial_gk7101_set_mctrl        hw_writel(0x4, 0xf300500c)        -&amp;gt; 0xf1005010
0x801dc1ec serial_gk7101_console_write    hw_readl(0xf3005000)              -&amp;gt; 0xf1005004
0x801dc204 serial_gk7101_console_write    hw_writel(0x5, 0xf3005000)        -&amp;gt; 0xf1005004
0x801dc158 serial_gk7101_console_putchar  hw_readl(0xf3005014)              -&amp;gt; 0xf1005014
0x801dc17c serial_gk7101_console_putchar  hw_writel(0x5b, 0xf3005004)       -&amp;gt; 0xf1005000
0x801dc158 serial_gk7101_console_putchar  hw_readl(0xf3005014)              -&amp;gt; 0xf1005014
0x801dc158 serial_gk7101_console_putchar  hw_readl(0xf3005014)              -&amp;gt; 0xf1005014
0x801dc17c serial_gk7101_console_putchar  hw_writel(0x20, 0xf3005004)       -&amp;gt; 0xf1005000
0x801dc158 serial_gk7101_console_putchar  hw_readl(0xf3005014)              -&amp;gt; 0xf1005014
0x801dc158 serial_gk7101_console_putchar  hw_readl(0xf3005014)              -&amp;gt; 0xf1005014
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That first call to &lt;code&gt;get_version()&lt;/code&gt; is used to print the
&amp;ldquo;&lt;code&gt;hal version = 20151223&lt;/code&gt;&amp;rdquo; line in the console log we saw earlier. After that
comes a bunch of IRQ initialization, followed by timer initialization.&lt;/p&gt;
&lt;p&gt;After that it&amp;rsquo;s some console initialization output; the putchar calls are
the first real kernel console output. So that&amp;rsquo;s all as expected;
as a matter of fact that matches exactly what my interrupt and timer drivers
put into the registers. Yet the vendor kernel makes it through the calibration
function. It&amp;rsquo;s not long before the log starts showing this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0x8000845c gk7101_vic_handle_irq          hw_readl(0xf2008030)              -&amp;gt; 0
0x80017b4c gk7101_ack_irq                 hw_writel(0x100000, 0xf2008038)   -&amp;gt; 0
0x800175a0 gk7101_ce_timer_interrupt      hw_writel(0x100000, 0xf2008038)   -&amp;gt; 0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&amp;rsquo;s a timer interrupt coming in, getting ack&amp;rsquo;ed, and getting handled.
It just never happens in my kernel.&lt;/p&gt;
&lt;p&gt;And so we&amp;rsquo;re stuck again!&lt;/p&gt;
</description>
    </item>
    <item>
      <title>Emulate Your Way to Success</title>
      <link>https://biot.com/blog/posts/emulate-your-way-to-success/</link>
      <pubDate>Sat, 16 Nov 2019 19:59:01 +0100</pubDate>
      <guid>https://biot.com/blog/posts/emulate-your-way-to-success/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://biot.com/blog/posts/hal-of-horrors/&#34;&gt;last episode&lt;/a&gt; of unlocking the Goke GK7101
SoC, we found ourselves faced with a big obstacle: a
&lt;a href=&#34;https://en.wikipedia.org/wiki/Hardware_abstraction&#34;&gt;HAL&lt;/a&gt;
in the form of I/O read/write calls that translated on-board
peripherals&amp;rsquo; register locations to their real addresses. The HAL&amp;rsquo;s underlying
code is convoluted and much too hard to parse &amp;ndash; it&amp;rsquo;s a large maze of
twisty little if-then-elses, all alike. And since this SoC has tons of
functionality, there are hundreds of register addresses to find.&lt;/p&gt;
&lt;p&gt;But then, a surprise: a wild SDK appears! Somebody uploaded an official
Goke SDK tarball to a certain open source repository site, and it has tons
of code: the full Linux kernel as hacked up by Goke, their U-Boot source,
a build system to make a full working system including root filesystem, and
even some example applications that use their kernel drivers.&lt;/p&gt;
&lt;p&gt;Weirdly, &lt;em&gt;all&lt;/em&gt; of it is marked either GPL or, in a very few cases, public
domain. You have to wonder why they&amp;rsquo;re not just putting this thing up for
download; it&amp;rsquo;s literally all they have to do to abide by the GPL.&lt;/p&gt;
&lt;p&gt;Sure enough, lots of information about the HAL is to be found. The struct
with the read and write calls is called &lt;code&gt;hw_ops&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;struct hw_ops
{
    int (*get_version)(void);
    unsigned int (*reserved)(unsigned int );

    unsigned char (*hw_readb)(unsigned int );
    unsigned short (*hw_readw)(unsigned int );
    unsigned int (*hw_readl)(unsigned int );

    void (*hw_writeb)(unsigned char , unsigned int );
    void (*hw_writew)(unsigned short , unsigned int );
    void (*hw_writel)(unsigned int , unsigned int );

    unsigned int (*flash_read)(void);
    void (*flash_write)(unsigned int);

    unsigned char (*usb_readb)(unsigned int ptr, unsigned int offset);
    unsigned short (*usb_readw)(unsigned int ptr, unsigned int offset);
    unsigned int (*usb_readl)(unsigned int ptr, unsigned int offset);
    void (*usb_writeb)(unsigned int ptr, unsigned int offset, unsigned char value);
    void (*usb_writew)(unsigned int ptr, unsigned int offset, unsigned short value);
    void (*usb_writel)(unsigned int ptr, unsigned int offset, unsigned int value);

    unsigned int (*dma_readl)(unsigned int ptr);
    void (*dma_writel)(unsigned int ptr, unsigned int value);

#if SPI_API_MODE
    unsigned char (*spi_readb)(unsigned int ptr);
    unsigned short (*spi_readw)(unsigned int ptr);
    unsigned int (*spi_readl)(unsigned int ptr);
    void (*spi_writeb)(unsigned int ptr, unsigned char value);
    void (*spi_writew)(unsigned int ptr, unsigned short value);
    void (*spi_writel)(unsigned int ptr, unsigned int value);
#endif
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping in mind that all these are 32-bit function pointers, our &lt;code&gt;0x10&lt;/code&gt; and
&lt;code&gt;0x1c&lt;/code&gt; offsets correspond to &lt;code&gt;hw_readl&lt;/code&gt; and &lt;code&gt;hw_writel&lt;/code&gt;, respectively.&lt;/p&gt;
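&lt;p&gt;As a quick sanity check on those offsets, here&amp;rsquo;s a minimal Python sketch that numbers the first eight members of &lt;code&gt;hw_ops&lt;/code&gt;. It assumes 4-byte pointers and no padding, which holds for a 32-bit ARM build but not for a 64-bit host compiler. The helper name &lt;code&gt;field_offset()&lt;/code&gt; is made up for illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Members of struct hw_ops, in declaration order (first eight only).
HW_OPS_FIELDS = [
    "get_version", "reserved",
    "hw_readb", "hw_readw", "hw_readl",
    "hw_writeb", "hw_writew", "hw_writel",
]

POINTER_SIZE = 4  # assumption: 32-bit ARM, every function pointer is 4 bytes


def field_offset(name):
    """Byte offset of a member within struct hw_ops."""
    return HW_OPS_FIELDS.index(name) * POINTER_SIZE


for name in ("hw_readl", "hw_writel"):
    print(f"{name}: {field_offset(name):#04x}")
&lt;/code&gt;&lt;/pre&gt;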
&lt;p&gt;The first function, &lt;code&gt;get_version&lt;/code&gt;, looks like this in Ghidra:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;                ********************************************
                *                 FUNCTION                 *
                ********************************************
                uint32_t __stdcall get_version(void)
     uint32_t     r0:4      &amp;lt;RETURN&amp;gt;
                get_version                       XREF[2]: hal_init:c00120b0(*), 
                                                           c0012160(*)  
c00121a8  ldr   r0, [DAT_c00121b0]                                       = 20151223h
c00121ac  bx    lr
                DAT_c00121b0                      XREF[1]: get_version:c00121a8(R
c00121b0  unde  20151223h
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That simply returns the number &lt;code&gt;0x20151223&lt;/code&gt;. We&amp;rsquo;ve seen that before, in the
vendor kernel&amp;rsquo;s boot log:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[    0.000000] hal version = 20151223 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What about the source to the actual HAL code? That gets put into memory by
U-Boot, where the kernel then fences it off with the MMU, and uses it in-place.
The U-Boot source has this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;static inline unsigned int gk_hal_init (int cp_flag)
{
    unsigned int rval = 0;
    unsigned int *haladdress;

    haladdress = (unsigned int *)CONFIG_GK_HAL_ADDR;
    *(volatile u32 *) (CONFIG_U2K_HAL_ADDR) = (u32)haladdress;
    if(1==cp_flag)
    memcpy(haladdress,hal_data,sizeof(hal_data));

    hal_function_t hal_init = (hal_function_t) (haladdress) ;

    g_hw = (struct hw_ops *)hal_init (0, 0, 0x90000000, 0xA0000000, 0) ;

    return rval ;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The blob is copied from &lt;code&gt;hal_data&lt;/code&gt; to &lt;code&gt;CONFIG_GK_HAL_ADDR&lt;/code&gt;, which is defined as
&lt;code&gt;0xc0012000&lt;/code&gt;. So what&amp;rsquo;s &lt;code&gt;hal_data&lt;/code&gt;?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;int hal_data[]={
0xe92d45f8,0xe1a04000,0xe1a05001,0xe1a06002,0xe1a0a003,0xe59d7020,0xe3570000,0x0a00000e,
0xe59f8148,0xe58802f0,0xe5881320,0xe59832f8,0xe3530000,0x0a000002,0xe2880e2f,0xe3a01001,
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For whatever reason, the HAL source is missing. Nice to have their kernel
source, but we&amp;rsquo;re no closer to solving this massive HAL problem.&lt;/p&gt;
&lt;p&gt;Breaking it down, getting the real register locations involves:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;identifying the &amp;ldquo;fake&amp;rdquo; register address;&lt;/li&gt;
&lt;li&gt;calling the HAL&amp;rsquo;s &lt;code&gt;hw_readl()&lt;/code&gt; function with it;&lt;/li&gt;
&lt;li&gt;finding out what that function translated the address to.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now that we have source code, it&amp;rsquo;s at least gotten a lot easier to get a list
of fake registers &amp;ndash; and names to describe them, which is very nice to have.
Here&amp;rsquo;s the UART definition:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#define UART_RB_OFFSET          0x04
#define UART_TH_OFFSET          0x04
#define UART_DLL_OFFSET         0x04
#define UART_IE_OFFSET          0x00
#define UART_DLH_OFFSET         0x00
#define UART_II_OFFSET          0x08
#define UART_FC_OFFSET          0x08
#define UART_LC_OFFSET          0x18
#define UART_MC_OFFSET          0x0c
#define UART_LS_OFFSET          0x14
#define UART_MS_OFFSET          0x10
#define UART_SC_OFFSET          0x1c    /* Byte */
#define UART_SRR_OFFSET         0x88
[...]
/* UART[x]_LS_REG */
#define UART_LS_FERR            0x80
#define UART_LS_TEMT            0x40
#define UART_LS_BI              0x20
#define UART_LS_FE              0x10
#define UART_LS_THRE            0x08
#define UART_LS_DR              0x04
#define UART_LS_PE              0x02
#define UART_LS_OE              0x01
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That matches the code in the vendor kernel&amp;rsquo;s decompressor exactly, including
the LSR&amp;rsquo;s DR bit being at bit 2.&lt;/p&gt;
&lt;p&gt;But how can we call &lt;code&gt;hw_readl()&lt;/code&gt; for all of those registers, and get the
results back in a way we can use in our code? We have the HAL code, but
it&amp;rsquo;s not like we can run the HAL without the GK7101&amp;hellip; or can we?&lt;/p&gt;
&lt;p&gt;It would actually be possible to run the code on any other ARM core sharing
that &lt;a href=&#34;https://en.wikipedia.org/wiki/Instruction_set_architecture&#34;&gt;ISA&lt;/a&gt;, if
not for the fact that it interacts with actual memory locations: these
HAL functions don&amp;rsquo;t just return the locations &amp;ndash; they perform the actual
reads and writes. So what we need is an ARM platform that will let us call
the function but abort before doing the actual read operation.&lt;/p&gt;
&lt;p&gt;Of course an architecture with an MMU makes that easy: by simply not mapping
any memory except the part that runs the code, memory accesses to these
unmapped locations generate an exception &amp;ndash; that is, they run the instruction
located at the ARM vector called Data Abort.&lt;/p&gt;
&lt;p&gt;This would be even easier to do without having to mess with actual exception
vectors on raw hardware, with custom handlers that somehow log what happened.
What we need is something like QEMU &amp;ndash; which supports ARM &amp;ndash; to start up a
little VM that only loads the HAL code, calls one function in it, and reports
in some detail about a Data Abort exception. That&amp;rsquo;s actually a bit messy for
QEMU; it&amp;rsquo;s intended for a much higher level of control. We need something
lower-level than QEMU, but that&amp;rsquo;s somehow still easy to use and control.
That seems like a big ask &amp;ndash; would something like that even exist?&lt;/p&gt;
&lt;p&gt;Enter the magnificent &lt;a href=&#34;https://www.unicorn-engine.org/&#34;&gt;Unicorn&lt;/a&gt;. The
authors took the basic low-level VM code in QEMU, and built a very
different thing around it. Unicorn takes the form of a library, with an API
that lets you create a VM, map memory into it, set callbacks to your code
for exceptions, run code with a granularity down to one instruction, and
read/write registers. It&amp;rsquo;s utterly easy to use, well documented, and &lt;em&gt;exactly&lt;/em&gt;
what we need. The API is in C, but has bindings for tons of languages, notably
Python.&lt;/p&gt;
&lt;p&gt;What we need to do, in order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create an ARM VM.&lt;/li&gt;
&lt;li&gt;Map some memory into it, obviously not in the region where I/O might be.
We&amp;rsquo;ll use &lt;code&gt;0xc0000000&lt;/code&gt;, since the HAL lives in that region.&lt;/li&gt;
&lt;li&gt;Assign some of that memory to the stack, and set the virtual CPU&amp;rsquo;s stack
pointer (the &lt;code&gt;sp&lt;/code&gt; register).&lt;/li&gt;
&lt;li&gt;Drop the HAL code in there, at address &lt;code&gt;0xc0012000&lt;/code&gt; &amp;ndash; we have no
guarantees that all that code is relocatable, so this will avoid problems.&lt;/li&gt;
&lt;li&gt;Set up a callback for the Data Abort exception: the exception called when
code tries to access memory the MMU doesn&amp;rsquo;t have a mapping for. We&amp;rsquo;ll just
print the &amp;ldquo;bad&amp;rdquo; address from there, and that&amp;rsquo;ll give us the final address
as determined by the HAL.&lt;/li&gt;
&lt;li&gt;Call the &lt;code&gt;hal_init()&lt;/code&gt; function &amp;ndash; recall it sets up some variables used by
the address translation stuff later.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&amp;rsquo;ll also need a little bit of code to call &lt;code&gt;hw_readl()&lt;/code&gt; with a supplied
address we want translated. Something like this should do it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ldr r4, [pc, #4]	// load hw_readl() pointer
ldr r0, [pc, #4]	// load argument to hw_readl()
blx r4				// call hw_readl()
// --- code ends here ---
.long 0		// pointer to hw_readl() goes here
.long 0		// test address goes here
&lt;/code&gt;&lt;/pre&gt;
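&lt;p&gt;Put together, the steps and the trampoline above could look roughly like this with Unicorn&amp;rsquo;s Python bindings. Treat it as a sketch under assumptions rather than the real tool: the three instruction words are hand-assembled and worth double-checking against an assembler, the &lt;code&gt;TRAMPOLINE&lt;/code&gt; and &lt;code&gt;STACK_TOP&lt;/code&gt; locations and both function names are arbitrary choices, and it assumes &lt;code&gt;hal_init()&lt;/code&gt; itself runs to completion without touching unmapped memory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;HAL_BASE = 0xC0012000     # CONFIG_GK_HAL_ADDR from the U-Boot source
RAM_BASE = 0xC0000000     # the only region we map; I/O space stays unmapped
RAM_SIZE = 0x200000
TRAMPOLINE = 0xC0100000   # arbitrary free spot in the mapped region (assumption)
STACK_TOP = RAM_BASE + RAM_SIZE - 0x10
HW_READL_OFFSET = 0x10    # slot of hw_readl in struct hw_ops


def build_trampoline(readl_ptr, test_addr):
    """Hand-assembled ARM code for the snippet above, followed by its two literals."""
    words = [
        0xE59F4004,  # ldr r4, [pc, #4]
        0xE59F0004,  # ldr r0, [pc, #4]
        0xE12FFF34,  # blx r4
        readl_ptr,   # pointer to hw_readl()
        test_addr,   # address to translate
    ]
    return b"".join(word.to_bytes(4, "little") for word in words)


def translate(hal_blob, fake_addr):
    """Return the first unmapped address hw_readl() tries to read, or None."""
    # Imported here so build_trampoline() stays usable without Unicorn installed.
    from unicorn import Uc, UcError, UC_ARCH_ARM, UC_MODE_ARM, UC_HOOK_MEM_READ_UNMAPPED
    from unicorn import arm_const as arm

    hits = []

    def on_unmapped_read(uc, access, address, size, value, user_data):
        hits.append(address)
        return False  # do not recover: this aborts emulation

    mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)
    mu.mem_map(RAM_BASE, RAM_SIZE)
    mu.mem_write(HAL_BASE, hal_blob)
    mu.hook_add(UC_HOOK_MEM_READ_UNMAPPED, on_unmapped_read)

    # hal_init(0, 0, 0x90000000, 0xA0000000, 0): four arguments in r0-r3,
    # the fifth on the stack, and lr pointing at the address where we stop.
    mu.reg_write(arm.UC_ARM_REG_SP, STACK_TOP)
    mu.mem_write(STACK_TOP, (0).to_bytes(4, "little"))
    regs = (arm.UC_ARM_REG_R0, arm.UC_ARM_REG_R1, arm.UC_ARM_REG_R2, arm.UC_ARM_REG_R3)
    for reg, value in zip(regs, (0, 0, 0x90000000, 0xA0000000)):
        mu.reg_write(reg, value)
    mu.reg_write(arm.UC_ARM_REG_LR, TRAMPOLINE)
    mu.emu_start(HAL_BASE, TRAMPOLINE)
    hw_ops = mu.reg_read(arm.UC_ARM_REG_R0)

    readl_ptr = int.from_bytes(mu.mem_read(hw_ops + HW_READL_OFFSET, 4), "little")
    mu.mem_write(TRAMPOLINE, build_trampoline(readl_ptr, fake_addr))
    hits.clear()
    try:
        mu.emu_start(TRAMPOLINE, TRAMPOLINE + 12)
    except UcError:
        pass  # expected: the unmapped read stopped the run
    return hits[0] if hits else None
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Returning &lt;code&gt;False&lt;/code&gt; from the hook is what turns the unmapped read into a hard stop, so the first recorded address is the one the HAL translated to.&lt;/p&gt;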
&lt;p&gt;The Unicorn application is
&lt;a href=&#34;https://github.com/biot/gk710x-tools/blob/master/emu-hal.c&#34;&gt;here&lt;/a&gt;.
It&amp;rsquo;s a bit messy, with lots of debugging code, but it works. You give it
a HAL code blob as extracted from the device and a test address:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sumner: ./emu-hal halcode-fromdevice 0xa0005014
0xa0005014 -&amp;gt; 0x70005014
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using the UART register offsets we found in the source code, we can thus
map the whole UART block:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sumner: for reg in 04 00 08 18 0c 14 10 1c 88; do ./emu-hal halcode-fromdevice 0xa00050$reg; done
0xa0005004 -&amp;gt; 0x70005000
0xa0005000 -&amp;gt; 0x70005004
0xa0005008 -&amp;gt; 0x70005008
0xa0005018 -&amp;gt; 0x7000500c
0xa000500c -&amp;gt; 0x70005010
0xa0005014 -&amp;gt; 0x70005014
0xa0005010 -&amp;gt; 0x70005018
0xa000501c -&amp;gt; 0x7000501c
0xa0005088 -&amp;gt; 0x70005088
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Success! We&amp;rsquo;ll need to extract register offsets for all subsystems from the
source code, but that&amp;rsquo;s a luxury problem: a bit of parsing with a Python
script is peanuts compared to having to extract them from decompiled
drivers.&lt;/p&gt;
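&lt;p&gt;That parsing step could start out as something like the sketch below. It only understands &lt;code&gt;#define FOO_OFFSET 0x..&lt;/code&gt; lines in the style of the UART header shown above; the function name and the inline sample are illustrative, and a real run would read the SDK&amp;rsquo;s header files instead.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

# Matches lines such as "#define UART_LS_OFFSET  0x14"
OFFSET_RE = re.compile(r"#define\s+(\w+)_OFFSET\s+(0x[0-9a-fA-F]+)")


def register_offsets(header_text):
    """Map each register name to its fake offset, as defined in a vendor header."""
    return {name: int(value, 16) for name, value in OFFSET_RE.findall(header_text)}


sample = """
#define UART_RB_OFFSET          0x04
#define UART_IE_OFFSET          0x00
#define UART_LS_OFFSET          0x14
#define UART_SC_OFFSET          0x1c    /* Byte */
"""

UART_BASE = 0xA0005000  # the fake UART base address found earlier
for name, offset in register_offsets(sample).items():
    print(f"{name}: {UART_BASE + offset:#010x}")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each printed address can then be fed to &lt;code&gt;emu-hal&lt;/code&gt; to get its real counterpart.&lt;/p&gt;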
&lt;p&gt;Next up, we&amp;rsquo;ll figure out the registers for the interrupt and timer
subsystems, so we can get a kernel to boot. We can use the &lt;code&gt;early_print()&lt;/code&gt;
facility to debug that, since a proper UART driver won&amp;rsquo;t work until
we have interrupts.&lt;/p&gt;
</description>
    </item>
    <item>
      <title>Hal of Horrors</title>
      <link>https://biot.com/blog/posts/hal-of-horrors/</link>
      <pubDate>Fri, 15 Nov 2019 21:13:38 +0100</pubDate>
      <guid>https://biot.com/blog/posts/hal-of-horrors/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://biot.com/blog/posts/getting-started-on-a-new-board/&#34;&gt;previous post&lt;/a&gt; we found the GK7101
SoC&amp;rsquo;s UART base address and a few registers by decompiling their version of
Linux&amp;rsquo;s decompressor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;UART base address is &lt;code&gt;0xa0005000&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The input/output register is at offset &lt;code&gt;0x04&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Offset &lt;code&gt;0x14&lt;/code&gt; holds flags:
&lt;ul&gt;
&lt;li&gt;Bit 6 needs to be high before sending.&lt;/li&gt;
&lt;li&gt;To drain the input buffer, read from offset &lt;code&gt;0x04&lt;/code&gt; until bit 2 goes low.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This all looks suspiciously like a standard &lt;a href=&#34;https://en.wikipedia.org/wiki/16550_UART&#34;&gt;16550
UART&lt;/a&gt;, but not quite: offset &lt;code&gt;0x04&lt;/code&gt;
corresponds to the 16550&amp;rsquo;s RHR/THR register (Receive/Transmit Holding Register),
except the 16550 has it at offset &lt;code&gt;0x00&lt;/code&gt;. Offset &lt;code&gt;0x14&lt;/code&gt; matches the LSR (Line
Status Register), with bit 6 matching TEMT (Transmitter Empty), but bit 2
doesn&amp;rsquo;t quite match &amp;ndash; it behaves like DR (Data Ready), which is bit 0 in the 16550&amp;rsquo;s LSR.&lt;/p&gt;
&lt;p&gt;So it&amp;rsquo;s not a 16550-compatible UART, just a vague attempt at one. Here&amp;rsquo;s
my version of the 4 macros to work this thing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#define UART0_TH		0x04
#define UART0_LS		0x14
#define UART0_LS_DR		(1 &amp;lt;&amp;lt; 2)
#define UART0_LS_TEMT	(1 &amp;lt;&amp;lt; 6)


#ifdef CONFIG_DEBUG_UART_PHYS
        .macro	addruart, rp, rv, tmp
        ldr	\rp, =CONFIG_DEBUG_UART_PHYS
        ldr	\rv, =CONFIG_DEBUG_UART_VIRT
        .endm
#endif

        .macro	waituart,rd,rx
1001:		ldr	\rd, [\rx, #UART0_LS]
        tst	\rd, #UART0_LS_TEMT
        beq	1001b
        .endm

        .macro	senduart,rd,rx
        str	\rd, [\rx, #UART0_TH]
        .endm

        .macro	busyuart,rd,rx
1001:		ldr	\rd, [\rx, #UART0_LS]
        tst	\rd, #UART0_LS_DR
        bne	1001b
        .endm
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To use this low-level console facility, you need to define the following in
your kernel&amp;rsquo;s configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CONFIG_DEBUG_LL=y
CONFIG_DEBUG_UNCOMPRESS=y
CONFIG_DEBUG_LL_INCLUDE=&amp;quot;debug/gk710x.S&amp;quot;
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_UART_PHYS=0xA0005000
CONFIG_DEBUG_UART_VIRT=0xf3005000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That turns on the facility, enables it in the decompressor, and tells the
build system where to find the code. You&amp;rsquo;re supposed to create that file
and define the macros in there. &lt;code&gt;CONFIG_EARLY_PRINTK&lt;/code&gt; creates a function called
&lt;code&gt;early_print()&lt;/code&gt;, which uses these macros as well. Handy to have while you&amp;rsquo;re
debugging your real UART driver.&lt;/p&gt;
&lt;p&gt;The last two definitions are used in the &lt;code&gt;addruart&lt;/code&gt; macro above, defining
the UART base address in physical and virtual (post-MMU) memory. How did I
come up with the virtual address? Remember this output from the vendor kernel
bootup?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[    0.000000] AHB: 0x90000000  0xf2000000  -- 0x1000000
[    0.000000] APB: 0xa0000000  0xf3000000  -- 0x1000000
[    0.000000] PPM: 0xc0000000  0xc0000000  -- 0x200000
[    0.000000] BSB: 0xc4800000  0xf5000000  -- 0x200000
[    0.000000] DSP: 0xc4a00000  0xf6000000  -- 0x3600000
[    0.000000] hal version = 20151223 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That looks to me like the
&lt;a href=&#34;https://en.wikipedia.org/wiki/Memory-mapped_I/O&#34;&gt;MMIO&lt;/a&gt; mapping: first the
physical address, then the virtual address it gets mapped to, followed by the
length of the block. APB refers to the Advanced Peripheral Bus, a standard
facility on ARM chips. It&amp;rsquo;s a memory-mapped I/O bus for SoC designers to
hook low-speed peripherals to, exactly where you&amp;rsquo;d expect to find a UART.
The virtual address for the UART block within the APB is therefore &lt;code&gt;0xf3005000&lt;/code&gt;.&lt;/p&gt;
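&lt;p&gt;The arithmetic is simple enough, but for the record, here&amp;rsquo;s a small Python sketch of it. The table holds the AHB and APB lines from the boot log; the function name is made up, and it assumes each block is mapped linearly, which is my reading of that output rather than something the vendor documents.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# (name, physical base, virtual base, length) as printed by the vendor kernel
MMIO_MAP = [
    ("AHB", 0x90000000, 0xF2000000, 0x1000000),
    ("APB", 0xA0000000, 0xF3000000, 0x1000000),
]


def phys_to_virt(phys):
    """Translate a physical MMIO address using the boot-log table."""
    for name, phys_base, virt_base, length in MMIO_MAP:
        if phys in range(phys_base, phys_base + length):
            return virt_base + (phys - phys_base)
    raise ValueError(f"{phys:#x} is not inside a known MMIO block")


print(f"{phys_to_virt(0xA0005000):#x}")
&lt;/code&gt;&lt;/pre&gt;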
&lt;p&gt;That should be all that&amp;rsquo;s needed to get a mainline kernel&amp;rsquo;s decompressor
output working on this chip.&lt;/p&gt;
&lt;p&gt;And yet&amp;hellip; it doesn&amp;rsquo;t work. The decompressor outputs nothing. Diddling
those UART registers from the U-Boot shell (which has basic memory monitor
commands) also does nothing.&lt;/p&gt;
&lt;p&gt;But the vendor kernel didn&amp;rsquo;t really write to those addresses directly, did it?
It used that weird struct with the read and write function pointers, feeding
it the addresses we found &amp;ndash; and that somehow did work.&lt;/p&gt;
&lt;p&gt;This is some sort of abstraction between I/O reads/writes and the actual
hardware. In other words, it&amp;rsquo;s a HAL (Hardware Abstraction Layer). Hardware
companies often stick these between their drivers and the operating system
they have to run on, in the hope it will save them time and effort. The idea
is they can write one driver, and just have a HAL abstract out whether that
driver needs to talk to Linux or Windows, for example.&lt;/p&gt;
&lt;p&gt;The Linux kernel &lt;em&gt;never&lt;/em&gt; accepts drivers that come with a HAL. It makes for
an unnecessary internal API &amp;ndash; the HAL layer &amp;ndash; that nothing else will use,
and it creates a lot of unnecessary code. Anyone that wants to mainline code
based on a vendor HAL typically gets to rewrite the whole thing, using the
vendor code as nothing more than a reference to registers and such. Of
course, vendor code even without a HAL &lt;em&gt;also&lt;/em&gt; tends to get rewritten from
scratch before it goes into mainline, because it&amp;rsquo;s a pile of dung. But I
digress.&lt;/p&gt;
&lt;p&gt;Incidentally, not every operating system refuses HAL-based code: the
&lt;a href=&#34;https://www.zephyrproject.org/&#34;&gt;Zephyr&lt;/a&gt; project
&lt;a href=&#34;https://www.zephyrproject.org/zephyr-leverages-hals-to-accelerate-viability/&#34;&gt;happily accepts&lt;/a&gt;
such code, for all the wrong reasons. That&amp;rsquo;s because it&amp;rsquo;s a project run by
a consortium of competitors, managed by the Linux Foundation. In other words
there isn&amp;rsquo;t a clue to be found. This is why we need Linus, folks.&lt;/p&gt;
&lt;p&gt;Still, this particular HAL seems unusual &amp;ndash; it doesn&amp;rsquo;t sit between a driver
and the OS, but between I/O reads/writes and the registers!&lt;/p&gt;
&lt;p&gt;Ghidra didn&amp;rsquo;t quite manage to extract the address of that struct, and it was
quite the struggle to find, but it turns out to live at &lt;code&gt;0xc0015ad0&lt;/code&gt;.
U-Boot has a severe misfeature in that asking it to dump memory in 32-bit-sized
chunks will make it flip those chunks from little-endian to big-endian. However,
this comes in handy if you&amp;rsquo;re on a wrong-endian system and you happen to be
human. Here&amp;rsquo;s the struct:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GK7101 # md.l c0015ad0 20
[PROCESS_SEPARATORS] md.l c0015ad0 20
c0015ad0: c00121a8 c0012904 c0012b80 c0012b0c    .!...)...+...+..
c0015ae0: c00127e8 c0012a90 c0012a14 c001295c    .&#39;...*...*..\)..
c0015af0: c0012250 c0012268 c0012280 c00122a0    P&amp;quot;..h&amp;quot;...&amp;quot;...&amp;quot;..
c0015b00: c00122c0 c00122e0 c0012300 c0012320    .&amp;quot;...&amp;quot;...#.. #..
c0015b10: c0012340 c001235c c0015ad0 00042400    @#..\#...Z...$..
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Those are clearly all pointers to the same region, roughly &lt;code&gt;0xc0012000&lt;/code&gt; -
&lt;code&gt;0xc0015000&lt;/code&gt;.
We saw the read function at offset &lt;code&gt;0x10&lt;/code&gt;, and write at &lt;code&gt;0x1c&lt;/code&gt;. Let&amp;rsquo;s take
a look at read, at &lt;code&gt;0xc00127e8&lt;/code&gt;. Remember it takes a single 32-bit unsigned
integer argument (the address) and returns the value at that address, also
&lt;code&gt;uint32_t&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uint hw_readl(uint *param_1)

{
    undefined *puVar1;
    uint *puVar2;
    uint uVar3;
    code *pcVar4;
    uint *puVar5;
    undefined4 local_14;
	
    if (param_1 == DAT_c00128ec) {
LAB_c0012840:
        uVar3 = *(uint *)(*DAT_c0012900 + 0x5014);
        return uVar3 &amp;amp; 0xffffffc0 | 8 | (uVar3 &amp;amp; 6) &amp;gt;&amp;gt; 1 | (uVar3 &amp;amp; 1) &amp;lt;&amp;lt; 2 | (uVar3 &amp;amp; 0x18) &amp;lt;&amp;lt; 1;
    }
    if (param_1 &amp;lt; DAT_c00128ec || param_1 == DAT_c00128ec) {
        if (param_1 != DAT_c00128f0) {
LAB_c0012874:
            local_14 = 0;
            puVar5 = param_1;
            puVar1 = FUN_c00127b8((uint)param_1);
            if (puVar1 == NULL) {
                if (((uint)param_1 &amp;amp; 0xff000000) == 0x60000000 ||
                    ((uint)param_1 &amp;amp; 0xff000000) == 0x70000000) {
                    return 0;
                }
                return *param_1;
            }
            puVar2 = (uint *)(**(code **)(puVar1 + 4))(param_1,&amp;amp;local_14);
            if (puVar2 == NULL) {
                return 0;
            }
            pcVar4 = *(code **)(puVar1 + 0x10);
            if (pcVar4 == NULL) {
                return *puVar2;
            }
            uVar3 = (*pcVar4)(local_14,*puVar2,0,pcVar4,puVar5);
            return uVar3;
        }
    }
    else {
        if (param_1 != DAT_c00128f4) {
            if (param_1 == DAT_c00128f8) goto LAB_c0012840;
            goto LAB_c0012874;
        }
    }
    return *(uint *)(*DAT_c00128fc + 0x16000);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is very much &lt;a href=&#34;https://en.wikipedia.org/wiki/Spaghetti_code&#34;&gt;spaghetti
code&lt;/a&gt;, no doubt at least partly
due to the decompiler&amp;rsquo;s distorted view: the machine code it translates is
&lt;em&gt;always&lt;/em&gt; spaghetti-shaped. Translated into proper C and then into pseudo-code
it does this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If the address is &lt;code&gt;0xA016018C&lt;/code&gt; or &lt;code&gt;0xF316018C&lt;/code&gt;, return &lt;code&gt;0x00474b37&lt;/code&gt;. That&amp;rsquo;s
actually the string &amp;ldquo;7KG&amp;rdquo;. Similarly, if the address is &lt;code&gt;0xA0160188&lt;/code&gt; or
&lt;code&gt;0xF3160188&lt;/code&gt;, return a value that spells out the string &amp;ldquo;1010&amp;rdquo;. Note the &lt;code&gt;0xa0000000&lt;/code&gt;
and &lt;code&gt;0xf3000000&lt;/code&gt; address ranges here &amp;ndash; so these are definitely related, and this
code is apparently meant to work on both physical and virtual addresses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the address is &lt;code&gt;0xA0005014&lt;/code&gt; or &lt;code&gt;0xF3005014&lt;/code&gt; &amp;ndash; hey, that&amp;rsquo;s the UART LSR!
&amp;ndash; then read from an unknown prefix + &lt;code&gt;0x5014&lt;/code&gt; and shift the various
bits in there around to match the almost-not-quite-16550 we found earlier.
The unknown prefix turns out to be set by an earlier call to &lt;code&gt;0xc0012000&lt;/code&gt;, some
sort of initialization of this HAL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For any other address, a function at &lt;code&gt;0xc00127b8&lt;/code&gt; is called with the address,
and a number of some sort is returned. That number is used as an index into
a jumptable of functions; the indexed function is then called with the address
and returns the &amp;ldquo;real&amp;rdquo; address. The read is then done on that address, and the
result returned.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
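&lt;p&gt;The LSR special case is worth spelling out, since it shows exactly how the bits get shuffled. This is the &lt;code&gt;return&lt;/code&gt; expression from the decompiled code above, rewritten in Python with explicit parentheses; the function name is made up. The bit names in the comments are the standard 16550 ones, and reading the raw register that way is my assumption.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def hal_lsr(real):
    """What hw_readl() hands back for the LSR, given the raw register value."""
    return (
        (real &amp;amp; 0xFFFFFFC0)            # bits 6 and up (TEMT among them) pass through
        | 0x08                         # bit 3 is always reported as set
        | ((real &amp;amp; 0x06) &amp;gt;&amp;gt; 1)         # raw bits 1-2 (OE, PE) move down to bits 0-1
        | ((real &amp;amp; 0x01) &amp;lt;&amp;lt; 2)         # raw bit 0 (DR) moves up to bit 2
        | ((real &amp;amp; 0x18) &amp;lt;&amp;lt; 1)         # raw bits 3-4 (FE, BI) move up to bits 4-5
    )


print(hex(hal_lsr(0x01)))  # raw "data ready" only
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If that reading is right, the raw register is laid out like a plain 16550 LSR in its low bits, and it&amp;rsquo;s the HAL that makes it look odd: DR ends up at bit 2, which is exactly what the decompressor code polls.&lt;/p&gt;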
&lt;p&gt;So all we have to do is figure out which index we get for our &lt;code&gt;0xa0005000&lt;/code&gt;
registers, decompile the function that handles this block, and see what
it returns. The real physical address for the UART block turns out to be
&lt;code&gt;0x70005000&lt;/code&gt;. The flags of the LSR register at &lt;code&gt;0x70005014&lt;/code&gt; are totally
different, but the code up there shows the mapping. Oh, and the &lt;em&gt;real&lt;/em&gt;
RHR/THR register is actually at offset &lt;code&gt;0x00&lt;/code&gt; &amp;ndash; so this HAL doesn&amp;rsquo;t just
translate peripheral base addresses, but also the ordering of registers
within their blocks.&lt;/p&gt;
&lt;p&gt;The problem is that this is a big jumptable, with a large number of functions.
They&amp;rsquo;re all constructed as a series of if-then-else statements on the address,
no doubt in an effort to make it all go fast. Clearly, this was written before
tree traversal algorithms were invented.&lt;/p&gt;
&lt;p&gt;The upshot is that it&amp;rsquo;s 16K of really dense machine code, a serious chore
to decompile, parse and grok &amp;ndash; it would just take ages, and mistakes
would be easy to make. Yet I have to do all of them, or I won&amp;rsquo;t have the
real addresses of the SoC&amp;rsquo;s various features.&lt;/p&gt;
&lt;p&gt;So with the physical address of &lt;code&gt;0x70005000&lt;/code&gt; the UART works. That&amp;rsquo;s one problem
solved, but now I&amp;rsquo;ve got hundreds of problems just like it.&lt;/p&gt;
&lt;p&gt;This is horrible! What to do?&lt;/p&gt;
</description>
    </item>
    <item>
      <title>Getting Started on a New Board</title>
      <link>https://biot.com/blog/posts/getting-started-on-a-new-board/</link>
      <pubDate>Wed, 13 Nov 2019 12:39:48 +0100</pubDate>
      <guid>https://biot.com/blog/posts/getting-started-on-a-new-board/</guid>
      <description>&lt;p&gt;In &lt;a href=&#34;https://biot.com/blog/posts/this-little-camera/&#34;&gt;part 1&lt;/a&gt; of this series we found an interesting little
board, and located the built-in UART pins. Let&amp;rsquo;s take a look at the output on
boot:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;console init done


U-Boot 2012.10 (Dec 07 2016 - 13:48:53) for GK7101 rb imx222 v1.00 (GOKE)

HAL:  20151223 
DRAM:  128 MiB
Flash: 16 MiB
16 MiB
NAND:  SPINAND MID = 0xff, DID = 0xffff, Data = 0x1ffffff !spinand_board_init[1581]: No support this SPI nand!
SF: Detected GD25Q128C with page size 256 B, sector size 64 KiB, total size 16 MiB
In:    serial
Out:   serial
Err:   serial
Net:   arm_freq(600MHz)..............0x112032
use int MII..............
gk7101
Hit any key to stop autoboot:  5 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Lots of useful information there. It&amp;rsquo;s an ancient version of &lt;a href=&#34;https://www.denx.de/wiki/U-Boot&#34;&gt;Das
U-Boot&lt;/a&gt;, the most commonly used bootloader
for embedded Linux systems. Sure enough, it&amp;rsquo;s running Linux:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;put param to memory
mem size (70)
bsb size (2)

the kernel image is zImage or Image
entry = 0xc1000000 
## Transferring control to Linux (at address c1000000)...

Starting kernel ...

machid = 3988 r2 = 0xc0000100 
Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0
[    0.000000] Linux version 3.4.43-gk (root@ubuntu) (gcc version 4.6.1 (crosstool-NG 1.18.0) ) #92 PREEMPT Wed Dec 7 16:55:36 CST 2016
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: VIPT aliasing data cache, VIPT aliasing instruction cache
[    0.000000] Machine: Goke GK7101 RB_IMX222 board V1.00
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] AHB: 0x90000000  0xf2000000  -- 0x1000000
[    0.000000] APB: 0xa0000000  0xf3000000  -- 0x1000000
[    0.000000] PPM: 0xc0000000  0xc0000000  -- 0x200000
[    0.000000] BSB: 0xc4800000  0xf5000000  -- 0x200000
[    0.000000] DSP: 0xc4a00000  0xf6000000  -- 0x3600000
[    0.000000] hal version = 20151223 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 17780
[    0.000000] Kernel command line: console=ttySGK0,115200 noinitrd mem=70M rw mtdparts=gk7101_flash:256K(boot),64K(bootenv),2560K(kernel),7168K(rootfs),1024K(rom),5312K(APP) rootfstype=squashfs root=/dev/mtdblock3 init=linuxrc ip=192.168.1.254:192.168.1.112:192.168.1.1:255.255.255.0:&amp;quot;gk7101&amp;quot;:eth0 mac=3C:97:0E:22:E1:14 phytype=0
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The system boots right into a &lt;a href=&#34;https://www.busybox.net/&#34;&gt;BusyBox&lt;/a&gt; shell.
Since the system has a microSD slot, the easiest way to take a good look at the
firmware is to stick in a microSD card and just copy the built-in storage over
to it. This way we can examine the goods with a much better set of tools than
will be available on the camera.&lt;/p&gt;
&lt;p&gt;The board has an SPI flash chip on it:&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&#34;https://biot.com/blog/blog/gk7101-flash.jpg&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;GigaDevice 16MB SPI flash&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;This isn&amp;rsquo;t like a hard disk or SSD, with a partition table of some sort to
help the system figure out how the storage is structured. The Linux kernel
needs to be told about the partitions. This version of U-Boot has this hardcoded,
and it passes it along to the Linux kernel as part of the command line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mtdparts=gk7101_flash:256K(boot),64K(bootenv),2560K(kernel),7168K(rootfs),1024K(rom),5312K(APP)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These partitions show up as &lt;code&gt;/dev/mtdblock0&lt;/code&gt;, &lt;code&gt;/dev/mtdblock1&lt;/code&gt; and so on.
Das U-Boot is on the first partition, its configuration in the second,
Linux kernel on the next one followed by the root filesystem. The last
partition (&amp;ldquo;APP&amp;rdquo;) contains the main application the camera runs, and the &amp;ldquo;ROM&amp;rdquo;
partition contains configuration.&lt;/p&gt;
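&lt;p&gt;To see where each partition starts on the flash chip, the &lt;code&gt;mtdparts&lt;/code&gt; string can be unpacked with a few lines of Python. This is only a sketch: it handles the &lt;code&gt;sizeK(name)&lt;/code&gt; form used here and nothing else the real syntax allows, and the function name is made up.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

MTDPARTS = "256K(boot),64K(bootenv),2560K(kernel),7168K(rootfs),1024K(rom),5312K(APP)"


def partition_table(spec):
    """Return (name, start, size) tuples in bytes; only sizes given in K are handled."""
    table, start = [], 0
    for size_k, name in re.findall(r"(\d+)K\((\w+)\)", spec):
        size = int(size_k) * 1024
        table.append((name, start, size))
        start += size
    return table


for index, (name, start, size) in enumerate(partition_table(MTDPARTS)):
    print(f"mtdblock{index}: {name:8} {start:#010x} {size // 1024:5d}K")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The sizes add up to exactly 16 MiB, matching the flash chip, and &lt;code&gt;rootfs&lt;/code&gt; lands on &lt;code&gt;mtdblock3&lt;/code&gt;, as the &lt;code&gt;root=&lt;/code&gt; argument on the kernel command line expects.&lt;/p&gt;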
&lt;p&gt;The mainline kernel has support for the basic ARM platform the GK7101 SoC uses,
ARM1176: selecting &lt;code&gt;CONFIG_ARCH_MULTI_V6&lt;/code&gt; gets us working code, but of course
no I/O of any kind. The first code to produce output when U-Boot hands over control
is normally the decompressor. It prints &amp;ldquo;&lt;code&gt;Uncompressing Linux... done, booting the kernel.&lt;/code&gt;&amp;rdquo; to the console, then jumps into the kernel proper, at its new
location. The console, of course, is not so easy here: there is no BIOS that
will output this on a VGA port, like on x86. Instead, Linux has a very
low-level facility to have the decompressor use a memory-mapped UART as an
output-only console.&lt;/p&gt;
&lt;p&gt;Normally, driving a UART would entail having an interrupt fire when e.g. a byte
has arrived on the UART. At decompression time, however, the CPU has no
interrupts, timers, or even an MMU: U-Boot disables all that before handover.
Therefore the decompressor&amp;rsquo;s UART driver can&amp;rsquo;t just write characters to a
buffer, knowing that some timer will fire and empty the buffer into the UART
registers as it clears itself. Instead, the driver has to check if the UART&amp;rsquo;s
output register is clear, and only then write a single character.&lt;/p&gt;
&lt;p&gt;This facility can also be used by the kernel once it&amp;rsquo;s up and running, but by
that time it will have a working MMU &amp;ndash; and the I/O registers for the UART
may have moved to a different address. The kernel solves all of this by
allowing you to define 4 macros for your platform, and it uses these to
construct a &lt;code&gt;putc()&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;From &lt;code&gt;arch/arm/boot/compressed/debug.S&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ENTRY(putc)
    addruart r1, r2, r3
    waituart r3, r1
    senduart r0, r1
    busyuart r3, r1
    mov  pc, lr
ENDPROC(putc)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;addruart&lt;/code&gt; macro should load the base address for the UART registers &amp;ndash;
the physical address in r1, and the corresponding virtual address in r2 (r3 is
just a scratch register it can use).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;waituart&lt;/code&gt; should implement a busy loop waiting for the UART&amp;rsquo;s line status
register to show the output register is empty. &lt;code&gt;senduart&lt;/code&gt; puts its first
argument into the output register, and &lt;code&gt;busyuart&lt;/code&gt; again should implement a
busy loop waiting for output to clear &amp;ndash; a flush() function.&lt;/p&gt;
&lt;p&gt;Note how &lt;code&gt;addruart&lt;/code&gt; loads the physical address into r1, and the other macros
then use r1. This is specific to the decompressor, since it runs pre-MMU. The
kernel facility that uses these macros instead calls them with &lt;code&gt;addruart&lt;/code&gt;&#39;s
second argument.&lt;/p&gt;
&lt;p&gt;So what&amp;rsquo;s the UART&amp;rsquo;s physical base address, and what do the registers look
like? Only one way to find out: disassemble the vendor&amp;rsquo;s decompressor, and
see what addresses it uses.&lt;/p&gt;
&lt;p&gt;There are many disassemblers available, starting of course with GNU objdump.
This is as good an occasion as any to try out a new tool that came out:
&lt;a href=&#34;https://ghidra-sre.org/&#34;&gt;Ghidra&lt;/a&gt;. It&amp;rsquo;s a reverse engineering suite in the style
of the venerable &lt;a href=&#34;https://www.hex-rays.com/products/ida/index.shtml&#34;&gt;IDA Pro&lt;/a&gt;.
Unlike IDA, it&amp;rsquo;s open source &amp;ndash; and very unlike anything, it was written and is
maintained by the NSA. The code is &lt;a href=&#34;https://github.com/NationalSecurityAgency/ghidra&#34;&gt;right
here&lt;/a&gt; on GitHub. It&amp;rsquo;s a
little surreal to be using NSA code, but as it turns out the tool is very good.
It supports a ton of architectures, and has a great decompiler &amp;ndash; which, due to
Ghidra&amp;rsquo;s internal architecture, automatically works on all supported CPUs.&lt;/p&gt;
&lt;p&gt;The camera vendor&amp;rsquo;s version of Linux is a hacked-up 3.4.43. So compiling that
version from source and comparing the decompiled output with the source code
should go a long way to understanding the vendor&amp;rsquo;s code as well. The first
thing that gets sent to console is &amp;ldquo;&lt;code&gt;Uncompressing Linux...&lt;/code&gt;&amp;rdquo;. This is done in
&lt;code&gt;arch/arm/boot/compressed/misc.c&lt;/code&gt;, in function &lt;code&gt;decompress_kernel()&lt;/code&gt;. The
call is to &lt;code&gt;putstr()&lt;/code&gt;, a wrapper around the &lt;code&gt;putc()&lt;/code&gt; function we saw defined
earlier. Following that should get us the macros we&amp;rsquo;re after.&lt;/p&gt;
&lt;p&gt;Unfortunately the compiler inlined the putc function, so the loop has all the
low-level I/O smeared into it. Here&amp;rsquo;s the source code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;static void putstr(const char *ptr)
{
    char c;

    while ((c = *ptr++) != &#39;\0&#39;) {
        if (c == &#39;\n&#39;)
            putc(&#39;\r&#39;);
        putc(c);
    }

    flush();
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is Ghidra&amp;rsquo;s decompiled version of the vendor code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void putstr(char *ptr)
{
    byte bVar1;
    uint uVar2;
    int *piVar3;
	
    piVar3 = *(int **)(PTR_DAT_c10008c0 + DAT_c10008c4 + -0x3efff7e4);
    while( true ) {
        bVar1 = *ptr;
        if (bVar1 == 0) break;
        if (bVar1 == 10) {
            do {
                uVar2 = (**(code **)(*piVar3 + 0x10))(DAT_c10008c8);
            } while ((uVar2 &amp;amp; 0x40) == 0);
            (**(code **)(*piVar3 + 0x1c))(0xd,DAT_c10008cc);
        }
        do {
            uVar2 = (**(code **)(*piVar3 + 0x10))(DAT_c10008c8);
        } while ((uVar2 &amp;amp; 0x40) == 0);
        (**(code **)(*piVar3 + 0x1c))((uint)bVar1,DAT_c10008cc);
        ptr = (char *)((byte *)ptr + 1);
    }
    while (uVar2 = (**(code **)(*piVar3 + 0x10))(DAT_c10008c8), (uVar2 &amp;amp; 4) != 0) {
        (**(code **)(*piVar3 + 0x10))(DAT_c10008cc);
    }
    return;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;DAT_&lt;/code&gt; variables are references to memory addresses containing values,
which the instructions dereference. ARM instructions are a fixed 32 bits wide,
so a single instruction has no room for an arbitrary 32-bit address as an
immediate operand. So we&amp;rsquo;d expect to find these UART base addresses in variables like this.&lt;/p&gt;
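&lt;p&gt;To make that concrete, here is a minimal sketch in C of the classic ARM immediate rule: a data-processing instruction can only encode an 8-bit value rotated right by an even amount. The function name &lt;code&gt;arm_imm_encodable()&lt;/code&gt; is made up for this illustration, and the code assumes a 32-bit &lt;code&gt;unsigned int&lt;/code&gt;.&lt;/p&gt;

```c
/*
 * Illustration only: returns 1 if v could be encoded as a classic ARM
 * data-processing immediate, i.e. an 8-bit value rotated right by an
 * even amount (0, 2, ..., 30). Assumes unsigned int is 32 bits wide.
 */
static int arm_imm_encodable(unsigned int v)
{
    unsigned int i;

    /* Try all 16 even rotations of v. */
    for (i = 0; i != 16; i++) {
        if (0xffu >= v)
            return 1;   /* this rotation fits in 8 bits */
        /* Rotate left by 2: multiplying by 4 shifts left by two bits,
         * and the two bits that fall off the top wrap to the bottom. */
        v = (v * 4u) + (v >> 30);
    }
    return 0;
}
```

&lt;p&gt;Under that rule a value like 0x40 or 0xa0000000 fits inside an instruction, but a full address with bits set far apart, such as 0xa0005004, does not. That is why it has to be loaded from memory instead.&lt;/p&gt;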
&lt;p&gt;That last while loop is the flush() function, i.e. &lt;code&gt;busyuart&lt;/code&gt;. Cleaned up and
with the variables filled in, it looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;while (uVar2 = (somestruct + 0x10)(0xa0005014), (uVar2 &amp;amp; 4) != 0) {
    (somestruct + 0x10)(0xa0005004);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That looks like it reads from &lt;code&gt;0xa0005014&lt;/code&gt; and waits for bit 2 to go low,
reading from &lt;code&gt;0xa0005004&lt;/code&gt; for as long as it stays high. In other words, that bit indicates
there&amp;rsquo;s data in the input buffer (it has one!), and the loop drains it. So &lt;code&gt;0xa0005004&lt;/code&gt;
is the input buffer register.&lt;/p&gt;
&lt;p&gt;But why is this using a function? The struct is somewhere in memory, and
evidently the member at offset 0x10 is a pointer to a read function.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s only one line that sends &lt;code&gt;putstr()&lt;/code&gt;&#39;s string argument anywhere, so
that&amp;rsquo;s got to be the &lt;code&gt;senduart&lt;/code&gt; macro&amp;rsquo;s implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(**(code **)(*piVar3 + 0x1c))((uint)bVar1,DAT_c10008cc);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Cleaned up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(somestruct + 0x1c)(bVar1, 0xa0005004);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So the mystery struct has a pointer to a write function at 0x1c, and the UART
output register is evidently &lt;code&gt;0xa0005004&lt;/code&gt; as well.&lt;/p&gt;
&lt;p&gt;Right before that is a busy loop, clearly the &lt;code&gt;waituart&lt;/code&gt; macro:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    do {
        uVar2 = (**(code **)(*piVar3 + 0x10))(DAT_c10008c8);
    } while ((uVar2 &amp;amp; 0x40) == 0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This reads &lt;code&gt;0xa0005014&lt;/code&gt; in a loop, waiting for bit 6 to go high.&lt;/p&gt;
&lt;p&gt;There is no implementation of &lt;code&gt;addruart&lt;/code&gt;, as all addresses are hardcoded in
memory variables. But we have the addresses, and some bitfields. That should
be enough to make a kernel that boots and tells us it&amp;rsquo;s decompressing itself.&lt;/p&gt;
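&lt;p&gt;Putting the findings together, the logic could be sketched in C roughly as follows. This is a model, not the vendor&amp;rsquo;s code: &lt;code&gt;reg_read()&lt;/code&gt; and &lt;code&gt;reg_write()&lt;/code&gt; are stand-ins that simulate the hardware (on the board they would be memory-mapped accesses), and the register and bit names are invented. Only the two addresses and the two bit positions come from the disassembly.&lt;/p&gt;

```c
/* Values taken from the disassembly; the names are made up. */
#define UART_DATA     0xa0005004u  /* written to send, read to drain input */
#define UART_STATUS   0xa0005014u  /* status register */
#define TX_EMPTY_BIT  6            /* mask 0x40: output register empty */
#define RX_READY_BIT  2            /* mask 0x04: input buffer has data */

/* Simulated hardware state, standing in for the real registers. */
static unsigned int rx_pending;    /* bytes waiting in the input buffer */
static unsigned char sent[64];     /* bytes "transmitted" so far */
static unsigned int sent_len;

/* Stand-in for the vendor's read function (struct offset 0x10). */
static unsigned int reg_read(unsigned int addr)
{
    if (addr == UART_STATUS)       /* the model is always ready to send */
        return 0x40u + (rx_pending ? 0x04u : 0u);
    if (addr == UART_DATA) {
        if (rx_pending)
            rx_pending--;          /* reading consumes one input byte */
    }
    return 0;
}

/* Stand-in for the vendor's write function (struct offset 0x1c). */
static void reg_write(unsigned int value, unsigned int addr)
{
    if (addr == UART_DATA) {
        if (sent_len != sizeof(sent))
            sent[sent_len++] = (unsigned char)value;
    }
}

/* Returns 1 if the given bit is set in value. */
static int bit_set(unsigned int value, unsigned int bit)
{
    return (value >> bit) % 2u;
}

static void uart_putc(int c)
{
    while (!bit_set(reg_read(UART_STATUS), TX_EMPTY_BIT))
        ;                                   /* waituart */
    reg_write((unsigned int)c, UART_DATA);  /* senduart */
}

static void uart_flush(void)
{
    /* The vendor's final loop: drain input while bit 2 is set. */
    while (bit_set(reg_read(UART_STATUS), RX_READY_BIT))
        reg_read(UART_DATA);
}

static void uart_putstr(const char *ptr)
{
    char c;

    while ((c = *ptr++) != '\0') {
        if (c == '\n')
            uart_putc('\r');
        uart_putc(c);
    }
    uart_flush();
}
```

&lt;p&gt;On this model, sending a string with a newline in it records a carriage return before the newline, and any pending input is gone afterwards, which matches the shape of the decompiled loop.&lt;/p&gt;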
&lt;p&gt;Progress!&lt;/p&gt;
</description>
    </item>
    <item>
      <title>This Little Camera</title>
      <link>https://biot.com/blog/posts/this-little-camera/</link>
      <pubDate>Fri, 08 Nov 2019 14:18:00 +0100</pubDate>
      <guid>https://biot.com/blog/posts/this-little-camera/</guid>
      <description>&lt;p&gt;Some time ago I came across a strange product on Banggood. It was a small camera, with a base and separate camera head on top. The camera head supports pan and tilt, like a bobblehead.&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&#34;https://biot.com/blog/blog/bobblehead.jpg&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;Digoo DG-M1Z&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The specs aren&amp;rsquo;t half bad: supports 1080p, has wifi and ethernet on board, and has a microSD card slot. These are intended for home security. The idea is you set these down somewhere and it streams live video over wifi. Sounds great!&lt;/p&gt;
&lt;p&gt;The strange thing was the price: these cameras go for $15-$20 on Banggood and similar Chinese sites. Turns out that&amp;rsquo;s not the complete price however: you&amp;rsquo;re supposed to buy a monthly subscription for the &amp;ldquo;cloud service&amp;rdquo;. The camera uploads video to this cloud, and you can watch it there with your paid account.&lt;/p&gt;
&lt;p&gt;The cloud in question is some Chinese service, and of course the camera cannot work without it: you can&amp;rsquo;t just stream your own video off of your own camera. So it needs a working internet connection as well.&lt;/p&gt;
&lt;p&gt;So for $15 + a monthly fee you buy a spyware device that uploads your house&amp;rsquo;s video to China. You&amp;rsquo;d have to be insane to think that&amp;rsquo;s a good deal; I think it&amp;rsquo;s downright offensive.&lt;/p&gt;
&lt;p&gt;It occurred to me that these things would be awesome if you could only put your own software on them, so it wouldn&amp;rsquo;t connect to anywhere &amp;ndash; just store or stream video locally. That really would get you a $15 camera without the Chinese mass surveillance feature.&lt;/p&gt;
&lt;p&gt;I decided to look a little closer at the hardware, and see how doable this would be. As expected at this price, the system is mostly a single chip &amp;ndash; the GK7101 &amp;ndash; with only the barest essentials to support it. This very capable chip is an SoC made by &lt;a href=&#34;http://www.goke.com/en/&#34;&gt;Goke Micro&lt;/a&gt;, a Chinese company. The chip has an older &lt;a href=&#34;https://en.wikipedia.org/wiki/ARM11&#34;&gt;ARM 1176&lt;/a&gt; CPU core and a ton of peripherals:&lt;/p&gt;
&lt;!-- raw HTML omitted --&gt;
&lt;p&gt;I set about finding a serial port of sorts on the board. There&amp;rsquo;s usually one there for debugging, but sometimes it takes a little effort to find it. No problem this time though:&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&#34;https://biot.com/blog/blog/gk7101-txrx.jpg&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;X marks the spot&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;As usual for hardware of this caliber, it runs Linux. Also as usual, none of the changes made to the Linux kernel for this hardware have been upstreamed. There are a ton of kernel modules, all of which are marked as GPL. The kernel itself is an old 3.4.43 version, also with many changes.&lt;/p&gt;
&lt;p&gt;It seems to me I&amp;rsquo;d need to, in order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;get a mainline kernel running on the board&lt;/li&gt;
&lt;li&gt;port the modules one by one into new drivers, reverse engineering their functionality as I go&lt;/li&gt;
&lt;li&gt;get a basic root filesystem going and look at implementing a streaming API, and whatever else seems handy&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course the sharp-eyed reader will have noticed that points 1 and 2 (porting the kernel) are actually optional. After all, the userspace software is the bit that locks you in to this Chinese cloud stuff; if that software can use the video coming out of that custom kernel, I can write software to do the same.&lt;/p&gt;
&lt;p&gt;But you know what? Porting the kernel is, to me, by far the most interesting part of this. It&amp;rsquo;s a pretty big project, since the SoC has so much functionality on board.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also really hard, because there is no datasheet available for the chip. I&amp;rsquo;ve tried contacting various people at Goke, but the only person who replied was a sales guy &amp;ndash; who stopped replying the moment he realized I wasn&amp;rsquo;t going to buy 10k units.
Needless to say, requests for their GPL&amp;rsquo;ed code didn&amp;rsquo;t get anywhere.&lt;/p&gt;
&lt;p&gt;There is an SDK they give to customers, but that is also not publicly available.&lt;/p&gt;
&lt;p&gt;What a challenge this is. Let&amp;rsquo;s go!&lt;/p&gt;
</description>
    </item>
  </channel>
</rss>
