Chapter 4: Everybody Dies
I really enjoy Dr. House. So much so that I often name my booleans lupus when I want to assert something is never true. ASSERT_MSG( lupus == false, “it’s never lupus” );
Disclaimer: I do not work for Sony. Despite the disturbing percentage of my shirts, jackets, and bookbags that are PlayStation dev-related, I have never worked for Sony. I do, however, have many friends who work at Sony, some of whom I hope will call off the corporate lawyers. JayStation is in no way associated with Sony or PlayStation, and any stupid things I say represent only my own ineptitude and silliness.
It was the great disaster of 2015. I was looking forward to finally getting to some actual OS-related dev this week. I was so excited that I woke up at 6 and rode to work at 7AM just so I could get two hours of JayStation2 dev in before work started. I made a few changes, built, and popped in my MicroSD card writer so I could copy the image. Nothing. I pulled it out and reinserted it. Nothing. Tried a different USB port and even a different computer. It was hopeless. It turns out to be the case that parts wear out, pieces break, and everything dies eventually. Even the Raspberry Pi MicroSD card slot is only supposed to last 10,000 insert/remove cycles.
This changed everything. I was inserting and removing my MicroSD card more than 30 times a day. At that rate I’d be screwed in under a year. I needed a way to upload and run my OS kernels without the endless card inserts and removals. And that’s what this week’s entry is all about.
Starting out with a high level view, what we want is to boot a small loader that will spin waiting for the host to upload some code to run on the target. The target then copies the code to the proper memory address, and jumps to it to start running the OS. Easy.
Here are the actual steps, in practice:
Host side:
0) modify makefile to output the needed extra info in kernel7.list
1) as usual, build the OS kernel image you want to run
2) run some script on the image to turn it into an update file
3) connect to the Raspberry Pi via some serial terminal
4) send the update file and await victory
Target side:
0) build the loader, copy it to the SD card, and run it
1) the loader then copies itself out of the way where it can’t interfere with the OS
2) wait for update file bytes to start coming in over the UART
3) copy incoming sections to the proper memory addresses
4) jump to the start address and execute the code
For host side step 0, what we really want to output is the program headers, information about section sizes and load addresses, and where in the original img file the section data lives. Fortunately, all of this can be easily achieved with the following makefile target:
$(LIST) : $(BUILD)output.elf
	$(OBJDMP) --file-offsets --all-headers --reloc --file-headers -s -D $(BUILD)output.elf > $(LIST)
start address 0x00008004
Program Header:
LOAD off 0x00008000 vaddr 0x00000000 paddr 0x00000000 align 2**15
filesz 0x000082ec memsz 0x000082ec flags r-x
private flags = 5000002: [Version5 EABI] [has entry point]
Sections:
Idx Name Size VMA LMA File off Algn
0 .text.ivt 00000070 00000000 00000000 00008000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .init 00000004 00008000 00008000 00010000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .text 000002e8 00008004 00008004 00010004 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
ldr r0, =loader_code_old_begin
ldr r1, =loader_code_old_end
ldr r2, =loader_code_new_begin
copy_loader_loop:
ldr r3, [r0], #4
str r3, [r2], #4
cmp r0, r1
blt copy_loader_loop
// jump to the loader
ldr r0, =loader_code_new_begin
bx r0
////////////////////////////////////////////////////////////////
loader_code_old_begin:
// WARNING: nothing in here can use bl.
// remember, this code isn't executed here. We are copying it a
// gibibyte away where all your branch links will fail miserably.
mov32 r0, waiting_msg
mov32 r1, js2osDebugPrintString
blx r1
loader_code_old_end:
.equ JS2U, 0x5532534a
.equ PDAT, 0x54414450
.equ SECN, 0x4e434553
.equ UPDA, 0x41445055
.equ TEND, 0x444e4554
sec_count .req r4
load_addr .req r5
section_size .req r6
calculated_checksum .req r7
progress_amt .req r8
// step 1: look for file header JS2UPDAT
bl loader_read_word
mov32 r1, JS2U
cmp r0, r1
bne loader_error_magic1
bl loader_read_word
mov32 r1, PDAT
cmp r0, r1
bne loader_error_magic2
// step 2: read num sections
bl loader_read_word
cmp r0, #0
beq loader_error_no_sections
mov sec_count, r0
// step 3: read sections
loader_read_sections:
// SECN section header
bl loader_read_word
mov32 r1, SECN
cmp r0, r1
bne loader_error_no_secn_header
mov32 r0, loader_new_sec_msg
bl js2osDebugPrintStringEmbedded
// load address. todo: we can at least check 4 byte alignment?
bl loader_read_word
mov load_addr, r0
mov r1, r0
mov32 r0, loader_new_sec_addr_msg
bl js2osDebugPrintMsgAndValueEmbedded
// bytesize
bl loader_read_word
cmp r0, #0
beq loader_error_zero_size_section
mov section_size, r0
mov r1, r0
mov32 r0, loader_new_sec_size_msg
bl js2osDebugPrintMsgAndValueEmbedded
// data and calculate checksum
mov calculated_checksum, #0
mov progress_amt, #0
mov32 r0, just_newline
bl js2osDebugPrintStringEmbedded
loader_read_section_data:
bl loader_read_word
eor calculated_checksum, calculated_checksum, r0
str r0, [load_addr], #4
// print out num bytes every 256 bytes read
ands r0, progress_amt, #0xFF
mov r1, progress_amt
mov32 r0, loader_progress_msg
bleq js2osDebugPrintMsgAndValueEmbedded
add progress_amt, progress_amt, #4
sub section_size, section_size, #4
cmp section_size, #0
bgt loader_read_section_data
mov r1, progress_amt
mov32 r0, loader_progress_msg
bl js2osDebugPrintMsgAndValueEmbedded
// check checksum against expected
bl loader_read_word
cmp r0, calculated_checksum
movne r0, calculated_checksum
bne loader_error_bad_checksum
mov r1, r0
mov32 r0, loader_checksum_match_msg
bl js2osDebugPrintMsgAndValueEmbedded
// next section
sub sec_count, sec_count, #1
cmp sec_count, #0
bgt loader_read_sections
// step 4: read where to branch
bl loader_read_word
mov r4, r0
// step 5: read end of file magic
bl loader_read_word
mov32 r1, UPDA
cmp r0, r1
bne loader_error_magic3
bl loader_read_word
mov32 r1, TEND
cmp r0, r1
bne loader_error_magic3
// step 6: go go go!
// change the vector address back
mov r2, #0
mcr p15, 0, r2, c12, c0, 0
mov r1, r4
mov32 r0, make_you_jump_jump
bl js2osDebugPrintMsgAndValueEmbedded
bx r4
.unreq sec_count
.unreq load_addr
.unreq section_size
.unreq calculated_checksum
.unreq progress_amt
.globl js2osWatchdogTimerStart
js2osWatchdogTimerStart:
// clamp input
ldr r1, =0xFFFFF
and r0, r0, r1
// read old PM_RSTC value
ldr r1, =PM_BASE
ldr r2, [r1, #PM_RSTC]
// write the PM_WDOG reg. timer clock / 16; need password (31:24) + value (19:0)
orr r0, r0, #0x5a000000
str r0, [r1, #PM_WDOG]
// write the PM_RSTC reg. PM_PASSWORD | (pm_rstc & PM_RSTC_WRCFG_CLR) | PM_RSTC_WRCFG_FULL_RESET
and r2, r2, #0xffffffcf // and original value with PM_RSTC_WRCFG_CLR
orr r2, r2, #0x5a000000 // or with PM_PASSWORD
orr r2, r2, #0x00000020 // or with PM_RSTC_WRCFG_FULL_RESET
str r2, [r1, #PM_RSTC]
mov pc, lr
.globl js2osWatchdogTimerStop
js2osWatchdogTimerStop:
ldr r1, =PM_BASE
// just write ( PM_PASSWORD | PM_RSTC_RESET ) to PM_RSTC
mov r0, #0x5a000000 // PM_PASSWORD
orr r0, r0, #0x00000002 // PM_RSTC_RESET lower byte
orr r0, r0, #0x00000100 // PM_RSTC_RESET upper byte
str r0, [r1, #PM_RSTC]
mov pc, lr
.globl js2osWatchdogTimerGetRemaining
js2osWatchdogTimerGetRemaining:
ldr r1, =PM_BASE
ldr r0, [r1, #PM_WDOG]
ldr r1, =0xFFFFF
and r0, r0, r1
mov pc, lr
That makefile target produces some very useful-looking output, including the start address, program header, and section table shown above.
And we want to take that information and turn it into a file that looks like this:
file format is:
JS2UPDAT (8 bytes)
num sections (4 bytes)
SECN (4 bytes)
load addr (4 bytes)
bytesize (4 bytes)
section data (N bytes)
section checksum (4 bytes)
execution start address (4 bytes)
UPDATEND (8 bytes)
Worth noting is that if you have very sparse sections, this will pack them consecutively to make the update files smaller. Creating these files is fairly straightforward, but in case you want to save yourself the time, there is a Perl script (here) for converting img files into JUP files. Why Perl, a language that is almost universally disliked as far as I can tell? I am no fan of scripting languages in general, but in this case I wanted something that would just run as-is on all platforms without special-casing my makefile, so writing the converter in C wouldn’t really work. I guess I could have used LISP or Haskell and actually enjoyed the process, but all systems I know of come with Perl pre-installed, and it’s always nice to save someone the frustration of having to install another dependency.
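To make the layout concrete, here is a minimal sketch of a packer in Python. This is not the Perl script mentioned above, just an illustration of the file format. It assumes that every field is a little-endian 32-bit word, that the checksum is the XOR of the section's words (which is what the loader computes), and that section data is zero-padded to a whole number of words. The names pack_update and xor_checksum are made up for this example.

```python
import struct
from functools import reduce

MAGIC_HEAD = b"JS2UPDAT"   # file header, 8 bytes
MAGIC_SECN = b"SECN"       # per-section header, 4 bytes
MAGIC_TAIL = b"UPDATEND"   # end-of-file marker, 8 bytes


def xor_checksum(data: bytes) -> int:
    """XOR of all little-endian 32-bit words in data (length must be a multiple of 4)."""
    words = struct.unpack("<%dI" % (len(data) // 4), data)
    return reduce(lambda a, b: a ^ b, words, 0)


def pack_update(sections, entry: int) -> bytes:
    """Build an update file.

    sections: list of (load_addr, data) pairs, written back to back.
    entry:    address the loader should branch to after loading.
    """
    out = [MAGIC_HEAD, struct.pack("<I", len(sections))]
    for load_addr, data in sections:
        # Assumption: pad each section with zeros up to a whole word.
        data = data + b"\x00" * (-len(data) % 4)
        out += [
            MAGIC_SECN,
            struct.pack("<II", load_addr, len(data)),
            data,
            struct.pack("<I", xor_checksum(data)),
        ]
    out += [struct.pack("<I", entry), MAGIC_TAIL]
    return b"".join(out)
```

Something like pack_update([(0x00000000, ivt_bytes), (0x00008000, text_bytes)], 0x00008004) would then produce the bytes to send, where ivt_bytes and text_bytes are whatever you sliced out of the img file using the File off column.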
So now you can build your update files and (hopefully) send them to your Raspberry Pi. Let’s take a look at the target side, starting with step 1. We need a way to have the loader copy itself far, far away where it will never interfere with the OS. The general idea is the copy loop listed earlier, around the copy_loader_loop label.
Because we put the loader code between the labels loader_code_old_begin and loader_code_old_end, we know exactly what we need to copy. The first part just loads the source and destination addresses, then loops copying the loader code to its new home. We then jump to it using bx. Why bx? Because we are going to jump about 950MB away, and the ARM branch instructions that take an immediate destination have only a limited number of bits in a 32-bit instruction word for the address. In ARM mode, b and bl have a range of +/- 32MB, and they are PC-relative, which makes for position-independent code, a property that is not always what we want in the loader.
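As a quick sanity check on that range, here is a small Python sketch of the arithmetic. It assumes ARM-mode (A32) b/bl encoding: a signed 24-bit immediate counted in words, relative to the address of the branch plus 8. The function name and the two example addresses are hypothetical, picked only to resemble the distances in this post.

```python
def arm_branch_reaches(branch_addr: int, target_addr: int) -> bool:
    """True if an A32 b/bl located at branch_addr can encode a jump to target_addr."""
    # In ARM mode the PC reads as the current instruction address + 8.
    offset = target_addr - (branch_addr + 8)
    if offset % 4:
        return False  # A32 branch targets must be word aligned
    # Signed 24-bit immediate, shifted left by 2: roughly +/- 32MB.
    return -(1 << 25) <= offset <= (1 << 25) - 4


LOADER_OLD = 0x00008000                        # hypothetical original loader address
LOADER_NEW = LOADER_OLD + 950 * 1024 * 1024    # roughly 950MB away

print(arm_branch_reaches(LOADER_OLD, LOADER_OLD + 0x1000))  # True
print(arm_branch_reaches(LOADER_OLD, LOADER_NEW))           # False
```

The second call is the situation the loader is in, which is why the jump goes through a register with bx instead.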
Within the loader itself, all loads, jumps, and branches to faraway places must not use any PC-relative addressing. Imagine the case of bl func_name. Because the assembler only knows where the loader code is at assembly time and not where we plan to copy it to, this assembles without errors. However, using bl to reach functions that were not copied with the loader means we’ll be jumping to garbage memory at some offset from the new loader location and trying to execute it. Also worth looking out for are constant loads of the form ldr r0, =my_label. This tends to create a literal pool near the ldr site, which is fine; however, if you don’t copy that literal pool with your loader code, you’ll be loading the constant from garbage memory.
Possibly worse, any code left at the original location may be overwritten by the kernel you are loading into memory. Depending on the size of the kernel and the addresses used, this may work fine sometimes but then cause craziness when the loaded image changes size by a few bytes. Really, the best thing to do is copy all functions the loader needs into the loader itself.
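The short block between loader_code_old_begin and loader_code_old_end above is an example, just to solidify. Notice that it avoids PC-relative addressing: addresses are loaded with the mov32 macro, and the call goes through a register with blx.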
Now that we know how the loader can copy itself, and what kinds of things are allowed in there, we can talk about the code to do the loading. I almost hesitate to paste it in. It’s just a regular state machine and not very interesting, but here it is for convenience (the listing above that starts with the JS2U, PDAT, SECN, UPDA, and TEND constants).
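If assembly isn't your thing, the same state machine can be mirrored on the host to sanity-check an update file before sending it. The Python below is a rough sketch of that idea, not part of the loader. It assumes loader_read_word returns little-endian 32-bit words, and the names parse_update and UpdateError are invented for this example.

```python
import struct


class UpdateError(Exception):
    """Raised when the update file does not match the expected layout."""


def parse_update(blob: bytes):
    """Return (sections, entry), where sections is a list of (load_addr, data)."""
    pos = 0

    def read_word() -> int:
        nonlocal pos
        if pos + 4 > len(blob):
            raise UpdateError("unexpected end of file")
        (word,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        return word

    def expect(magic: bytes) -> None:
        # Compare one word at a time, the way the loader does.
        for i in range(0, len(magic), 4):
            if read_word() != struct.unpack_from("<I", magic, i)[0]:
                raise UpdateError("bad magic, expected %r" % magic)

    expect(b"JS2UPDAT")                      # step 1: file header
    count = read_word()                      # step 2: number of sections
    if count == 0:
        raise UpdateError("no sections")

    sections = []
    for _ in range(count):                   # step 3: read sections
        expect(b"SECN")
        load_addr = read_word()
        size = read_word()
        if size == 0:
            raise UpdateError("zero size section")
        checksum = 0
        words = []
        for _ in range(0, size, 4):
            word = read_word()
            checksum ^= word                 # running XOR, like the eor in the loader
            words.append(word)
        if read_word() != checksum:
            raise UpdateError("bad checksum")
        sections.append((load_addr, struct.pack("<%dI" % len(words), *words)))

    entry = read_word()                      # step 4: where to branch
    expect(b"UPDATEND")                      # step 5: end of file magic
    return sections, entry
```

The real loader stores each word at load_addr as it arrives instead of collecting a list, but the order of the checks is the same.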
And that’s how you never have to remove/insert your MicroSD card again. There is one last loose end to tie up, and it’s related to a very useful bit of undocumented silicon called the Watchdog Timer. What if your OS hangs? To reboot your Pi, the three main options are to pull out the USB cable from your computer and reinsert it, pull out the micro USB end from your Pi and reinsert it, or wire up a reset button. Orrr... you could just use the watchdog timer. It’s basically a timer that counts down towards zero and, if it ever reaches zero, reboots the Pi. It’s kinda like that reddit counter reset button, but useful. As long as your program keeps restarting the timer, nothing happens; if your program should hang, the counter continues to zero and the Pi is rebooted. There is zero documentation on it, but the js2osWatchdogTimer routines listed above show how to use it.
I got a lot of this from the Linux kernel. Seems the guy who wrote the driver works for Broadcom, so of course he has access to the docs that aren’t released to the rest of us. As far as I can tell, it’s a password register with an 8-bit password. The password goes in the upper byte of both PM_RSTC and PM_WDOG. The WR config seems to go in bits 4 and 5 of PM_RSTC, which is why we clear them with 0xFFFFFFCF (a mask of 0x30) and then set 0x20. I’m not sure what the specific bit layout of PM_RSTC is, but it seems like you can reset the timer by writing 0b100000010, and writing 0b100000 specifies you want a full reset on timer expiration. Finally, converting from ticks to seconds is just a matter of dividing by 2^16.
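Here is the same register arithmetic as a small Python sketch, in case the bit twiddling is easier to follow that way. The constants are the ones used in the assembly. The tick rate of 2^16 per second is my reading of the Linux driver, so treat it as an assumption, and the function names are made up for this example.

```python
PM_PASSWORD = 0x5A000000               # password in the upper byte
PM_WDOG_TIME_MASK = 0x000FFFFF         # the countdown value is 20 bits wide
PM_RSTC_WRCFG_CLR = 0xFFFFFFCF         # clears bits 4 and 5
PM_RSTC_WRCFG_FULL_RESET = 0x00000020  # full reset on expiration
PM_RSTC_RESET = 0x00000102             # value written when stopping the timer
TICKS_PER_SECOND = 1 << 16             # assumption: 65,536 ticks per second


def seconds_to_ticks(seconds: float) -> int:
    """Convert seconds to watchdog ticks, clamped to the 20-bit field."""
    return min(int(seconds * TICKS_PER_SECOND), PM_WDOG_TIME_MASK)


def wdog_start_values(ticks: int, old_rstc: int):
    """Return the (PM_WDOG, PM_RSTC) words that js2osWatchdogTimerStart writes."""
    wdog = PM_PASSWORD | (ticks & PM_WDOG_TIME_MASK)
    rstc = PM_PASSWORD | (old_rstc & PM_RSTC_WRCFG_CLR) | PM_RSTC_WRCFG_FULL_RESET
    return wdog, rstc


def wdog_stop_value() -> int:
    """Return the PM_RSTC word that js2osWatchdogTimerStop writes."""
    return PM_PASSWORD | PM_RSTC_RESET


wdog, rstc = wdog_start_values(seconds_to_ticks(1), 0x00000102)
print(hex(wdog), hex(rstc))  # 0x5a010000 0x5a000122
```

One consequence of the 20-bit field is that the longest timeout you can ask for is just under 16 seconds.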
Edit 2015/5/4: One thing I forgot to mention. My loader has its own ISV table which I use while copying the user ISV table to address zero. Once everything is copied, I then tell the ARM code to use the table at address zero. I’ll talk about this when I talk about interrupts.