
JayStation2 Dev Blog


Chapter 4: Everybody Dies

I really enjoy Dr. House. So much so that I often name my booleans lupus when I want to assert something is never true. ASSERT_MSG( lupus == false, “it’s never lupus” );


Disclaimer: I do not work for Sony. Despite the disturbing percentage of my shirts, jackets, and bookbags that are PlayStation dev-related, I have never worked for Sony. I do, however, have many friends who work at Sony, some of whom I hope will call off the corporate lawyers. JayStation is in no way associated with Sony or PlayStation, and any stupid things I say represent only my own ineptitude and silliness.


It was the great disaster of 2015. I was looking forward to finally getting to some actual OS-related dev this week. I was so excited that I woke up at 6 and rode to work at 7AM just so I could get two hours of JayStation2 dev in before work started. I made a few changes, built, and popped in my MicroSD card writer so I could copy the image. Nothing. I pulled it out and reinserted it. Nothing. Tried a different USB port and even a different computer. It was hopeless. It turns out that parts wear out, pieces break, and everything dies eventually. Even the Raspberry Pi MicroSD card slot is only supposed to last 10,000 insert/remove cycles.


This changed everything. I was inserting and removing my MicroSD card more than 30 times a day. At that rate I’d be screwed in under a year. I needed a way to upload and run my OS kernels without the endless card inserts and removals. And that’s what this week’s entry is all about.


Starting out with a high level view, what we want is to boot a small loader that will spin waiting for the host to upload some code to run on the target. The target then copies the code to the proper memory address, and jumps to it to start running the OS. Easy.


Here are the actual steps, in practice:


Host side:

0) modify makefile to output the needed extra info in kernel7.list

1) as usual, build the OS kernel image you want to run

2) run some script on the image to turn it into an update file

3) connect to the Raspberry Pi via some serial terminal

4) send the update file and await victory


Target side:

0) build the loader, copy it to the SD card, and run it

1) the loader then copies itself out of the way where it can’t interfere with the OS

2) wait for update file bytes to start coming in over the UART

3) copy incoming sections to the proper memory addresses

4) jump to the start address and execute the code


For host side step 0, what we really want to output is the program headers, information about section sizes and load addresses, and where in the original img file the section data lives. Fortunately, all of this can be achieved with the following makefile target:


$(LIST) : $(BUILD)output.elf
	$(OBJDMP) --file-offsets --all-headers --reloc --file-headers -s -D $(BUILD)output.elf > $(LIST)


start address 0x00008004


Program Header:

    LOAD off    0x00008000 vaddr 0x00000000 paddr 0x00000000 align 2**15

         filesz 0x000082ec memsz 0x000082ec flags r-x

private flags = 5000002: [Version5 EABI] [has entry point]


Sections:

Idx Name          Size      VMA       LMA       File off  Algn

  0 .text.ivt     00000070  00000000  00000000  00008000  2**2

                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  1 .init         00000004  00008000  00008000  00010000  2**2

                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  2 .text         000002e8  00008004  00008004  00010004  2**2

                  CONTENTS, ALLOC, LOAD, READONLY, CODE
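
As a rough illustration of how a converter might pull the section table out of a listing like the one above, here is a minimal Python sketch. It is not the converter from this post (that one is a PERL script); the names parse_sections, SECTION_ROW, and SAMPLE are made up for the example, and it assumes the rows keep the column layout shown above.

```python
import re

# Matches one row of the objdump "Sections:" table, for example:
#   1 .init         00000004  00008000  00008000  00010000  2**2
# Columns: index, name, size, VMA, LMA, file offset, alignment.
SECTION_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+"
    r"([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+2\*\*(\d+)\s*$"
)


def parse_sections(listing):
    """Return one dict per section row found in the listing text."""
    sections = []
    for line in listing.splitlines():
        match = SECTION_ROW.match(line)
        if match is None:
            continue  # flag lines, headers, disassembly, and so on
        sections.append({
            "name": match.group(2),
            "size": int(match.group(3), 16),
            "vma": int(match.group(4), 16),
            "lma": int(match.group(5), 16),
            "file_off": int(match.group(6), 16),
        })
    return sections


# Hypothetical sample input, copied from the listing shown in this post.
SAMPLE = """\
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text.ivt     00000070  00000000  00000000  00008000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .init         00000004  00008000  00008000  00010000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .text         000002e8  00008004  00008004  00010004  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
"""

if __name__ == "__main__":
    for sec in parse_sections(SAMPLE):
        print(f"{sec['name']:<10} load=0x{sec['lma']:08x} size=0x{sec['size']:x}")
```

The regular expression is deliberately strict (four 8-digit hex columns followed by an alignment like 2**2) so that hex dump and disassembly lines elsewhere in the listing are skipped.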


ldr r0, =loader_code_old_begin

ldr r1, =loader_code_old_end

ldr r2, =loader_code_new_begin

copy_loader_loop:

ldr r3, [r0], #4

str r3, [r2], #4

cmp r0, r1

blt copy_loader_loop


// jump to the loader

ldr r0, =loader_code_new_begin

bx r0

////////////////////////////////////////////////////////////////

loader_code_old_begin:

// WARNING: nothing in here can use bl.

// remember, this code isn't executed here. We are copying it a

// gibibyte away where all your branch links will fail miserably.

mov32 r0, waiting_msg

mov32 r1, js2osDebugPrintString

blx r1

loader_code_old_end:


   .equ JS2U, 0x5532534a

     .equ PDAT, 0x54414450

     .equ SECN, 0x4e434553

     .equ UPDA, 0x41445055

     .equ TEND, 0x444e4554


     sec_count .req r4

     load_addr .req r5

     section_size .req r6

     calculated_checksum .req r7

     progress_amt .req r8


     // step 1: look for file header JS2UPDAT

     bl loader_read_word

     mov32 r1, JS2U

     cmp r0, r1

     bne loader_error_magic1

     bl loader_read_word

     mov32 r1, PDAT

     cmp r0, r1

     bne loader_error_magic2

     

     // step 2: read num sections

     bl loader_read_word

     cmp r0, #0

     beq loader_error_no_sections

     mov sec_count, r0


     // step 3: read sections

     loader_read_sections:

          // SECN section header

          bl loader_read_word

          mov32 r1, SECN

          cmp r0, r1

          bne loader_error_no_secn_header


          mov32 r0, loader_new_sec_msg

          bl js2osDebugPrintStringEmbedded


          // load address. todo: we can at least check 4 byte alignment?

          bl loader_read_word

          mov load_addr, r0


          mov r1, r0

          mov32 r0, loader_new_sec_addr_msg

          bl js2osDebugPrintMsgAndValueEmbedded


          // bytesize

          bl loader_read_word

          cmp r0, #0

          beq loader_error_zero_size_section

          mov section_size, r0


          mov r1, r0

          mov32 r0, loader_new_sec_size_msg

          bl js2osDebugPrintMsgAndValueEmbedded


          // data and calculate checksum

          mov calculated_checksum, #0

          mov progress_amt, #0


          mov32 r0, just_newline

          bl js2osDebugPrintStringEmbedded


          loader_read_section_data:

               bl loader_read_word


               eor calculated_checksum, calculated_checksum, r0

               str r0, [load_addr], #4


               // print out num bytes every 256 bytes read

               ands r0, progress_amt, #0xFF

               mov r1, progress_amt

               mov32 r0, loader_progress_msg

               bleq js2osDebugPrintMsgAndValueEmbedded

               add progress_amt, progress_amt, #4


               sub section_size, section_size, #4

               cmp section_size, #0

               bgt loader_read_section_data


          mov r1, progress_amt

          mov32 r0, loader_progress_msg

          bl js2osDebugPrintMsgAndValueEmbedded


          // check checksum against expected

          bl loader_read_word

          cmp r0, calculated_checksum

          movne r0, calculated_checksum

          bne loader_error_bad_checksum


          mov r1, r0

          mov32 r0, loader_checksum_match_msg

          bl js2osDebugPrintMsgAndValueEmbedded


          // next section

          sub sec_count, sec_count, #1

          cmp sec_count, #0

          bgt loader_read_sections


     // step 4: read where to branch

     bl loader_read_word

     mov r4, r0


     // step 5: read end of file magic

     bl loader_read_word

     mov32 r1, UPDA

     cmp r0, r1

     bne loader_error_magic3

     bl loader_read_word

     mov32 r1, TEND

     cmp r0, r1

     bne loader_error_magic3


     // step 6: go go go!

     // change the vector address back

     mov r2, #0

     mcr p15, 0, r2, c12, c0, 0


     mov r1, r4

     mov32 r0, make_you_jump_jump

     bl js2osDebugPrintMsgAndValueEmbedded

     bx r4


     .unreq sec_count

     .unreq load_addr

     .unreq section_size

     .unreq calculated_checksum

     .unreq progress_amt


.globl js2osWatchdogTimerStart

js2osWatchdogTimerStart:

     // clamp input

     ldr r1, =0xFFFFF;

     and r0, r0, r1


     // read the current PM_RSTC value

     ldr r1, =PM_BASE

     ldr r2, [r1, #PM_RSTC]


     // write the PM_WDOG reg. timer clock / 16; need password (31:24) + value (19:0)

     orr r0, r0, #0x5a000000

     str r0, [r1, #PM_WDOG]


     // write the PM_RSTC reg. PM_PASSWORD | (pm_rstc & PM_RSTC_WRCFG_CLR) | PM_RSTC_WRCFG_FULL_RESET

     and r2, r2, #0xffffffcf // and original value with PM_RSTC_WRCFG_CLR

     orr r2, r2, #0x5a000000 // or with PM_PASSWORD

     orr r2, r2, #0x00000020 // or with PM_RSTC_WRCFG_FULL_RESET

     str r2, [r1, #PM_RSTC]


     mov pc, lr


.globl js2osWatchdogTimerStop

js2osWatchdogTimerStop:

     ldr r1, =PM_BASE

     // just write ( PM_PASSWORD | PM_RSTC_RESET ) to PM_RSTC

     mov r0, #0x5a000000          // PM_PASSWORD     

     orr r0, r0, #0x00000002     // PM_RSTC_RESET lower byte

     orr r0, r0, #0x00000100     // PM_RSTC_RESET upper byte

     str r0, [r1, #PM_RSTC]

     mov pc, lr


.globl js2osWatchdogTimerGetRemaining

js2osWatchdogTimerGetRemaining:

     ldr r1, =PM_BASE

     ldr r0, [r1, #PM_WDOG]

     ldr r1, =0xFFFFF;

     and r0, r0, r1

     mov pc, lr

The makefile target produces some very useful-looking output, including the start address, the program header, and the section table.

We want to take that information and turn it into a file that looks like this:


    file format is:

        JS2UPDAT (8 bytes)

        num sections (4 bytes)

            SECN (4 bytes)

            load addr (4 bytes)

            bytesize (4 bytes)

            section data (N bytes)

            section checksum (4 bytes)

        execution start address (4 bytes)

        UPDATEND (8 bytes)
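
To make the layout concrete, here is a minimal Python sketch that packs sections into that format. It is not the PERL converter mentioned in this post. It assumes every 4-byte field is a little-endian word (which is what makes the loader constants such as 0x5532534a spell "JS2U" on the wire), that the checksum is the XOR of the section's data words as in the loader code, and that section data is zero-padded to a multiple of 4 bytes. The names build_update and xor_checksum are made up for the example.

```python
import struct


def xor_checksum(data):
    """XOR together every little-endian 32-bit word in data."""
    checksum = 0
    for (word,) in struct.iter_unpack("<I", data):
        checksum ^= word
    return checksum


def build_update(sections, start_addr):
    """Pack (load_addr, payload_bytes) pairs into an update file image."""
    out = bytearray(b"JS2UPDAT")             # file header magic (8 bytes)
    out += struct.pack("<I", len(sections))  # num sections
    for load_addr, payload in sections:
        # Assumption: pad to whole words, since the loader reads word by word.
        padded = payload + b"\x00" * (-len(payload) % 4)
        out += b"SECN"                                     # section magic
        out += struct.pack("<II", load_addr, len(padded))  # load addr, bytesize
        out += padded                                      # section data
        out += struct.pack("<I", xor_checksum(padded))     # section checksum
    out += struct.pack("<I", start_addr)     # execution start address
    out += b"UPDATEND"                       # end of file magic (8 bytes)
    return bytes(out)


if __name__ == "__main__":
    image = build_update([(0x8000, b"\x01\x02\x03\x04\x05")], 0x8004)
    print(len(image), image.hex())
```

Sparse sections cost nothing here because only the payload bytes are written; the gaps between load addresses never appear in the file.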


Worth noting is that if you have very sparse sections, this will pack them consecutively to make the update files smaller. Creating these files is fairly straightforward, but in case you want to save yourself the time, there is a PERL script (here) for converting img files into JUP files. Why PERL, a language that is almost universally disliked as far as I can tell? I am no fan of scripting languages in general, but in this case I wanted something that would just run as-is on all platforms without special casing my makefile, so writing the converter in C wouldn’t really work. I guess I could have used LISP or Haskell and actually enjoyed the process, but all systems I know of come with PERL pre-installed, and it’s always nice to save someone the frustration of having to install another dependency.


So now you can build your update files and (hopefully) send files to your Raspberry Pi. Let’s take a look at the target side, starting with step 1. We need a way to have the loader copy itself far, far away where it will never interfere with the OS. The general idea is to do something like this:

Because we put the loader code between the labels loader_code_old_begin and loader_code_old_end, we know exactly what we need to copy. That first part just loads the source and destination addresses, then loops, copying the loader code to its new home. We then jump to it using bx. Why bx? Because we are going to jump 950MB away, and the ARM branch instructions that take an immediate destination have only a limited number of bits in a 32-bit instruction word for the offset. The unconditional ones reach +/- 32MB in ARM state (+/- 16MB in Thumb-2). They are also PC-relative, which makes for position-independent code, a property that is not always what we want in the loader.
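
To put numbers on the branch range, here is a small Python sketch that checks whether an immediate branch could reach a target. It assumes ARM (A32) state, where B/BL encode a signed 24-bit word offset relative to the instruction address plus 8. The function name and the example addresses are made up; the 950MB distance is the one quoted above.

```python
def arm_branch_reachable(branch_addr, target_addr):
    """True if an A32 B/BL located at branch_addr can encode target_addr.

    The encoded offset is relative to the PC, which reads as the
    instruction address plus 8, and must be a multiple of 4 within
    the signed 24-bit word range (-2**25 .. 2**25 - 4 bytes).
    """
    offset = target_addr - (branch_addr + 8)
    return offset % 4 == 0 and -(1 << 25) <= offset <= (1 << 25) - 4


if __name__ == "__main__":
    loader_old = 0x8000                          # hypothetical original address
    loader_new = loader_old + 950 * 1024 * 1024  # "950MB away"
    print(arm_branch_reachable(loader_old, loader_old + 0x100))
    print(arm_branch_reachable(loader_old, loader_new))
```

A register branch such as bx has no such limit, because the full 32-bit destination lives in the register rather than in the instruction word.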


Within the loader itself, all loads, jumps, and branches to far away places must not use any PC-relative addressing. Imagine the case of bl func_name. Because the assembler only knows where the loader code is at assembly time and not where we plan to copy it to, this assembles without errors. However, using bl to jump to functions that were not copied along with the loader means we’ll be jumping to garbage memory at some offset from the new loader location, and trying to execute it. Also worth looking out for are constant loads of the form ldr r0, =my_label. This tends to create a literal pool near the ldr site, which is fine. However, if you don’t copy that literal pool with your loader code, you’ll be loading the constant from garbage memory.


Possibly worse, any code left at the original location may be overwritten by the kernel you are loading into memory. Depending on the size of the kernel and the addresses used, this may work fine sometimes but then cause craziness when the loaded image changes size by a few bytes. Really, the best thing to do is copy all functions the loader needs into the loader itself.


Here is an example just to solidify. Notice that nothing is PC-relative: addresses are loaded with the mov32 macro and the call goes through blx on a register.

Now that we know how the loader can copy itself, and we know what kinds of things are allowed in there, we can talk about the code to do the loading. I almost hesitate to paste it in. It’s just a regular state machine and not very interesting, but here it is for convenience.
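
For checking update files on the host before sending them, the same state machine can be mirrored in a few lines of Python. This is only a sketch: it assumes little-endian words, follows the same steps as the assembly loader (magic, section count, per-section header, data with XOR checksum, start address, end magic), and the names parse_update and UpdateError are made up for the example.

```python
import struct


class UpdateError(Exception):
    """Raised when the update file does not match the expected layout."""


def parse_update(blob):
    """Walk an update file image; return (sections, start_addr)."""
    pos = 0

    def read_word():
        nonlocal pos
        if pos + 4 > len(blob):
            raise UpdateError("unexpected end of file")
        (word,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        return word

    def expect(magic):
        if read_word() != struct.unpack("<I", magic)[0]:
            raise UpdateError(f"missing magic {magic!r}")

    expect(b"JS2U")                       # step 1: file header
    expect(b"PDAT")
    count = read_word()                   # step 2: number of sections
    if count == 0:
        raise UpdateError("no sections")

    sections = []
    for _ in range(count):                # step 3: sections
        expect(b"SECN")
        load_addr = read_word()
        remaining = read_word()
        if remaining == 0:
            raise UpdateError("zero size section")
        checksum = 0
        words = []
        while remaining > 0:              # data, one word at a time
            word = read_word()
            checksum ^= word
            words.append(word)
            remaining -= 4
        if read_word() != checksum:
            raise UpdateError("bad checksum")
        sections.append((load_addr, struct.pack(f"<{len(words)}I", *words)))

    start_addr = read_word()              # step 4: where to branch
    expect(b"UPDA")                       # step 5: end of file magic
    expect(b"TEND")
    return sections, start_addr
```

Unlike the target loader, this version collects the section bytes in a list instead of storing them at the load address, so it is safe to run anywhere.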

And that’s how you never have to remove/insert your MicroSD card again. There is one last loose end to tie up, and it’s related to a very useful bit of undocumented silicon called the Watchdog Timer. What if your OS hangs? To reboot your Pi, the three main options are to pull the USB cable out of your computer and reinsert it, pull the micro USB end out of your Pi and reinsert it, or wire up a reset button. Orrr... you could just use the watchdog timer. It’s basically a timer that counts down towards zero and, if it ever reaches zero, reboots the Pi. It’s kinda like that reddit counter reset button, but useful. If your program should hang, the counter continues to zero and the Pi is rebooted. There is zero documentation on it, but here is how to use it.

I got a lot of this from the Linux kernel. It seems the guy who wrote the driver works for Broadcom, so of course he has access to the docs that aren’t released to the rest of us. As far as I can tell, it’s a password-protected register with an 8-bit password (0x5a). The password goes in the upper byte of both PM_RSTC and PM_WDOG. The WR config seems to live in bits 4 and 5 of PM_RSTC, which is why we clear them with 0xFFFFFFCF (the field mask is 0x30) and then set 0x20. I’m not sure what the specific bit layout of PM_RSTC is, but it seems like you can reset the timer by writing 0b100000010, and writing 0b100000 specifies you want a full reset on timer expiration. Finally, converting from ticks to seconds is just a matter of dividing by 2^16 (65536).
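
The register arithmetic is easy to sanity check on the host. The Python sketch below computes the values the assembly routines would write. The constants are the ones used in the code in this post; TICKS_PER_SECOND, PM_WDOG_TIME_MASK, and the function names are made up for the example. One deliberate difference: it saturates an over-long timeout at the 20-bit maximum, whereas the assembly simply masks the input with 0xFFFFF.

```python
PM_PASSWORD = 0x5A000000               # goes in the upper byte of both registers
PM_WDOG_TIME_MASK = 0x000FFFFF         # 20-bit countdown value
PM_RSTC_WRCFG_CLR = 0xFFFFFFCF         # clears bits 4 and 5
PM_RSTC_WRCFG_FULL_RESET = 0x00000020  # full reset when the timer expires
TICKS_PER_SECOND = 1 << 16             # 2^16 ticks per second


def seconds_to_ticks(seconds):
    """Convert seconds to watchdog ticks, saturating at the 20-bit maximum."""
    return min(int(seconds * TICKS_PER_SECOND), PM_WDOG_TIME_MASK)


def ticks_to_seconds(ticks):
    """Convert a PM_WDOG readback to seconds remaining."""
    return (ticks & PM_WDOG_TIME_MASK) / TICKS_PER_SECOND


def wdog_value(ticks):
    """Value to store in PM_WDOG: password plus the tick count."""
    return PM_PASSWORD | (ticks & PM_WDOG_TIME_MASK)


def rstc_value(old_rstc):
    """Value to store in PM_RSTC, mirroring the and/orr/orr sequence."""
    return ((old_rstc & PM_RSTC_WRCFG_CLR) | PM_PASSWORD
            | PM_RSTC_WRCFG_FULL_RESET) & 0xFFFFFFFF


if __name__ == "__main__":
    ticks = seconds_to_ticks(10)
    print(hex(wdog_value(ticks)), hex(rstc_value(0x102)))
    print(ticks_to_seconds(PM_WDOG_TIME_MASK))
```

With a 20-bit counter at 2^16 ticks per second, the longest timeout works out to just under 16 seconds.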


Edit 2015/5/4: One thing I forgot to mention. My loader has its own ISV table, which I use while copying the user ISV table to address zero. Once everything is copied, I then tell the ARM core to use the table at address zero. I’ll talk about this when I talk about interrupts.