Chapter 5: And The Cow Says MMU

This is a cow. He doesn’t actually say MMU, but he is passionate about memory protection. If you’re looking for a common theme between blog entries, both this cow and Morrissey (from chapter 2) are vegetarian.

Disclaimer: I do not work for Sony. Despite the disturbing percentage of my shirts, jackets, and bookbags that are PlayStation dev-related, I have never worked for Sony. I do, however, have many friends who work at Sony, some of whom I hope will call off the corporate lawyers. JayStation is in no way associated with Sony or PlayStation, and any stupid things I say represent only my own ineptitude and silliness.

Don’t you hate it when you accidentally write through a NULL pointer, overwrite your OS’s interrupt vector table, and bring the whole system crashing down in a fiery cataclysm? Yeah, well, if that doesn’t happen much anymore, you can thank memory protection. I know how kids these days with their buzzfeeds and clickbait can only process information in short list form, so to avoid long walls of text, I’ll assume you already know a bit about how memory protection works and will keep these blog entries short.

There are two things we’ll want to use the MMU for. First, we want to map virtual addresses to physical ones. We’ll start out with a 1:1 mapping, without all that voodoo that gives you more virtual address space than physical memory (great for making streaming systems). We definitely want user apps and the OS not trampling on each other’s memory. And since peripherals are memory mapped, we could maybe even use this to limit who can see which peripherals.

Second is memory protection. We’ll eventually have OS threads, and those threads will have stacks. We could imagine putting a fault-inducing page between stacks so that if we use even a single byte more stack than we have, we get a data abort.

We start with a table in memory, 16 kibibyte aligned, with 4096 32-bit entries. We have 4096 entries because each entry holds the address translation and memory protection information for a single 1 mebibyte page (ARM calls a page of this size a section). With 4 gibibytes to represent, 4,294,967,296 / 1,048,576 = 4096, and 4096 entries of 4 bytes each make the table exactly 16 kibibytes. Each entry looks like this:

// Bits 31:20 - Section base address

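// Bits 11:10 - Access permissions (AP), 00=no access, 01=privileged only, 10=user read-only, 11=full access

// (fault/client/manager are the per-domain modes set in the Domain Access Control Register)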

// Bits 8:5 - Domain num, [0..15]

// Bit 4 - should be 1 for backward compatibility

// Bits 3:2 - C bit (cacheable) and B bit (bufferable)

// Bits 1:0 - Always 0b10 for a section page table entry / descriptor

// note: C bit only affects whether or not the cache is written to.

// Cache is always searched on reads.

// always valid example 0x???00c12 = 0b????????????00000000110000010010

// always fault example 0x???00012 = 0b????????????00000000000000010010

For our example, we’ll set access permissions to 0b11, meaning every access is allowed and we never generate a permission fault. Obviously we don’t really want that for every page, but this is just an example. Bits 5 through 8 allow us to tag a page with a number between 0 and 15, and we can later control access based on these domain tags. The really important part is the 12 bits that store the section base address. That’s how the system is going to match a virtual address with a physical address.
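To make the bit layout concrete, here is a minimal C sketch that packs the fields listed above into one 32-bit entry. The function name `make_section_entry` and its parameters are made up for illustration and are not part of JayStation2. It only covers the fields this post describes and leaves every other bit at 0.

```c
#include <stdint.h>

// Hypothetical helper (not from the original post): packs the fields
// described above into one 32-bit L1 section entry.
uint32_t make_section_entry(uint32_t section_base, // bits 31:20, in 1 MiB units
                            uint32_t ap,           // bits 11:10
                            uint32_t domain,       // bits 8:5
                            uint32_t c_bit,        // bit 3
                            uint32_t b_bit)        // bit 2
{
    return ((section_base & 0xFFFu) << 20)
         | ((ap & 0x3u) << 10)
         | ((domain & 0xFu) << 5)
         | (1u << 4)                 // kept at 1, as the layout asks
         | ((c_bit & 0x1u) << 3)
         | ((b_bit & 0x1u) << 2)
         | 0x2u;                     // 0b10 marks a section entry
}
```

With these assumptions, `make_section_entry(0x123, 0x3, 0, 0, 0)` gives 0x12300c12, which is the "always valid" pattern with a section base of 0x123, and passing 0 for the access permissions gives the 0x012 "always fault" pattern in the low bits.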

When we do a load from or store to an address, the system doesn’t search the table. It uses the top 12 bits of the virtual address as an index straight into it. The value in section base address is a multiple of 1 mebibyte, such that 0 means 0, 1 means 0x100000, 2 means 0x200000, 3 means 0x300000, and so on. If we tried to load from address 0x12345678, it would be somewhere in the page starting at 0x12300000, so the MMU reads entry 0x123, takes the section base address stored there as the top 12 bits of the physical address, and keeps the low 20 bits (0x45678) as the offset. Because our mapping is one-to-one, entry 0x123 holds a section base address of 0x123, and the physical address comes out the same as the virtual one.
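Here is a rough host-side model of that lookup in C, only to show the arithmetic. On real hardware the MMU indexes the table directly with the top 12 bits of the virtual address. The function name `translate_section` is hypothetical, and the sketch assumes every entry is a section entry and ignores permissions and domains entirely.

```c
#include <stdint.h>

// Hypothetical host-side model of a section lookup, not real MMU code.
// table points at 4096 L1 entries; every entry is assumed to be a section.
uint32_t translate_section(const uint32_t *table, uint32_t virt)
{
    uint32_t index  = virt >> 20;            // top 12 bits pick the entry
    uint32_t entry  = table[index];
    uint32_t base   = entry & 0xFFF00000u;   // section base address field
    uint32_t offset = virt & 0x000FFFFFu;    // low 20 bits pass through
    return base | offset;
}
```

With a one-to-one table the function returns its input unchanged. Point entry 0x123 at a different section base and the same virtual address lands in a different physical mebibyte with the same offset.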

// step 1: tell the CPU where the L1 OS pagetable lives, translation table base 0 register

mov32 r1, OS_PAGETABLE_L1_ADDR

mcr p15, 0, r1, c2, c0, 0

// step 2: fill in all 4096 table entries with a one-to-one mapping

// always valid example 0x???00c12 = 0b????????????00000000110000010010

// always fault example 0x???00012 = 0b????????????00000000000000010010

mov r0, #0

movw r2, #0xc12

init_next_table_entry:

// or in the address bits and write it out

orr r3, r2, r0, lsl #20

str r3, [r1], #4

add r0, r0, #1

cmp r0, #4096

blt init_next_table_entry

// step 3: set permissions for the domains. Great for fast context switching

// write 0b11 (manager mode, access not checked) into the domain 0 field of C3,

// the Domain Access Control Register. The other 15 domains stay at 0b00 (no access)

mov r0, #0x3

mcr p15, 0, r0, c3, c0, 0

// step 4: turn on the MMU by setting the LSB in the control register

mrc p15, 0, r0, c1, c0, 0

orr r0, r0, #0x1

mcr p15, 0, r0, c1, c0, 0

Not too much to comment on. The first part is a coprocessor write to let the system know where we are setting up our L1 page table. Then we just loop 4096 times and fill in the entries. We use the same value each time, but change the section base address. Note this is a one-to-one mapping. Entry 0 has a section base address of 0, entry 1 has 1, entry 2 has 2, and so on. Finally we set domain permissions and turn the MMU on.
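The domain step is easier to see once you know the Domain Access Control Register holds sixteen 2-bit fields, one per domain, with domain 0 in bits 1:0. The C sketch below shows how a value like the 0x3 above is put together. The helper name `dacr_set_domain` and the enum names are made up for this example.

```c
#include <stdint.h>

// Domain access values, two bits per domain (0b10 is reserved).
enum { DOMAIN_NO_ACCESS = 0x0, DOMAIN_CLIENT = 0x1, DOMAIN_MANAGER = 0x3 };

// Hypothetical helper: returns dacr with one domain's 2-bit field replaced.
uint32_t dacr_set_domain(uint32_t dacr, uint32_t domain, uint32_t access)
{
    uint32_t shift = (domain & 0xFu) * 2;    // domain N lives in bits 2N+1:2N
    dacr &= ~(0x3u << shift);                // clear the old field
    return dacr | ((access & 0x3u) << shift);
}
```

Starting from 0 and setting domain 0 to manager produces 0x3, the value the assembly writes. Every table entry in the loop uses domain 0, so that one field is all this example needs.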

And that’s all there is. Sort of. I didn’t talk about how JayStation2 is using the MMU, but more importantly I didn’t talk about page sizes. The page table I showed you how to set up is called the L1 page table. However, ARM allows any entry in the L1 page table to act as a pointer to a secondary L2 table, where that 1 MiB range can be further subdivided into pages of different sizes. Mixing and matching 1 MiB, 64 KiB, 4 KiB, and 1 KiB pages is supported, with various tradeoffs. For example, larger page sizes can mean fewer misses in the TLB (a cache used to speed up address translation by avoiding table walks). However, if we want to memory protect the spaces between stacks and the page size is 1 MiB, we have to waste 1 MiB of precious memory per gap, which really adds up when you have many threads.
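To put a number on that tradeoff, here is a tiny C sketch of the guard-page cost. It assumes the one-to-one mapping used in this post, where a fault-inducing page takes a matching chunk of physical memory out of play, and it assumes one guard page per thread stack. The function name `guard_page_cost` is hypothetical.

```c
#include <stdint.h>

// Hypothetical helper: bytes given up to guard pages when each thread
// stack gets one fault-inducing page of guard_page_bytes.
// Assumes a 1:1 mapping, so an unusable page costs real memory.
uint64_t guard_page_cost(uint32_t thread_count, uint64_t guard_page_bytes)
{
    return (uint64_t)thread_count * guard_page_bytes;
}
```

Under those assumptions, 64 threads with 1 MiB guard pages give up 64 MiB, while the same 64 threads with 4 KiB guard pages give up 256 KiB.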

All that will be covered in part 2, as well as mapping multiple virtual ranges to the same physical page (magic ring buffer trick), cache coherency when we go multicore, and fun tricks to memset memory.