---------------------------------------------------------------------- Operating Systems Lecture Notes Lecture 11 MIPS TLB Structure Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. * Case Study: A simple VM and paging system for the MIPS R3000. * Start with architecture. Here is the memory hierarchy of the machine at hand: * First Level Cache. 64K bytes, direct mapped. Write allocate, write through. Physically addressed. * Write Buffer. 4 Entries. * Second Level Cache. 256K bytes, direct mapped. Write back. * Physical Memory. 64M bytes. 4K page frames. * Backing Store Disk. 256M bytes of swap space. * Address format. Are two modes: user mode and kernel mode. Top 20 bits are Virtual Page Number, bottom 12 bits are page offset. Machine has a 6 bit current process id; process id is part of 38 bit virtual address. All user mode addresses have a top address bit of 0. How big is potential user address space? How big are pages? * Kernel mode addresses. * If address starts with 0, is mapped just like current user process. So, user process address space is subset of OS address space. Called kuseg. * If address starts with 100, translates to bottom 512 Mbytes of physical memory and does not go through TLB. (cached and unmapped). Called kseg0. Used for kernel instructions and data. * If starts with 101, translates to bottom 512 Mbytes of physical memory. Is not cached (uncached, unmapped). Called kseg1. Used for disk buffers, I/O registers, ROM code. * If starts with 11, is mapped and cacheable. Maps differently for each process. Called kseg2. Used for kernel data structures in which there is one per address space - user page tables, etc. * Specification of mapping process: must map 31 bit virtual address (top bit is always 0) plus 6 bit process id to 32 bit physical address. Do mapping by first mapping upper 19 bits of virtual address plus 6 bit process id to a physical page frame, then using lower 12 bits of virtual address as offset within the physical page frame. * How do we map upper 19 bits of virtual address? Use a linear page table stored in kseg2. How big can this page table be? * Note that we will also be paging kseg2 using a linear page table. Where do we hold the page table for kseg2? There is one for each process, and it is stored in kseg0. If the page table for the user process stored in kseg2 takes up most of the used address space, how big can the page table for kseg2 stored in kseg0 be? * In effect, we have a two level approach. Given a 32 bit virtual address from kuseg, we get the physical address as follows: * Extract top 9 bits of address. Use this as an index into that process's kseg2 page table stored in kseg0. The memory in kseg0 is always there and the reference goes unmapped, so there will be no problem will this lookup. We get the physical page frame that holds the relevant part of the page table in kseg2. If the page is not resident, read it back in from disk. * Extract middle 10 bits of address. This is the amount you need to index one page of 4 byte physical addresses. Use the middle 10 bits to index the page table in kseg2. This lookup yields a physical page frame in kuseg. If the page is not resident, read it back in from disk. Must make sure that we are accessing the correct memory for the current process id since kseg2 maps differently for each process id. * Extract lower 12 bits of address. Use this as an offset into the physical page frame holding the page from kuseg. Read the memory location. * Why divide the lookup into two stages? So we can page the virtual address space AND page the page table. The page table for virtual address space is stored in the part of kernel address space that is mapped differently for different processes. The page table page table is stored in unmapped but cached kernel memory. * Seems inefficient - 3 memory accesses for one user memory access. So, speed it up with a TLB. 64 entry fully associative TLB. Each entry maps one virtual address to one physical page frame. * TLB entry format. Each TLB entry is 64 bits long. * Top 20 bits: VPN. * Next 6 bits: PID. * Next 6 bits: unused. * Next 20 bits: Physical page frame. * Next bit: N bit. If set, memory access bypasses the cache. If not set, memory access goes through the cache. * Next bit: D bit. If set, memory is writeable. If not set, memory is not writeable. * Next bit: V bit. If set, entry is valid. * Next bit: G bit. If set, TLB does not check PID for translation. * How does lookup work? Basic idea: match on upper half of TLB entry, use lower half of TLB entry. Can generate three different kinds of TLB misses, each with its own exception handler. * UTLB miss - generated when the access is to kuseg and there is no matching mapping loaded into the TLB. * TLB miss - generated when the access is to kseg0, kseg1, or kseg2 and there is no mapping loaded into TLB. Also generated when the mapping is loaded into TLB, but valid bit is not set. * TLB mod - generated when the mapping is loaded, but access is a write and the D bit is not set. * Here is the TLB lookup algorithm: * If MSB is 1 and in user mode, generate an address error exception. * Is there a VPN match? If no, generate a TLB miss exception if MSB is 1, otherwise generate a UTLB miss. * Does the PID match or is the global bit set? If no, generate a TLB miss (if MSB is 1) or UTLB miss (if MSB is 0). * Is valid bit set? If no, generate a TLB miss. * If D bit is not set and the access is a write, generate a TLB mod exception. * If N bit is set, access memory, otherwise access cache (which may refer access to memory). * The PID field allows multiple processes to share the TLB. What if there was no PID field? The PID field is only 6 bits long. What if create more than 64 user processes? * Manipulating TLB entries. Processor must be able to load new entries into TLB. Basic Mechanism: Two 32 bit TLB registers: TLB EntryHi, TLB EntryLow. Bits are the same as for 64 bit TLB entry. EntryHi register holds current PID that is part of all virtual addresses. Also have an Index register: 6 bits that can be set by software, and a Random register: 6 bit register decremented every clock cycle. Constrained not to point to first 8 entries. * Can load into TLB entry registers under program control, then store contents of Entry registers either to TLB entry to which index register points, or to which random register points. * TLB instructions: * mtc0 - loads one of TLB registers with contents of a general register. * mfc0 - reads one of TLB registers into a general register. * tlbp - probes the TLB to see if an entry matches EntryHi. If so, loads index register with index of TLB entry that matched. If no match, sets upper bit of index register. * tlbr - loads EntryHi and EntryLow with contents of TLB entry that index register points to. * tlbwi - writes TLB entry that index points to with contents of EntryHi and EntryLo registers. * tlbwr - writes TLB entry that random register points to with contents of EntryHi and EntryLo registers. * What happens when there is a UTLB or TLB miss? OS must reload TLB and restart the faulting process. Note - the UTLB and TLB miss exceptions branch to different handlers. * Machine state for exceptions: * EPC register: points to instruction that caused fault, unless faulting instruction was in branch delay slot. If so, points to branch before branch delay slot instruction. Basic idea: when fix up exception and return to user code, will branch to EPC. * Cause register. Tells what caused exception, and maintains some state about interrupts. * Status register. Contains information about status of machine. Important bits: Kernel/User mode bit, Interrupt Enable bit. OS maintains a 3 deep stack of these bits, shifting them over on an exception. So, can take two exceptions without having to extract and store the bits. * BadVaddr register - stores virtual address that caused last exception. * Context register. Upper 11 bits - set under program control. Next 19 bits - set to VPN of address that caused exception (omits top bit). Last 2 bits - always 0. * What does machine do on a UTLB miss? * Sets EPC register. * Sets Cause register. * Sets Status register. Shifts K/U and IE bits over one, and clears current Kernel/User and Interrupt Enable bits. So - processor is in kernel mode with interrupts turned off. * Sets BadVaddr register - stores virtual address that caused exception. * Sets Context register. Upper 11 bits - left alone. Next 19 bits - set to VPN of address that caused exception (omits top bit). * Sets TLB EntryHi register to contain VPN of faulting address. * What does OS do in UTLB handler? * Store EPC register to kt1 register (software convention, OS has two registers reserved for its use). * Load context register into kt0 register. * Load contents of memory address that kt0 points to into kt0. Into what part of address space does kt0 point? * Load kt0 into entry low TLB register. * Load TLB entry registers into TLB entry that random register points to. * JR kt1; rfe instruction in branch delay slot. rfe instruction pops bits in Status Register. * What is going on? OS uses a linear page table for each process, starting at address stored in upper 11 bits of context register. Each page table entry is the 32 lower bits of a TLB entry. So, OS just fetched the TLB entry and stored it into a random location in TLB, then started up the program again. * What are upper two bits of context register? 11 - so, this is kernel memory that is mapped separately for each process. Next 9 bits are base of page table in mapped, process-specific kernel space. So, each process has its own page table. * Error cases: * What if page is not in memory? Then OS will store a zero in the valid bit of page table entry. Program will reexecute faulting instruction, generating a TLB miss exception. (NOT a UTLB miss exception). * What if address is out of bounds? OS stores nothing above the page table in address space, so will get a TLB miss (NOT a UTLB miss). This generates a double fault that OS will handle in general exception handler. * What if page table page is not mapped or it is not in memory? Another double fault. * What if faulting instruction was in a branch delay slot? EPC points to branch, so will reexecute branch instruction. No problem - in R3000, all branch instructions are reexecutable with same effect. * What if are inside kernel when take a UTLB miss? How does miss handler know which state to return to? Is stored automatically in Status register, and manipulated by exceptions and rfe instruction. * R3000 carefully designed to support this efficient UTLB reload mechanism. * Some kernel addresses are mapped differently for different processes. So, can store per-process page tables there. * TLB entry format laid out so that it matches possible page table entries. * All branches are restartable - they do not depend on machine state. * UTLB miss handler is in a different location than normal exception handler - supports fast code. Don't have to decode cause of exception. * Sets context register appropriately. * Supports three levels of Kernel/User mode and interrupt enable/disable bits, so can take two faults in a row without needing to save state. Supports double fault mechanism for fast handling of uncommon cases. * What must OS do when it switches contexts? Must set EntryHi of TLB to contain current process id. Must also load top 11 bits of context register with page table base. * What happens on a TLB miss (as opposed to a UTLB miss). * Sets cause register - can be TLB mod miss, or TLB miss. TLB mod miss is when TLB entry matches but operation was a store and D bit was not set. * Sets BadVaddr register. * Sets EPC. * Shifts bits in Status register. * Sets context register. * Sets TLB EntryHi register. * Branches to general exception handler (different from UTLB miss handler). * What OS does on TLB miss: * First determine what caused miss. * If TLB mod miss, check to see if process has right to write page. Usually stored in page table entry in one of unused bits. If has can write it, mark physical page as dirty in OS data structures, set D bit in TLB entry and restart process. Note: can't use random register to write TLB entry back in. There is already a TLB entry with the matching VPN and PID. Must use tlbp to load index register with the index of the matching entry, then store new page table entry into Entry Low register, then tlbwi to store new TLB entry back into the TLB. * If TLB caused by double miss from UTLB miss handler, (find this out by seeing of EPC points inside UTLB miss handler), first determine if given address is valid. Determine this by checking if it is within range of process virtual address space. Then determine if page table page is resident. If so, construct mapping for page table page and put it into TLB. Use this mapping to get TLB entry for page. Insert this entry into TLB and return to user program. * If page table page not resident (this is for a normal kernel TLB miss in per-process kernel space), read it in from disk. When page arrives, set up page table entry. Set valid bit, make PFN point to page frame where it was read in. Clear D bit. Map page table page into TLB and proceed as above. * If TLB caused by kernel mode reference not mapped, find the PTE for the kernel address and put it into TLB, using entry hi and lo registers and random register. Return to code that caused miss. * If miss caused by reference to invalid page in user address space, read page in from disk, set valid bit and PFN in page table. Also, be sure to clear D bit so that any write reference will cause a trap. The OS will use this trap to mark the PFN dirty. * Alternatives: * The OS may run completely unmapped. Problem: can't page OS data structures. * The OS may have a separate address space from user. Problem: can't as easily access user space. * Cache may be virtually addressed. Problem: can't map same memory in different addresses in same process. How do implement Unix mmap facility, which demands that different virtual addresses map to same physical address? Also, if cache does not have PID field, must flush cache on context switch (!). * TLB may not have PID field. Problem: must flush TLB on context switch. * TLB reload may be done automatically in hardware. Each page table entry is 32 bits (lower 32 bits of TLB entry above), and hardware will automatically reload TLB. Context switch must load page table base and bounds registers. Problem: locks OS into using a specific data structure for page table. ---------------------------------------------------------------------- Permission is granted to copy and distribute this material for educational purposes only, provided that the following credit line is included: "Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard." Permission is granted to alter and distribute this material provided that the following credit line is included: "Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard." Martin Rinard, rinard@lcs.mit.edu, www.cag.lcs.mit.edu/~rinard 8/22/1998