DIVISION OF ENGINEERING AND APPLIED SCIENCES
HARVARD UNIVERSITY

CS 161. Operating Systems

Matt Welsh
Spring 2005

MIPS r2000/r3000 Architecture
Architecture/assembler summary

(This is not intended to be either a comprehensive reference or a tutorial. More information is available from www.mips.com.)


Registers

There are 32 general-purpose registers and 3 special registers on the MIPS r2k itself. There are also up to 32 registers each on up to four coprocessors. For CS161 purposes, there is only one coprocessor, coprocessor 0, which is the "system coprocessor"; it takes care of exceptions and virtual memory issues.

Register Symbolic name Saved by Description
General registers
$0 z0, ZERO N/A Always contains 0, no matter what's written to it.
$1 AT caller Assembler temporary. See below.
$2 v0 caller Value 0. Used for computations; function return value is placed here. Also holds the system call number on syscall entry.
$3 v1 caller Value 1. Used for computations; upper word of 64-bit return value is placed here.
$4 a0 caller Argument 0. First function argument goes here.
$5 a1 caller Argument 1. Second function argument goes here.
$6 a2 caller Argument 2. Third function argument goes here.
$7 a3 caller Argument 3. Fourth function argument goes here. Also used as a flag value on system call return.
$8 t0 caller General-purpose temporary register.
$9 t1 caller General-purpose temporary register.
$10 t2 caller General-purpose temporary register.
$11 t3 caller General-purpose temporary register.
$12 t4 caller General-purpose temporary register.
$13 t5 caller General-purpose temporary register.
$14 t6 caller General-purpose temporary register.
$15 t7 caller General-purpose temporary register.
$16 s0 callee General-purpose saved register.
$17 s1 callee General-purpose saved register.
$18 s2 callee General-purpose saved register.
$19 s3 callee General-purpose saved register.
$20 s4 callee General-purpose saved register.
$21 s5 callee General-purpose saved register.
$22 s6 callee General-purpose saved register.
$23 s7 callee General-purpose saved register.
$24 t8 caller General-purpose temporary register.
$25 t9 caller General-purpose temporary register.
$26 k0 nobody Kernel scratch register.
$27 k1 nobody Kernel scratch register.
$28 gp global Global pointer. Constant for any given process.
$29 sp N/A Stack pointer.
$30 s8 callee Saved register #8 - conventionally, but not always, a frame pointer.
$31 ra caller Return address of function.
Special registers
HI - caller High-order word of 64-bit multiply result, or remainder of divide result.
LO - caller Low-order word of 64-bit multiply result, or quotient of divide result.
PC - N/A Program counter.
Coprocessor 0
cop0 $0 c0_index N/A TLB entry index register.
cop0 $1 c0_random N/A TLB randomized access register.
cop0 $2 c0_entrylo N/A Low-order word of "current" TLB entry.
cop0 $4 c0_context N/A Page-table lookup address.
cop0 $8 c0_vaddr N/A Virtual address associated with certain exceptions.
cop0 $10 c0_entryhi N/A High-order word of "current" TLB entry.
cop0 $12 c0_status N/A Processor status register.
cop0 $13 c0_cause N/A Exception cause register.
cop0 $14 c0_epc N/A PC at which exception occurred.
Any of the 32 general-purpose registers can be used in any instruction that takes register operands. The special registers are accessed using special instructions; the coprocessor registers can be accessed by using special coprocessor instructions to move their values to general registers and back.

Register $31 is the "link register". Most of the instructions for calling subroutines are hardwired to store the return address into this register. (The jalr instruction is, for some reason, an exception.)

The coprocessor 0 registers have various bit fields in them. These are:

c0_index
Bits Name Description
31 P Set by the tlbp instruction if the probe fails.
14-30 unused
8-13 Index TLB entry number for tlbwi, tlbr, and tlbp.
0-7 unused
c0_random
Bits Name Description
14-31 unused
8-13 Random Semi-random TLB entry number used by tlbwr. Updated by processor. Never has a value between 0-7.
0-7 unused
c0_entrylo
Bits Name Description
12-31 PFN Physical page number (bits 12-31 of address) for VM mapping.
11 N Non-cacheable; if set, RAM cache is disabled accessing this page.
10 D Dirty; if set, page may be written to.
9 V Valid; if set, page may be accessed.
8 G Global; if set, valid in every address space.
0-7 unused
c0_context
Bits Name Description
21-31 PTEBase Base address of page table. Untouched by hardware; maintained by software.
2-20 BadVPN Offset into page table for a kuseg fault (bits 12-30 of c0_vaddr), set by hardware.
0-1 unused
c0_vaddr
Bits Name Description
0-31 vaddr Failing virtual address; set by certain exceptions.
c0_entryhi
Bits Name Description
12-31 VPN Virtual page number (bits 12-31 of address) for VM mapping.
6-11 PID ID of address space in which virtual address exists.
0-5 unused
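
As an illustration of the c0_entrylo and c0_entryhi layouts above, here is a minimal C sketch that packs the two words of a TLB entry. The macro and function names are made up for this example (they are not the names used in the OS/161 headers); only the bit positions come from the tables.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the c0_entryhi / c0_entrylo tables. Names are hypothetical. */
#define EX_HI_VPN 0xfffff000u  /* bits 12-31: virtual page number */
#define EX_HI_PID 0x00000fc0u  /* bits 6-11: address space ID */
#define EX_LO_PFN 0xfffff000u  /* bits 12-31: physical page number */
#define EX_LO_N   0x00000800u  /* bit 11: non-cacheable */
#define EX_LO_D   0x00000400u  /* bit 10: dirty (page may be written) */
#define EX_LO_V   0x00000200u  /* bit 9: valid */
#define EX_LO_G   0x00000100u  /* bit 8: global */

/* Build an entryhi word from a virtual address and a 6-bit PID. */
uint32_t make_entryhi(uint32_t vaddr, uint32_t pid)
{
    assert(pid < 64);  /* the PID field is only 6 bits wide */
    return (vaddr & EX_HI_VPN) | (pid << 6);
}

/* Build an entrylo word from a physical address and N/D/V/G flags. */
uint32_t make_entrylo(uint32_t paddr, uint32_t flags)
{
    assert((flags & ~(EX_LO_N | EX_LO_D | EX_LO_V | EX_LO_G)) == 0);
    return (paddr & EX_LO_PFN) | flags;
}
```

For example, make_entryhi(0x00401234, 5) yields 0x00401140, and make_entrylo(0x0012f678, EX_LO_D | EX_LO_V) yields 0x0012f600: the page offset is discarded in both cases.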
c0_status
Bits Name Description
28-31 CU If these bits are set, the corresponding coprocessors are usable. If clear, use of said coprocessors will generate a coprocessor unusable exception.
23-27 unused
22 BEV If set the "bootstrap" exception handler addresses are used.
21 TS If set to 1, the processor is dead in the water and needs to be reset.
20 PE Set to 1 if a cache parity error occurs. Clear by writing 1.
19 CM Set to 1 if the most recent data cache load missed, but only if IsC is set.
18 PZ If set to 1, uses space parity for outgoing data.
17 SwC If set, the cache control lines affect the instruction cache rather than the data cache.
16 IsC If set, the data cache is detached from main memory. (For flushing.)
8-15 IntMask While these bits are set, the corresponding interrupts are enabled and can cause interrupt exceptions (subject to IEc). While clear, the corresponding interrupts are masked.
6-7 unused
5 KUo Old kernel/user mode bit (1 = user mode)
4 IEo Old interrupt enable bit (0 = mask all interrupts)
3 KUp Previous kernel/user mode bit (1 = user mode)
2 IEp Previous interrupt enable bit (0 = mask all interrupts)
1 KUc Current kernel/user mode bit (1 = user mode)
0 IEc Current interrupt enable bit (0 = mask all interrupts)
c0_cause
Bits Name Description
31 BD Set if last exception occurred in a branch delay slot.
30 unused
28-29 CE Coprocessor number resulting from a coprocessor unusable exception.
16-27 unused
10-15 IP Bits reflecting the state of the external hardware interrupt lines. Bit 10 is irq 0.
8-9 Sw Software interrupts. Like IP, but controlled by software.
6-7 unused
2-5 ExcCode An exception code, from the list below.
0-1 unused
c0_epc
Bits Name Description
0-31 epc Program counter for restarting after exception.
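
To make the c0_cause layout concrete, the following C sketch pulls the individual fields out of a cause value. The struct and function names are invented for this example; the bit positions are the ones in the c0_cause table above.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of c0_cause. Field names follow the table; the struct is hypothetical. */
struct cause_fields {
    unsigned bd;       /* bit 31: exception was in a branch delay slot */
    unsigned ce;       /* bits 28-29: coprocessor number */
    unsigned ip;       /* bits 10-15: hardware interrupt lines */
    unsigned sw;       /* bits 8-9: software interrupts */
    unsigned exccode;  /* bits 2-5: exception code */
};

struct cause_fields decode_cause(uint32_t cause)
{
    struct cause_fields f;
    f.bd      = (cause >> 31) & 0x1;
    f.ce      = (cause >> 28) & 0x3;
    f.ip      = (cause >> 10) & 0x3f;
    f.sw      = (cause >> 8)  & 0x3;
    f.exccode = (cause >> 2)  & 0xf;
    return f;
}

/* Nonzero if hardware irq line `irq` (0-5) is asserted; bit 10 is irq 0. */
int irq_pending(uint32_t cause, unsigned irq)
{
    assert(irq < 6);
    return (int)((cause >> (10 + irq)) & 1);
}
```

With cause = 0x80000420, this decodes to BD = 1, ExcCode = 8 (SYSCALL), and irq 0 pending.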



Instructions

This table uses the following symbols:
RD, RS, RT Up to three general registers ($0-$31)
HI, LO The special "hi" and "lo" registers
HI:LO "hi" and "lo" as a single 64-bit value
C0_REG A coprocessor 0 register
signed-IMM Immediate value IMM, sign-extended to 32 bits
unsigned-IMM Immediate value IMM, zero-extended to 32 bits
offset Branch or memory-access offset (always signed)
signed- Value is interpreted as signed
unsigned- Value is interpreted as unsigned
address Immediate address for jump
These are the instructions (there are a few not listed, including all the floating-point operations, but this should include anything we'll see in CS161.)

In the opcode names, "u" means "unsigned"; "i" means immediate; the "al" in some jump instructions means "and link", meaning "function call".

Instruction Operation Notes
add RD, RS, RT RD = RS + RT; exception on overflow
addi RT, RS, IMM RT = RS + signed-IMM; exception on overflow
addiu RT, RS, IMM RT = RS + signed-IMM
addu RD, RS, RT RD = RS + RT
and RD, RS, RT RD = RS & RT
andi RT, RS, IMM RT = RS & unsigned-IMM
beq RS, RT, branch-offset if (RS == RT) NEXTPC += (branch-offset << 2)
bgez RS, branch-offset if (signed-RS >= 0) NEXTPC += (branch-offset << 2)
bgezal RS, branch-offset $31 = NEXTPC; if (signed-RS >= 0) NEXTPC += (branch-offset << 2)
bgtz RS, branch-offset if (signed-RS > 0) NEXTPC += (branch-offset << 2)
blez RS, branch-offset if (signed-RS <= 0) NEXTPC += (branch-offset << 2)
bltz RS, branch-offset if (signed-RS < 0) NEXTPC += (branch-offset << 2)
bltzal RS, branch-offset $31 = NEXTPC; if (signed-RS < 0) NEXTPC += (branch-offset << 2)
bne RS, RT, branch-offset if (RS != RT) NEXTPC += (branch-offset << 2)
break breakpoint (immediate breakpoint exception) with no delay slot
div RS, RT LO = signed-RS / signed-RT; HI = signed-RS % signed-RT
divu RS, RT LO = unsigned-RS / unsigned-RT; HI = unsigned-RS % unsigned-RT
j address NEXTPC = (NEXTPC & 0xf0000000) | (address << 2)
jal address $31 = NEXTPC; NEXTPC = (NEXTPC & 0xf0000000) | (address << 2)
jalr RD, RS RD = NEXTPC; NEXTPC = RS. RD is normally $31.
jr RS NEXTPC = RS
lb RT, offset(RS) RT = signed-8-memory[RS + offset]
lbu RT, offset(RS) RT = unsigned-8-memory[RS + offset]
lh RT, offset(RS) RT = signed-16-memory[RS + offset]
lhu RT, offset(RS) RT = unsigned-16-memory[RS + offset]
lui RT, IMM RT = unsigned-IMM << 16
lw RT, offset(RS) RT = 32-memory[RS + offset]
lwl RT, offset(RS) RT = unaligned-32-memory[RS + offset] 1
lwr RT, offset(RS) RT = unaligned-32-memory[RS + offset] 1
mfc0 RT, C0_REG RT = C0_REG
mfhi RD RD = HI
mflo RD RD = LO
mtc0 RT, C0_REG C0_REG = RT
mthi RS HI = RS
mtlo RS LO = RS
mult RS, RT HI:LO = signed-RS * signed-RT
multu RS, RT HI:LO = unsigned-RS * unsigned-RT
nor RD, RS, RT RD = ~(RS | RT)
or RD, RS, RT RD = RS | RT
ori RT, RS, IMM RT = RS | unsigned-IMM
rfe return from exception 2
sb RT, offset(RS) 8-memory[RS + offset] = RT
sh RT, offset(RS) 16-memory[RS + offset] = RT
sll RD, RT, IMM RD = RT << unsigned-IMM
sllv RD, RT, RS RD = RT << RS
slt RD, RS, RT RD = signed-RS < signed-RT
slti RT, RS, IMM RT = signed-RS < signed-IMM
sltiu RT, RS, IMM RT = unsigned-RS < unsigned-(signed-IMM). Yes, according to my reference it actually takes the 16-bit immediate, sign-extends it, and then reinterprets it as an unsigned value. Don't ask me.
sltu RD, RS, RT RD = unsigned-RS < unsigned-RT
sra RD, RT, IMM RD = signed-RT >> unsigned-IMM
srav RD, RT, RS RD = signed-RT >> RS
srl RD, RT, IMM RD = unsigned-RT >> unsigned-IMM
srlv RD, RT, RS RD = unsigned-RT >> RS
sub RD, RS, RT RD = RS - RT; exception on overflow
subu RD, RS, RT RD = RS - RT
sw RT, offset(RS) 32-memory[RS + offset] = RT
swl RT, offset(RS) unaligned-32-memory[RS + offset] = RT 1
swr RT, offset(RS) unaligned-32-memory[RS + offset] = RT 1
syscall make system call; immediate syscall exception with no delay slot
tlbp probe tlb: search TLB for entry matching c0_entryhi; set probe-failed bit and index field in c0_index. 3
tlbr read tlb entry: load the TLB entry named by the index field of c0_index into c0_entryhi and c0_entrylo. 3
tlbwi write tlb entry indexed: store c0_entryhi and c0_entrylo into the TLB entry named by the index field of c0_index. 3
tlbwr write tlb entry "random": store c0_entryhi and c0_entrylo into the TLB entry named by the index field of c0_random. 3
xor RD, RS, RT RD = RS ^ RT
xori RT, RS, IMM RT = RS ^ unsigned-IMM
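
The way immediates are interpreted in the table is easy to get wrong, so here is a small C sketch of the arithmetic for sign extension, branch targets, jump targets, and sltiu. The function names are hypothetical; the formulas are the ones given in the table, with NEXTPC passed in as a plain 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* "signed-IMM": sign-extend a 16-bit immediate to 32 bits. */
uint32_t sign_extend16(uint16_t imm)
{
    return (imm & 0x8000u) ? (imm | 0xffff0000u) : imm;
}

/* Branches: NEXTPC += (branch-offset << 2), where the offset is signed. */
uint32_t branch_target(uint32_t nextpc, uint16_t offset)
{
    assert((nextpc & 3) == 0);  /* instructions are word-aligned */
    return nextpc + (sign_extend16(offset) << 2);
}

/* j/jal: NEXTPC = (NEXTPC & 0xf0000000) | (address << 2), 26-bit address field. */
uint32_t jump_target(uint32_t nextpc, uint32_t address)
{
    assert(address < (1u << 26));
    return (nextpc & 0xf0000000u) | (address << 2);
}

/* sltiu: the immediate is sign-extended, then the comparison is unsigned. */
uint32_t sltiu(uint32_t rs, uint16_t imm)
{
    return rs < sign_extend16(imm);
}
```

One consequence of the sltiu rule: an immediate of 0xffff compares against 0xffffffff, so sltiu(5, 0xffff) is 1, even though the instruction is nominally "unsigned". A branch offset of 0xfffe (that is, -2) from NEXTPC 0x80001004 lands at 0x80000ffc.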
Notes:

  1. lwl/lwr and swl/swr are for accessing unaligned words in memory. The actual specification is complicated, but what it boils down to is that
    lwl RT, offset(RS)
    lwr RT, (offset+3)(RS)
    loads the 32-bit value starting at RS+offset, no matter what the alignment of that address is. swl/swr behave analogously.

  2. RFE shifts the lower six bits of the status register right by two, so the "previous" interrupt/usermode state becomes the current state and the "old" state is copied into the "previous" state; the "old" bits themselves are left unchanged. This inverts what happens on an exception. RFE is normally found in the delay slot of a jump instruction of some kind.

  3. For an explanation of these, see the comments in src/kern/arch/mips/include/tlb.h.
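
The effect of an exception and of RFE on the low six bits of c0_status (note 2) can be modeled in a few lines of C. This is a sketch of the bit shuffling only, with invented function names; it assumes, as on the r3000, that exception entry leaves the current pair at kernel mode with interrupts off.

```c
#include <assert.h>
#include <stdint.h>

#define EX_KUIE_MASK 0x3fu  /* bits 0-5: KUo IEo KUp IEp KUc IEc */

/* Exception entry: current -> previous, previous -> old, current = 0. */
uint32_t status_on_exception(uint32_t status)
{
    uint32_t stack = status & EX_KUIE_MASK;
    stack = (stack << 2) & EX_KUIE_MASK;  /* the old pair falls off the top */
    assert((stack & 0x3u) == 0);          /* kernel mode, interrupts off */
    return (status & ~EX_KUIE_MASK) | stack;
}

/* rfe: previous -> current, old -> previous, old left as it was. */
uint32_t status_on_rfe(uint32_t status)
{
    uint32_t stack = status & EX_KUIE_MASK;
    stack = (stack & 0x30u) | (stack >> 2);
    return (status & ~EX_KUIE_MASK) | stack;
}
```

Starting from status 0x0000ff03 (user mode, interrupts enabled), an exception gives 0x0000ff0c, and a subsequent RFE restores 0x0000ff03.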

Synthetic instructions

Because all instructions are exactly 32 bits wide, it's not possible to perform certain logical operations in a single instruction. The assembler will cover for these by emitting multiple actual instructions as needed.

For instance, the "li" (load immediate) and "la" (load address) instructions, both of which load 32-bit constants, will be expanded by the assembler into a "lui" instruction to load the upper half of the word, and then usually an "ori" or "addiu" to set the lower half of the word.

Some of these combinations require an extra register to hold intermediate values. Register $1 is reserved for this purpose. You can prevent the assembler from using $1 by putting ".set noat" in the assembler source.
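
To see why the choice between "ori" and "addiu" matters, here is a C sketch of how a 32-bit constant might be split into the two 16-bit immediates. It is an illustration of the arithmetic, not a description of what any particular assembler does internally, and the names are hypothetical. Because addiu sign-extends its immediate, the upper half has to be bumped by one whenever bit 15 of the lower half is set.

```c
#include <assert.h>
#include <stdint.h>

struct imm_halves {
    uint16_t upper;  /* immediate for lui */
    uint16_t lower;  /* immediate for ori or addiu */
};

/* What lui followed by addiu computes: (upper << 16) + sign-extended lower. */
uint32_t apply_lui_addiu(struct imm_halves h)
{
    uint32_t lo = (h.lower & 0x8000u) ? (h.lower | 0xffff0000u) : h.lower;
    return ((uint32_t)h.upper << 16) + lo;
}

/* lui + ori: ori zero-extends, so the halves are used as they are. */
struct imm_halves split_for_ori(uint32_t value)
{
    struct imm_halves h;
    h.upper = (uint16_t)(value >> 16);
    h.lower = (uint16_t)(value & 0xffffu);
    return h;
}

/* lui + addiu: compensate for the sign extension of the lower half. */
struct imm_halves split_for_addiu(uint32_t value)
{
    struct imm_halves h;
    h.lower = (uint16_t)(value & 0xffffu);
    h.upper = (uint16_t)((value + 0x8000u) >> 16);
    assert(apply_lui_addiu(h) == value);
    return h;
}
```

For the constant 0x8001abcd, the ori form uses lui 0x8001 and ori 0xabcd, while the addiu form needs lui 0x8002 and addiu 0xabcd (which adds -0x5433).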


Delay slots

The MIPS is a pipelined architecture, and certain aspects of the pipeline are exposed to the programmer. In general, "slow" instructions are not finished until the instruction *two* spaces after them is being fetched. The instruction in between is referred to as a "delay slot".

There is no pipeline stall logic; the delay slots must be filled out appropriately in the machine code. If they aren't, the behavior is undefined.

The assembler will attempt to fill delay slots for you; however, it isn't very bright about it and usually inserts nops. Also, in some cases it cannot tell what you mean and can silently mangle code that you thought was using delay slots efficiently. For this reason, when coding OS/161, I turned off this behavior with ".set noreorder".

Delay slots apply chiefly to two classes of instructions:

  1. Loads and stores involving memory.
    			lw $9, 0($8)		# load value into $9
    			nop			# $9 won't be ready here
    			addiu $10, $9, 1	# now we can use $9
    
  2. Branches and jumps.
    			jal myfunc		# call function
    			move a0, s0		# executes BEFORE jump happens
    			addu s0, s0, v0		# executes AFTER function returns
    
The interaction between branch delay slots and exception handling is extremely unpleasant and you'll be happier if you don't think about it.


Exceptions

When an exception occurs, information about the exception is recorded in some of the coprocessor 0 registers and execution continues from a known hardwired address.

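The following registers are updated on exception: c0_status (the KU/IE bit pairs are pushed, the reverse of what RFE does), c0_cause (the ExcCode and BD fields, plus CE for a coprocessor unusable exception), c0_epc (the PC for restarting), and, for the exceptions marked in the table below, c0_vaddr (along with the BadVPN field of c0_context for kuseg faults).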

Execution continues at a hardwired address, one of the following:
Address Description
0x80000000 UTLB miss exception
0x80000080 Other exceptions
0xbfc00000 Processor reset
0xbfc00100 UTLB miss exception, if BEV is set in c0_status
0xbfc00180 Other exceptions, if BEV is set in c0_status
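
The choice of vector can be summarized as a function of the exception kind and the BEV bit. The following C sketch uses an invented enum and function name; the addresses are those in the table above, and BEV is bit 22 of c0_status.

```c
#include <assert.h>
#include <stdint.h>

#define EX_ST_BEV 0x00400000u  /* bit 22 of c0_status */

/* Hypothetical classification of exceptions by vector. */
enum exc_kind { EXC_RESET, EXC_UTLB_MISS, EXC_OTHER };

/* Return the address at which execution continues. */
uint32_t exception_vector(enum exc_kind kind, uint32_t status)
{
    int bev = (status & EX_ST_BEV) != 0;

    switch (kind) {
    case EXC_RESET:
        return 0xbfc00000u;
    case EXC_UTLB_MISS:
        return bev ? 0xbfc00100u : 0x80000000u;
    case EXC_OTHER:
        return bev ? 0xbfc00180u : 0x80000080u;
    }
    assert(0 && "unknown exception kind");
    return 0;
}
```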
The exceptions are:
Code Sets c0_vaddr? Description
0 no Interrupt (hardware or software)
1 yes TLB protection fault ("modification request")
2 yes TLB miss or UTLB miss on load or instruction fetch.
3 yes TLB miss or UTLB miss on store.
4 yes Address error on load or instruction fetch.
5 yes Address error on store.
6 no External bus error on instruction fetch
7 no External bus error on data load or store
8 no SYSCALL instruction
9 no BREAK instruction
10 no Reserved (illegal) instruction
11 no Coprocessor unusable
12 no Arithmetic overflow
An address error results from either use of an inadequately aligned pointer (an N-bit quantity must be aligned on an N-bit address boundary, unless the lwl/lwr/swl/swr instructions are used) or an attempt to access kernel memory from user mode.

A TLB entry is "matching" if its VPN field is the same as the page number portion of the virtual address being looked up, and either the G (global) bit is set or the PID field matches the PID field in c0_entryhi.

If no matching TLB entry is found, a TLB miss exception occurs, unless the address is in the user address range (below 0x80000000), in which case a UTLB miss exception occurs. If a matching entry is found, but it is not marked valid (the V bit is clear), a TLB miss exception (never a UTLB miss exception) occurs. Finally, if the entry is valid but the dirty (D) bit is not set on a write access, a TLB protection fault occurs.
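
The matching rule and the order of the checks can be written out as a software model. This C sketch is only a model of the description above (real hardware compares all entries in parallel), and every name in it is hypothetical. It assumes a 64-entry TLB, which is what the 6-bit Index field implies.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NUM_TLB  64
#define EX_VPN_MASK 0xfffff000u  /* entryhi bits 12-31 */
#define EX_PID_MASK 0x00000fc0u  /* entryhi bits 6-11 */
#define EX_LO_D     0x00000400u  /* entrylo bit 10 */
#define EX_LO_V     0x00000200u  /* entrylo bit 9 */
#define EX_LO_G     0x00000100u  /* entrylo bit 8 */

struct tlb_entry { uint32_t hi, lo; };

enum tlb_result { TLB_HIT, TLB_MISS, TLB_UTLB_MISS, TLB_MOD_FAULT };

/* Look up vaddr under the PID currently in entryhi. */
enum tlb_result tlb_lookup(const struct tlb_entry *tlb, int n,
                           uint32_t entryhi, uint32_t vaddr, int is_write)
{
    assert(n >= 0 && n <= EX_NUM_TLB);
    for (int i = 0; i < n; i++) {
        int vpn_ok = (tlb[i].hi & EX_VPN_MASK) == (vaddr & EX_VPN_MASK);
        int pid_ok = (tlb[i].lo & EX_LO_G) != 0 ||
                     (tlb[i].hi & EX_PID_MASK) == (entryhi & EX_PID_MASK);
        if (!vpn_ok || !pid_ok)
            continue;                 /* not a matching entry */
        if (!(tlb[i].lo & EX_LO_V))
            return TLB_MISS;          /* invalid entry: never a UTLB miss */
        if (is_write && !(tlb[i].lo & EX_LO_D))
            return TLB_MOD_FAULT;     /* write to a non-dirty page */
        return TLB_HIT;
    }
    /* No match: the kind of miss depends only on the address. */
    return vaddr < 0x80000000u ? TLB_UTLB_MISS : TLB_MISS;
}
```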

A UTLB miss exception uses (potentially) different exception handling code from a TLB miss exception, but is otherwise the same. The purpose, in conjunction with the c0_context register, is to enable fast-path TLB refill handling. Note that the UTLB exception applies to user addresses, not user mode - if the miss address is below 0x80000000, a UTLB exception occurs whether or not the miss was generated in kernel or user mode.
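
As a rough illustration of why c0_context helps the fast path, the C sketch below computes the value the register would hold after a kuseg fault. It assumes the BadVPN field starts at bit 2, so that with 4-byte page table entries the register value is directly the address of the entry for the faulting page; the function name and the example page table base are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Model of c0_context after a fault at vaddr, for a given PTEBase. */
uint32_t context_value(uint32_t ptebase, uint32_t vaddr)
{
    uint32_t badvpn;

    assert((ptebase & 0x001fffffu) == 0);  /* PTEBase occupies bits 21-31 only */
    badvpn = (vaddr >> 12) & 0x7ffffu;     /* bits 12-30 of the failing address */
    return ptebase | (badvpn << 2);        /* assumed: field starts at bit 2 */
}
```

With a page table based at 0xc0000000 and a fault at 0x00403abc, this gives 0xc000100c, which is the base plus 4 times the page number 0x403. A refill handler can then load the entry with a single lw from that address.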


Segments

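The MIPS divides its address space into several regions that have hardwired properties. These are:

Segment Properties
kuseg Accessible from user mode; TLB-mapped.
kseg0 Kernel mode only; direct-mapped; cached.
kseg1 Kernel mode only; direct-mapped; uncached.
kseg2 Kernel mode only; TLB-mapped.

Both direct-mapped segments map to the first 512 megabytes of the physical address space.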

The top of kuseg is 0x80000000. The top of kseg0 is 0xa0000000, and the top of kseg1 is 0xc0000000.

The memory map thus looks like this:

Address     Segment  Special properties
0xffffffff  kseg2
0xc0000000
0xbfffffff  kseg1
0xbfc00180           Exception address if BEV set.
0xbfc00100           UTLB exception address if BEV set.
0xbfc00000           Execution begins here after processor reset.
0xa0000000
0x9fffffff  kseg0
0x80000080           Exception address if BEV not set.
0x80000000           UTLB exception address if BEV not set.
0x7fffffff  kuseg
0x00000000
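
The memory map translates directly into a pair of small C helpers. This is a sketch with hypothetical names: one function classifies a virtual address by segment, and the other converts a kseg0 or kseg1 address to the physical address it is hardwired to, by masking off the top three bits (512 megabytes is 0x20000000).

```c
#include <assert.h>
#include <stdint.h>

enum segment { KUSEG, KSEG0, KSEG1, KSEG2 };

/* Classify a virtual address using the segment boundaries above. */
enum segment segment_of(uint32_t vaddr)
{
    if (vaddr < 0x80000000u) return KUSEG;
    if (vaddr < 0xa0000000u) return KSEG0;
    if (vaddr < 0xc0000000u) return KSEG1;
    return KSEG2;
}

/* Physical address behind a direct-mapped (kseg0 or kseg1) address. */
uint32_t direct_to_phys(uint32_t vaddr)
{
    enum segment s = segment_of(vaddr);

    assert(s == KSEG0 || s == KSEG1);  /* other segments go through the TLB */
    return vaddr & 0x1fffffffu;
}
```

For instance, the reset address 0xbfc00000 is in kseg1 and corresponds to physical address 0x1fc00000, while the general exception vector 0x80000080 is in kseg0 and corresponds to physical address 0x80.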