For your reading enjoyment! Pleasant dreams.... ;) alex; ---- CMPE110 Winter 2002, modified from CMPE110 Spring 2001 Informal Final Review Topics Performance Performance = 1 / Execution time Speedup = Told / Tnew Amdahl's Law: Make the common case fast! Fe = fraction of operation (what percentage of the time) Se = speedup (how many times faster) To = old time Tn = new time, so Tn = To ( 1 - Fe ) + To/Se Fe , or Sn = To/Tn = 1/ (1 - Fe + Fe/Se) CPI (clock cycles per instruction) tex = execution time fck = clock frequency I = instruction count CPI = tex * fck / I MIPS (million instructions per second) MIPS = I / (tex * 106) MIPS = fck / CPI * 106 > Native MIPS: average CPI > Relative MIPS: relative to another machine > Peak MIPS: minimizing CPI (best case) MOPS (million operations per second) FLOPS (floating point operations per second) Benchmarks A benchmark is a standard way to quantify the performance of a machine. Run real applications or "synthetic" programs to get nice numbers. Arithmetic Mean (average) Sum up all n values, divide by n. Weighted Arithmetic Mean Multiply each value n by weight w. Add. Geometric Mean Multiply all n values, take the nth root. Architectures Stack (all operands are kept on a stack) Accumulator (one operand is in accumulator) General Purpose Register (load/store) Little-endian (word address is at LSByte) Big-endian (word address is at MSByte) MIPS Instructions MIPS is a RISC machine (CPI~1) with a load/store architecture 32 registers, 32-bit (4-byte) words Instruction Types R-type (add, xor, jr) opcode(6)| rs(5) | rt(5) | rd(5) | shamt(5) | func(6) I-type (addi, lui, lw, sw, beq) opcode(6)| rs(5) | rt(5) | immediate(16) J-type (j, jal) opcode(6)| target(26) Addressing Modes Register Specify register number add $t0, $t1, $t2 Immediate Operand embedded in instruction (16 bits only!) addi $t0, $t1, 77 Base (Displacement) Add register and immediate lw $s0, 12($s1) PC-Relative Add immediate (and the "understood zeros") to PC beq $0, $t3, -44 Indirect Register has address jr, jalr Pseudodirect Add 26-bit immediate to top 4 bits of PC j, jal Direct (NOT IN MIPS): Address is the immediate Indexed (NOT IN MIPS): Address is sum of two registers Update (NOT IN MIPS): Base addressing, plus automatically increment register Memory Indirect (NOT IN MIPS):Register points to memory location, which points to operand Number Conversion Given a number in base X, can you convert it to base Y? With a radix point? Binary (see new 12c lab manual) Unsigned: cannot represent negative numbers Sign-Magnitude: MSb is sign bit One's Complement: if negative, flip all bits Two's Complement: if negative, add one to 1's comp Bias (or Excess): to get in, add bias to number to get out, subtract bias from number Octal: group binary into threes Hex: group binary into fours Logic Design Truth tables, logical operations Simple logic: and, or, not, xor, mux Propagation delay (how long it takes for the device to give output) Simple CMOS circuits Making Hardware Add Half-adder (works only with inputs a,b --- not carry-in) Full adder (1 bit) Ripple-carry (slow) Adds serially; you don't know the result until all bits are finished Carry lookahead (faster) Adds in parallel; uses generate and propagate equations to compute carry-ins gi = ai * bi pi = ai + bi Carry select Adds both with Cin=0 and =1, then figures out what you really wanted Two-level logic (fastest, but impractical) Manchester carry chain The ALU 1. Logical operations 2. Full adder (one that works with carries) 3. Add comparison 4. Ripple-carry ALU Multiplication Schemes Multiplication in Hardware Let A=muliplicand; B=multiplier; P=product P=(A * LSb B); shift B right, shift A left Booth's Algorithm Remember to sign-extend partial products to 2n bits!! For radix 2, 10=-A, 01=+A For radix 4, ...? Everything else does nothing (except shift) Division Schemes Division in Hardware Restoring Non-restoring For signed division, make operands positive and remember to change the sign later Floating Point IEEE 754 Single-Precision Standard: S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF 1 8 23 S is 1 bit, E is in bias-127, F is unsigned. FP numbers are in the form (S) 1.F x 2E The "1." is the hidden bit -- it is always 1 for a normalized number ..Which means you MUST have your number normalized to convert to FP Rounding Modes To +inf To -inf To 0 To nearest/even Rounding Bits Guard Round Sticky Single-cycle CPU Each instruction takes one cycle Clock period is propagation delay of the longest instruction (LW) Multi-cycle CPU IF, ID, EX, MEM, WB In our system, R-type instructions take 4 cycles (no MEM); J-type take 2 cycles; Branch takes 3; SW takes 4 (no WB); LW takes 5. Clock period is longest stage (usually MEM) Pipelined CPU See newsgroup post Clock period is longest stage (usually MEM) Cache Main parts of a cache: index (the address; don't count it when you're seeing how many bits total in your cache), valid, tag, data Main parts of a lookup: tag, index, offset (byte and/or word) if needed 2^n lines in your cache need n bits to address them (index) If data comes in 2^n-byte blocks, you need n offset bits in your lookup Cache miss = bad Other topics See notes trinary alex; fire@soe.ucsc.edu