CmpE 110

Homework #5   ISA + Single Cycle Datapath

Due: Monday, November 3, 2003

You have the honor of designing the instruction set architecture and datapath for the new DiBlas SCPOS processor.  Your design should follow the parameters set below.

  • Register - Register RISC Architecture
  • Single Cycle
  • 3 byte instruction word
  • Instructions to be used: Add, AddI, Sub, SubI, LW, SW, J, JR, BEQ, BNQ, AND, ANDI, OR, ORI, NOP, SLT
  • Separate instruction memory and data memory; each is byte addressable and 64K in size.  Reads and writes take 3 ns each.
  • 16 16-bit registers.  Reads and Writes take 2 ns each.
  • 16-bit ALU that will do addition, subtraction, AND, OR.  2 ns propagation delay
  • Multiplexers, Logic Gates, Sign Extenders, Adders, 3x multiplier, and Shifters are at your disposal, each with a 1ns propagation delay. 

1.) Design the instruction layout for each of our MIPS instruction formats (R, I, J). (2 points)

*Grader: there are several ways to implement this solution.  If a student's answer differs from the one below, their work should give reasons for the implementation.

16 registers = 4 bits for register addressing

16 instructions = 4 bits of opcode, or 12 instructions + 4 non-immediate ALU instructions = 4 bits of opcode with 2-4 bits of function code (a 4-bit function field could make decoding very simple)

optional fields may be left as empty fields

R Type |4 bit opcode|4 bit Rs|4 bit Rt| 4 bit Rd| 4 bit Shift amount (optional)| 4 bit Function code (optional)|

I Type  | 4 bit opcode | 4 bit Rs | 4 bit Rt | 12 bit immediate field |

J Type* | 4 bit opcode | 4 bits used for nothing | 16 bit direct address Jump field |

*Note that since the instructions are 3 bytes long, we cannot use a byte offset, so we must have byte addressing for the instruction cache and thus specify all 16 bits of the address.

Pseudo-direct addressing could be used, but it would be pointless here: its purpose is to reach a large address space with a shorter jump field, and our 16-bit jump field already covers the entire 64K instruction memory, so direct addressing is easy to implement.
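The three layouts above can be sketched as bit-packing routines. This is only an illustration: the function names and the opcode values in the usage lines are hypothetical, the immediate is assumed to be stored in two's complement, and the unused J-type bits are assumed to be zero.

```python
def _check_nibbles(*fields):
    # Every register, opcode, shift and function field is 4 bits wide.
    for f in fields:
        if not 0 <= f < 16:
            raise ValueError(f"field {f} does not fit in 4 bits")

def encode_r(opcode, rs, rt, rd, shamt=0, funct=0):
    """R type: |opcode|Rs|Rt|Rd|shift amount|function code|, 4 bits each."""
    _check_nibbles(opcode, rs, rt, rd, shamt, funct)
    word = 0
    for f in (opcode, rs, rt, rd, shamt, funct):
        word = (word << 4) | f
    return word

def encode_i(opcode, rs, rt, imm):
    """I type: |opcode|Rs|Rt|12-bit immediate| (two's complement assumed)."""
    _check_nibbles(opcode, rs, rt)
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return (opcode << 20) | (rs << 16) | (rt << 12) | (imm & 0xFFF)

def encode_j(opcode, address):
    """J type: |opcode|4 unused bits (zero here)|16-bit direct address|."""
    _check_nibbles(opcode)
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address does not fit in 16 bits")
    return (opcode << 20) | address

# Hypothetical opcode values 0x0, 0x1, 0x2, chosen only for the example.
r_word = encode_r(0x0, 1, 2, 3)
i_word = encode_i(0x1, 1, 2, -1)
j_word = encode_j(0x2, 0x1234)
j_bytes = j_word.to_bytes(3, "big")  # each instruction occupies exactly 3 bytes
```

All three encoders produce a 24-bit value, which matches the 3-byte instruction word required by the design parameters.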

2.) Define your Immediate Range (for Branch and Immediate ALU instructions) and your Jump Range (Jump only instruction for J type). (1/2 point each)

 Immediate Range = 2^12 instructions = 4096 instructions, or ±2048 instructions from PC + 3 --> 12,288 bytes, or ±6,144 bytes

Jump Range = 64 KB, because we can address any byte in the instruction cache.  (Actually it would be 64 KB - 3: if the jump were the first or last instruction, the farthest target would be at the bottom/top, which is 64 KB - 3 bytes away.)
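The arithmetic behind these ranges can be checked with a short sketch. The constant and function names are hypothetical; the exact signed bounds (-2048 to +2047) assume a two's complement immediate, which the ±2048 figure above approximates.

```python
INSTRUCTION_BYTES = 3   # 3-byte instruction word (design parameter)
IMMEDIATE_BITS = 12     # I-type immediate field width
JUMP_BITS = 16          # J-type direct address field width

def immediate_range():
    count = 2 ** IMMEDIATE_BITS                  # distinct immediate values
    return {
        "instructions": count,
        "min_offset": -(count // 2),             # assumes two's complement
        "max_offset": count // 2 - 1,
        "span_bytes": count * INSTRUCTION_BYTES,
        "half_span_bytes": (count // 2) * INSTRUCTION_BYTES,
    }

def jump_range_bytes():
    # A 16-bit direct address reaches every byte of the instruction memory.
    return 2 ** JUMP_BITS

imm = immediate_range()
jump = jump_range_bytes()
```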

3.) Draw out your datapath, similar to slide 8-18, titled All Together.  Define each signal vector going between units (like in the slide).  Label each unit appropriately.  Create a control unit (defined as an oval) and draw each of the needed inputs and outputs to control your datapath units, including the ALU.  (4 points)

*Notes: Be careful in implementing the Branch and Jump parts of the datapath.  The value in the immediate and jump fields identifies how many instructions to branch/jump from the Next PC.

*This implementation will vary; however, it should have the general setup shown in this diagram.  The control unit is included in this diagram.

**Grader: award 2 points for a reasonable-looking datapath, and 2 points for the necessary control unit I/O (note that 2 of my control outputs are 2-3 bits wide).  Control units may have different names for their inputs/outputs; compare datapaths to see that they have correct control signals for memories, muxes, ALU, and register file.
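A minimal sketch of the branch arithmetic in the note above, assuming the 12-bit I-type immediate is a signed instruction count and that the PC wraps around at 16 bits (the wraparound and all function names are assumptions for illustration). Only the branch case is shown, since the J-type layout in the answer to question 1 carries a direct address instead of a count.

```python
def sign_extend(value, bits=12):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def next_pc(pc):
    # Instructions are 3 bytes long, so the sequential PC advances by 3.
    return (pc + 3) & 0xFFFF

def branch_target(pc, imm12):
    # The immediate counts instructions, so the 3x multiplier turns it into
    # a byte offset before it is added to the Next PC.
    return (next_pc(pc) + 3 * sign_extend(imm12)) & 0xFFFF

forward = branch_target(0x0000, 2)         # skip two instructions past Next PC
self_loop = branch_target(0x0100, 0xFFF)   # immediate of -1 branches to itself
```

This is why the parts list includes a 3x multiplier: a simple shifter cannot convert an instruction count into a byte offset when the instruction word is 3 bytes.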

4.) How short can we make the clock cycle?  Draw out a table showing how much time is used on each type of unit, for each instruction  (i.e., a table where each row is an instruction, each column is a functional unit, and the last column is the total).  (3 points)

I did not include the propagation delay for the PC register, so if students include it with any value, don't mark it wrong.

% denotes a unit that operates in parallel with other devices, so its delay is ignored (e.g., the PC+3 adder, sign extender, 3x multiplier, and additional adders will always operate in parallel).

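Instruction                 | Instruction cache | Register File | ALU  | Data Cache | Multiplexors     | Total
----------------------------+-------------------+---------------+------+------------+------------------+------
Add, Sub, And, Or, NOP, SLT | 3 ns              | 2 * 3 ns      | 2 ns | 0 ns       | 2 * 1 ns + %1 ns | 13 ns
Addi, Subi, Andi, Ori       | 3 ns              | 2 * 3 ns      | 2 ns | 0 ns       | 2 * 1 ns         | 13 ns
BEQ, BNQ (BNE)              | 3 ns              | 3 ns          | 2 ns | 0 ns       | 1 ns             | 9 ns
JR                          | 3 ns              | 3 ns          | 0 ns | 0 ns       | 1 ns             | 7 ns
J                           | 3 ns              | 0 ns          | 0 ns | 0 ns       | 1 ns             | 4 ns
Load                        | 3 ns              | 2 * 3 ns      | 2 ns | 3 ns       | 2 ns             | 16 ns
Store                       | 3 ns              | 3 ns          | 2 ns | 3 ns       | 1 ns             | 12 ns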

**Grader: because this question is poorly worded, this table can vary greatly, both with the assumptions made by the student and with their implementation of the datapath.  However, the major point of this question was to show that the clock cycle length depends on the Load instruction.  So award 1.5 points for the construction of a table with similar rows and columns, and 1.5 points for identifying the Load instruction as the longest cycle.
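The totals in the table above, and the resulting minimum clock period, can be checked with a short sketch. The per-unit delays are copied from the table as given (with the % parallel delays left out); the dictionary and variable names are hypothetical.

```python
# Per-instruction delays in ns, in table column order:
# (instruction cache, register file, ALU, data cache, multiplexors)
DELAYS_NS = {
    "Add, Sub, And, Or, NOP, SLT": (3, 2 * 3, 2, 0, 2 * 1),
    "Addi, Subi, Andi, Ori":       (3, 2 * 3, 2, 0, 2 * 1),
    "BEQ, BNQ (BNE)":              (3, 3, 2, 0, 1),
    "JR":                          (3, 3, 0, 0, 1),
    "J":                           (3, 0, 0, 0, 1),
    "Load":                        (3, 2 * 3, 2, 3, 2),
    "Store":                       (3, 3, 2, 3, 1),
}

# Total latency of each instruction is the sum of the units on its path.
totals = {name: sum(parts) for name, parts in DELAYS_NS.items()}

# In a single-cycle design every instruction gets the same clock period,
# so the period must cover the slowest instruction.
slowest = max(totals, key=totals.get)
min_clock_ns = totals[slowest]

for name, total in totals.items():
    print(f"{name:<28} {total:>3} ns")
print(f"Minimum clock period: {min_clock_ns} ns, set by {slowest}")
```

With these table values the slowest row is Load, so the clock cycle can be no shorter than 16 ns.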