Dr. Tom Pittman firstname.lastname@example.org
Taylor 108, 3p.M-F 328-1562
SBU policy is that they own the Syllabus;
other material is the property of the instructor who prepared it.
Tiny Computer Architecture
TinyComputer as Gates
Test Data for Decimal Adder
Final: Dec.12, 1pm
Due Sept.9: Write a program that runs in the Tiny Computer and reads two 1-digit numbers a and b, then prints out your name followed by the difference a-b. Submit an assembly listing of your code.
Due Sept.16: Write a subroutine that runs in the Tiny Computer and does unsigned divide on two (16-bit) numbers. Submit an assembly listing of your code and an execution trace showing calls with at least two different data samples. Extra points will be awarded for the fastest OR shortest algorithm.
Due Oct.4: Rewrite the same divide subroutine to run on the MIPS computer for 32 bits. You may use any MIPS instructions except the built-in DIVIDE. Debug your program using the SPIM simulator on the lab computers.
Due Oct.14: Improve the Sept.16 program by making it 10% faster (5% if already <= 400 cycles). Measure the time (click on the Cycles field in the emulator) after doing these two divides -- 32760/2 and 16384/4096 -- and divide the total by two. If your previous program was not correct, you must make it run correctly first. FWIW, my first cut ran in 352 cycles, and the best I could do (after less than one hour fiddling with it) was 305 cycles (610 for two divides).
Due Oct.25: Design a one-digit decimal add circuit. Check your circuit, for example on the Gates simulator. It should accept two four-bit decimal numbers (values from 0000 to 1001) and produce a correct 4-bit decimal sum. You may use any predefined circuits available on the GateSim, or build it entirely from NANDs, or any combination of the two. (Optional) extra credit for correct carry-in and carry out. Turn in a circuit diagram. A GateSim source listing and trace showing all 100 (or 200) possible valid inputs and corresponding outputs would be nice. See DeciTest for help in generating test data for your adder.
Due Nov.1: Study the "Tiny Computer as Gates" document (see link above), and answer the following questions:
1. What makes the middle four bits of the Program Counter (PCM in the net list) count up? Why does it not count on every instruction?
2. How fast can you overclock this computer before it starts to fail? Be sure to test different instructions and different values in the registers. Which instruction fails first?
3. Read Chapter 5 in Patterson, then propose a multicycle implementation of Tiny Computer so you can drive the clock faster than you can in #2 above. Assuming the instruction mix below and the same gate delays, how much faster than your overclock version would your new computer be? Show your work and assumptions. Do not try to implement your proposal at this time.
ADD 280713 AND 154418 CLR 79433 SHR 40544
STO 458735 OUT 41005 LD 551352 INP 12559
CAL 44944 JMP 217081 JZ 242294 tkn 74811
INC 256240 INV 170160 NEG 170160 Total 2289128
Due Dec.2: Sign up for one of the term projects below, and present your running enhancement to the rest of the class during the last week of classes. You should be prepared to discuss your work-in-progress in class when that topic comes up.
2. Hardware Multiply -- The current implementation does all multiplies in software. Your enhancement would implement a 16-bit multiply instruction using some of the extra bits in the EXT instruction. You need a bit to transfer data from the Accumulator to a (new) Multiplier register, and another bit to do the multiply. You must hold up execution until the multiply finishes.
3. Asynchronous Multiply -- This is like #2, but uses a second Multiplicand register, so multiplication can proceed behind the scenes while other instructions continue to execute.
4. Instruction/Data Memory Overlap -- Separate instruction and data memory into separate RAM chips, so that the next instruction can be fetched while the current one is executing.
5. Memory Interleave -- Speed up memory access by splitting memory into odd/even banks, then look ahead for the next instruction word while the current instruction is executing.
6. Instruction Cache -- Add a fast cache memory to hold up to 8 or 16 instructions, with look-ahead, so the instructions are always ready when the CPU needs them (except for branches).
7. Data Cache -- Add a fast cache memory to hold up to 4 or 8 data words with LRU and lazy-write retirement strategies.
8. Indirect Address -- Use the extra instruction bit to encode an indirect address, so that the address in the instruction points to a second word with the actual memory address when that bit is set.
9. Index Register -- Use the extra instruction bit to encode an index register access, so that the index register gets added to the address in the instruction when that bit is set. Use additional bits in the EXT to transfer the Accumulator to and from the index register.
10. Gate Reduction -- If speed is no object, how much can you reduce the number of gates from the current implementation and still get the same functionality?
11. Pipelined Execution -- Add pipeline stages to the current implementation to speed up the average instruction throughput.
12. Floating-Point Add/Subtract -- Use the extra instruction bit to encode a floating-point Add/Subtract instruction pair (or if you prefer, Add/Negate).
13. Dual Processor -- Put a second CPU into the system with shared memory and appropriate interlocks.
14. Simple MIPS -- Considering only a limited but useful subset of MIPS instructions, build a gate simulation of it. Extra credit for successful implementation.
15. Simple Pentium * (hard) -- Same as #14 above, but implement a subset of the Intel Pentium or x86 instruction set. This is a lot of work, which includes finding the manufacturer's specification and choosing appropriate instructions to implement.
Back to T.Pittman home
Rev. 2002 December 2