
## SFSU Engr 851 - Spring 2013 - Homework 1

**Prof. Seapahn Megerian**

- Name three models that can be used to describe the functionality of an embedded system. For each model, give a simple example and draw the model.
- List five examples of properties of embedded systems that can be captured and represented by a model.
- List three pros and cons for VLIW and three pros and cons for Superscalar architectures and briefly explain your reasoning.
- List three use cases where VLIW would be more suitable than Superscalar.
- List three use cases where Superscalar would be more suitable than VLIW.

- Consider the n-tap FIR example discussed in class:

y_{n} = c_{n} * x_{n} + c_{n-1} * x_{n-1} + c_{n-2} * x_{n-2} + ... + c_{1} * x_{1}

given constants c_{n} ... c_{1} and input values x_{i} at time step i. Unless otherwise specified, assume n=4, the number of CPU registers you can use is 128, and that you have a RISC type processor with the following single-cycle instructions:

LOAD R_{i}, M(address)

STORE R_{i}, M(address)

ADD R_{s1}, R_{s2}, R_{d}

MUL R_{s1}, R_{s2}, R_{d}

MOV R_{s}, R_{d}

JMP NAME

JGE R_{i}, NAME

JLE R_{i}, NAME

JE R_{i}, NAME

*The conditional jump instructions compare with 0 (i.e. >=0, <=0, ==0).*
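As a reference point for checking an assembly implementation, the filter equation above can be sketched in Python. This is only an illustrative model, not part of the assignment: the names `make_fir`, `step`, and `history` are hypothetical, and the coefficient values are made up. It assumes, as in the equation, that c_{n} multiplies the newest sample and that all stored samples start at 0.

```python
from collections import deque


def make_fir(coeffs):
    """Return a step function for an n-tap FIR filter.

    coeffs is [c_1, ..., c_n]; c_n multiplies the newest sample,
    matching the equation in the assignment text.
    """
    n = len(coeffs)
    # Delay line holding x_1 (oldest) ... x_n (newest), initially all zero.
    history = deque([0] * n, maxlen=n)

    def step(x_new):
        # Appending on the right drops the oldest sample on the left,
        # which plays the role of "remembering" the previous inputs.
        history.append(x_new)
        return sum(c * x for c, x in zip(coeffs, history))

    return step


# Illustrative coefficients (assumed values): c_1=1, c_2=2, c_3=3, c_4=4.
fir = make_fir([1, 2, 3, 4])
# Feeding a unit impulse walks the single 1 through the delay line.
outputs = [fir(x) for x in [1, 0, 0, 0, 0]]
print(outputs)
```

Only one new sample enters per call to `step`, and the older samples are kept between calls. An assembly version has to provide the same behavior using registers or memory.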

- Suppose the current value of x, i.e. x_{n}, is available through memory-mapped I/O at M(1000). In other words, you can access the current value of x by loading from M(1000). The constants are stored in memory locations M(1001) ... M(1000+n). Write the set of instructions that execute one whole loop of the FIR filter. You don't have to worry about the timing of when x_{n} is read.

  *Hint*: you have to decide which registers correspond to your x's. Every time the FIR executes, you load just ONE value for x, so you have to somehow remember the old ones. You can assume that initially all registers contain 0 (so there is no need to initialize the x's).

- How many clock cycles does the first loop of your FIR implementation require? This should include the steps needed to load all necessary data items.
- Assume each load/store operation takes 10 clock cycles to execute. How many clock cycles does your FIR require?
- If the processor has a 5-stage pipeline, then how many clock cycles does your FIR require? Remember that now, each pipeline stage executes in 1 clock cycle. Repeat your calculation for the case where memory load/store operations take 10 clock cycles.
- Suppose you now can also use a multiply-accumulate (MAC) instruction:

MAC R_{d}, R_{x}, R_{c}

where R_{d} = R_{d} + R_{c} * R_{x}. Rewrite your FIR implementation to use this new instruction.

- How many clock cycles does your new FIR require in the non-pipelined and 5-stage pipelined versions? Assume single-cycle load/stores.

- What is the minimum number of registers required to make your standard and MAC-based implementations work?
- Suppose registers are very expensive. What is the absolute minimum number of registers required to implement a working FIR? Justify your answer by writing an implementation in assembly using the given instructions. You can assume you have as much memory as you need.
- Suppose now your processor can perform two MAC operations in a single clock cycle (no other modifications to the register file or memory access). Write a new implementation in assembly to take advantage of this and calculate how many clock cycles it requires to execute.
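The MAC semantics defined above (R_{d} = R_{d} + R_{c} * R_{x}) can be modeled in a few lines of Python to confirm that a chain of MAC operations, starting from an accumulator of 0, yields the same sum of products as separate MUL and ADD steps. This is a minimal sketch: `mac` and `fir_with_mac` are hypothetical helper names, the sample and coefficient values are made up, and it says nothing about cycle counts.

```python
def mac(r_d, r_x, r_c):
    """Model of MAC R_d, R_x, R_c: returns the new value of R_d."""
    return r_d + r_c * r_x


def fir_with_mac(coeffs, samples):
    """Accumulate one FIR output using only MAC steps."""
    acc = 0  # accumulator register, assumed to start at 0
    for c, x in zip(coeffs, samples):
        acc = mac(acc, x, c)
    return acc


# Illustrative values (assumed): c_1..c_4 and x_1..x_4.
coeffs = [1, 2, 3, 4]
samples = [5, 6, 7, 8]

result = fir_with_mac(coeffs, samples)
# Same sum written as explicit multiplies followed by adds.
expected = sum(c * x for c, x in zip(coeffs, samples))
print(result, expected)
```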
