DLX explained

What is DLX?

DLX is a simple pipelined CPU architecture. It is mostly used in universities as a model for studying the pipelining technique.

What is a Pipeline?

In the old days, we looked at a CPU as a monolithic processing unit. Whenever we asked it to do something, we had to wait until it finished that task before we could proceed to the next one. So if your old CPU takes 10 clock cycles to do a multiplication and 1 clock cycle to do a summation, you have to wait until the multiplication finishes before you can do the summation. The total is 11 cycles (not including fetching and the other things it needs to do before the actual execution.)

Today, we look at a CPU as a collection of processing units that can execute concurrently (made practical by RISC.) So, as long as the two operations do not depend on each other, you can do the multiplication and the summation at the same time. The total becomes 10 cycles (about 9% fewer cycles! Now imagine 1 multiplication and 10 summations: 20 cycles shrink to 10.)
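To check the arithmetic, here is a minimal Python sketch. It assumes a very idealized machine: one multiplier and one adder, operations that are all independent of each other, and no fetch or decode cost. The function names and the (unit, cycles) pairs are made up for this illustration.

```python
def serial_cycles(ops):
    """Monolithic CPU: every operation waits for the previous one."""
    return sum(cycles for _unit, cycles in ops)

def overlapped_cycles(ops):
    """One unit per kind of operation, all units working side by side.
    Each unit still handles its own operations one after another, so
    the total is the busy time of the busiest unit."""
    busy = {}
    for unit, cycles in ops:
        busy[unit] = busy.get(unit, 0) + cycles
    return max(busy.values())

one_each = [("mul", 10), ("add", 1)]
one_mul_ten_adds = [("mul", 10)] + [("add", 1)] * 10

print(serial_cycles(one_each), overlapped_cycles(one_each))                  # 11 10
print(serial_cycles(one_mul_ten_adds), overlapped_cycles(one_mul_ten_adds))  # 20 10
```

A real CPU rarely gets the full benefit, because operations often depend on each other's results.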

So what processing units are there in DLX?

There are not many rules in DLX. The basic units it must have are:

IF - Instruction Fetch unit
This unit fetches the next instruction from memory.
ID - Instruction Decode unit
This unit gets an instruction from IF and extracts the opcode and operands from it. It also retrieves the value of any operand that refers to a register.
EX - Execution unit
This is where the actual execution happens. It gets the opcode and operands from ID. You might want more than one EX unit working in parallel :).
MEM - Memory access unit
This is where memory access takes place. I mean main memory access. Stalls usually happen in this stage because main memory is slower than the CPU. So can you guess what happens when this unit is stalled?
WB - WriteBack unit
This is the only place you can write into registers.
Here is how they are connected: IF - ID - EX - MEM - WB
Instructions propagate through the pipeline from IF to WB.
Usually each unit takes one cycle to do its task, except the EX and MEM units. You might simplify it and let every unit take just one cycle, no exceptions.
There is a fixed format for the instruction set. You can find it anywhere on the internet. Basically, you can do only one task per instruction and you cannot use a value in main memory for execution (you need to load it into a register first.)
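To see how instructions move through the five units, here is a small Python sketch. It uses the simplified version described above (every unit takes exactly one cycle) and assumes no stalls at all. The helper names stage_at, total_cycles and diagram are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(instr_index, cycle):
    """Stage holding instruction `instr_index` (0-based) during `cycle`
    (1-based), or None if it has not entered or has already left."""
    pos = cycle - 1 - instr_index  # each instruction enters one cycle later
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def total_cycles(n):
    """Cycles until the last of n instructions leaves WB."""
    return n + len(STAGES) - 1 if n > 0 else 0

def diagram(n):
    """One row per instruction, one column per cycle."""
    rows = []
    for i in range(n):
        cells = [(stage_at(i, c) or ".").ljust(4)
                 for c in range(1, total_cycles(n) + 1)]
        rows.append(f"I{i}: " + "".join(cells).rstrip())
    return "\n".join(rows)

print(diagram(3))
print(total_cycles(3))  # 7, instead of 3 * 5 = 15 without overlap
```

Each row is shifted one cycle to the right of the row above it, which is the whole point: while one instruction is in EX, the next is already in ID and the one after that is in IF.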

Danger!!!

There are several types of hazards that can stall the pipeline. For example, say we have 2 successive instructions:

load value from memory, put it in register A
add register A to register B

Obviously, we have to wait until WB finishes the first instruction before we can decode the second instruction (it is stuck at ID.)
This is why we introduce Data Forwarding. With Data Forwarding, you forward data back to ID or EX as soon as it is available, to minimize the stalls. In our example, the data would be available when MEM gets the value back from main memory.
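How many cycles does forwarding save in this example? Here is a rough Python sketch that counts the stalls. It assumes one cycle per stage, that without forwarding the second instruction cannot decode until the first has finished WB (as described above), and that with forwarding the value is handed straight to the input of EX. The names PRODUCER_CYCLE and stall_cycles are made up for this illustration.

```python
# Cycle in which the first (producing) instruction occupies each stage,
# counting its IF as cycle 1 and assuming one cycle per stage.
PRODUCER_CYCLE = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}

def stall_cycles(producer, forwarding):
    """Stalls for an instruction that immediately follows `producer`
    ("load" or "alu") and reads the register it writes."""
    if forwarding:
        # Assumption: the value is forwarded to the input of EX, and it
        # exists once the producer leaves EX (alu) or MEM (load).
        ready_after = PRODUCER_CYCLE["MEM" if producer == "load" else "EX"]
        wanted_in = PRODUCER_CYCLE["EX"] + 1  # consumer's EX with no stall
    else:
        # Assumption from the text: the consumer cannot decode until the
        # producer has finished WB.
        ready_after = PRODUCER_CYCLE["WB"]
        wanted_in = PRODUCER_CYCLE["ID"] + 1  # consumer's ID with no stall
    earliest = ready_after + 1
    return max(0, earliest - wanted_in)

print(stall_cycles("load", forwarding=False))  # 3
print(stall_cycles("load", forwarding=True))   # 1
print(stall_cycles("alu", forwarding=True))    # 0
```

Under these assumptions a load followed by a dependent instruction still costs one stall even with forwarding, because the value only exists after MEM. A design that lets a register be written and read within the same cycle would need one stall fewer in the no-forwarding case.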

Another type of stall comes from branch instructions. If we wait until a branch instruction reaches EX, we might have to throw away the instructions already loaded into IF and ID. Why? Because it might turn out that we loaded instructions from the wrong branch. We can move the branching unit (detached from EX) back to ID and use some kind of branch prediction to minimize the stall.
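The cost of a wrong guess depends on how deep in the pipeline the branch is resolved. Here is a minimal Python sketch, assuming one cycle per stage and that every instruction fetched behind a mispredicted branch is thrown away. The function names and the 25% misprediction rate are made up for this illustration, not measured values.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def flush_penalty(resolve_stage):
    """Instructions fetched behind the branch by the time it reaches
    `resolve_stage`; all of them are wasted if the guess was wrong."""
    return STAGES.index(resolve_stage)

def expected_branch_stall(resolve_stage, mispredict_rate):
    """Average wasted cycles per branch for a given misprediction rate."""
    return flush_penalty(resolve_stage) * mispredict_rate

print(flush_penalty("EX"))  # 2 (the instructions sitting in IF and ID)
print(flush_penalty("ID"))  # 1 (only the instruction sitting in IF)

# Hypothetical misprediction rate of 25%:
print(expected_branch_stall("EX", 0.25))  # 0.5
print(expected_branch_stall("ID", 0.25))  # 0.25
```

So resolving the branch in ID halves the penalty in this model, and a better predictor shrinks whatever penalty is left.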

If you want to know more details, check out the computer architecture books in your favorite library!