Floating Point Unit project

Jamil Khatib

1 List of authors and changes
2 Project Definition
    2.1 Introduction
    2.2 Objectives
3 Specifications
    3.1 System Specification
    3.2 External Interfaces
    3.3 Hardware specification
    3.4 Software specification
    3.5 Interface between SW and HW
4 Internal Blocks
5 Design description
    5.1 Decode Unit
        5.1.1 Design notes
        5.1.2 Timing and flow charts
    5.2 Adder/Subtracter Unit
        5.2.1 Design notes
        5.2.2 Timing and flow charts
    5.3 Round/Normalize Unit
        5.3.1 Design Notes
        5.3.2 Timing and flow charts
    5.4 Exceptions generation
        5.4.1 Design Notes
        5.4.2 Timing and flow charts
    5.5 Compare Unit
        5.5.1 Design notes
        5.5.2 Timing and flow charts
    5.6 Scripts, files and any other information
    5.7 Design conventions and coding styles
    5.8 Design Modeling
    5.9 Integration notes
6 Testing and verifications
    6.1 Simulation and Test benches
    6.2 Specifications of test benches
    6.3 Verification techniques and algorithms
        6.3.1 Verification Software
    6.4 Test plans
        6.4.1 Adder/Subtractor unit
7 Implementations
8 Reviews and comments
9 References

1 List of authors and changes

Name          Changes                                       Date        Contact address
Jamil Khatib  Initial release                               17-7-2000   khatib@opencores.org
Jamil Khatib  Tests and Interfaces added                    13-8-2000   khatib@opencores.org
Jamil Khatib  Design spec changed to generic CPU interface  23-10-2000  khatib@opencores.org
Jamil Khatib  Major Changes and architecture enhancement    3-11-2000   khatib@opencores.org
Jamil Khatib  General Review                                17-12-2000  khatib@opencores.org
Jamil Khatib  Verification section is added                 25-12-2000  khatib@opencores.org

2 Project Definition

2.1 Introduction

Floating-point numbers and calculations are widely used in everyday applications such as bank transactions, scientific calculations, graphics, engineering drawings and even games. These calculations are very slow if they are done without any hardware support. This floating-point unit core should speed up such calculations. Both the hardware core and the software libraries are going to be provided.

2.2 Objectives

The main objective of this project is to build an IEEE-754 compatible floating-point unit core and its software. The core should give high performance and be able to interface to any CPU core. The project should provide a set of small calculation units. The final step should use the FPU core in a stand-alone floating-point processor.

3 Specifications

3.1 System Specification

The core should be compliant with the IEEE-754 standard.
The core is going to have several phases, one phase for each precision.
The first phase should provide the functionality; optimization for speed and area should be considered afterwards.
The software part should provide the compiler, assembler, trap handling, higher precisions and extra calculations done in compiler libraries.
Supported instructions are TBD. The web site has some suggestions.
A stand-alone co-processor has to be built later.
The functional units have to operate as stand-alone calculation units.
The FPU core assumes that the CPU issues the instructions and knows when the result should come out (not valid in the stand-alone mode).
The FPU has no storage registers; all values and results should be stored in the CPU (not valid in the stand-alone mode).
The CPU should decide whether it can issue FP instructions or not (detect hazards etc.) (not valid in the stand-alone mode).
One instruction is executed at a time (in the future, more instructions should be executed in parallel).
This document describes only the execution core unit. The stand-alone mode will be described in the future.

3.2 External Interfaces

The core should act as an execution unit of the CPU or as a co-processor.
The core should report exceptions to the CPU, which stores them.
The CPU must have the ability to mask and unmask the exceptions as described by the IEEE-754 standard. The FPU, on the other hand, should generate the exceptions at all times.
The FPU should provide a signal when the result is ready.
The CPU should provide a signal to load a new instruction and its operands.
See the FPU interface table below.

FPU interface (for single precision)

Direction  Size     Name

INPUT      [31:0]   operand A
INPUT      [31:0]   operand B
INPUT      [TBD:0]  operation type
INPUT      [1:0]    rounding mode:
                      0 - To nearest even (DEFAULT)
                      1 - To +∞ (round up)
                      2 - To -∞ (round down)
                      3 - To ZERO
OUTPUT     [31:0]   result

Compare Result (Conditions and special numbers)
OUTPUT     1        larger than ( A > B )
OUTPUT     1        smaller than ( A < B )
OUTPUT     1        equal ( A == B )
OUTPUT     1        unordered ( A == NAN | B == NAN )
OUTPUT     1        infinity ( A == INF | B == INF )
OUTPUT     1        Zero ( A == zero )

Exceptions
OUTPUT     1        overflow
OUTPUT     1        underflow
OUTPUT     1        divide by zero
OUTPUT     1        SNAN: Invalid operation
OUTPUT     1        INE: inexact

Control
INPUT      1        Load
OUTPUT     1        ready
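
The compare outputs listed in the interface table can be modelled in software as a check on the two 32-bit operand words. The following is a minimal Python sketch, not the VHDL implementation; the names CompareFlags, ordered_key and compare are hypothetical, and it assumes that +0 and -0 compare as equal, as IEEE-754 requires.

```python
from typing import NamedTuple

SIGN_MASK = 0x80000000
EXP_MASK = 0x7F800000   # 8 exponent bits of a single-precision word
FRAC_MASK = 0x007FFFFF  # 23 fraction bits


class CompareFlags(NamedTuple):
    larger: bool
    smaller: bool
    equal: bool
    unordered: bool
    infinity: bool
    zero: bool


def is_nan(word: int) -> bool:
    return (word & EXP_MASK) == EXP_MASK and (word & FRAC_MASK) != 0


def is_inf(word: int) -> bool:
    return (word & EXP_MASK) == EXP_MASK and (word & FRAC_MASK) == 0


def is_zero(word: int) -> bool:
    return (word & 0x7FFFFFFF) == 0


def ordered_key(word: int) -> int:
    """Map a non-NaN word to an integer that sorts like the encoded value."""
    magnitude = word & 0x7FFFFFFF
    return -magnitude if word & SIGN_MASK else magnitude


def compare(a: int, b: int) -> CompareFlags:
    unordered = is_nan(a) or is_nan(b)
    key_a, key_b = ordered_key(a), ordered_key(b)
    return CompareFlags(
        larger=not unordered and key_a > key_b,
        smaller=not unordered and key_a < key_b,
        equal=not unordered and key_a == key_b,
        unordered=unordered,
        infinity=is_inf(a) or is_inf(b),
        zero=is_zero(a),
    )
```

For example, compare(0x3F800000, 0x40000000) describes 1.0 against 2.0 and sets only the smaller flag.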

3.3 Hardware specification

Refer to the system specification section.

3.4 Software specification

The user should have the ability to read and mask the flags.
The flags should remain set as long as the software does not clear them.
The CPU should be interrupted whenever an unmasked exception is generated.
The software should handle the flags to prevent frequent generation of interrupts. It should also trap and handle all exceptions and modify the results according to the source of the exception.
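
The flag behaviour described above (sticky flags, a mask, and an interrupt only for unmasked exceptions) can be illustrated with a small Python model. This is only a sketch under assumptions: the class names FpuException and StatusRegister and the bit assignments are hypothetical and are not defined by this specification.

```python
from enum import IntFlag


class FpuException(IntFlag):
    # Bit positions are an assumption made for this sketch only.
    OVERFLOW = 1
    UNDERFLOW = 2
    DIVIDE_BY_ZERO = 4
    INVALID = 8
    INEXACT = 16


class StatusRegister:
    def __init__(self) -> None:
        self.flags = FpuException(0)  # sticky exception flags
        self.mask = FpuException(0)   # a set bit masks that exception

    def record(self, raised: FpuException) -> bool:
        """Latch the raised exceptions; return True if an interrupt is due."""
        self.flags |= raised
        return bool(raised & ~self.mask)

    def clear(self, which: FpuException) -> None:
        """Flags stay set until software clears them explicitly."""
        self.flags &= ~which
```

A masked exception is still recorded in flags, which matches the requirement that the FPU generates exceptions at all times while the CPU decides whether to act on them.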

3.5 Interface between SW and HW

The interface between SW and HW depends on the CPU and software platform used. Instruction definitions should be easily customizable. All FPU exceptions can drive an external interrupt controller that provides interrupts to the CPU. The system should not be platform dependent.

4 Internal Blocks

The hardware can be divided into the following main blocks:

Figure

Decode unit
- It decodes numbers and recovers the hidden bit.
- It should detect special numbers and generate the appropriate special-number signals.
- It detects NaNs, infinity and zero.
Execution Units
- All blocks should take the operands in decoded format.
- They should have the ability to take de-normalized inputs.
- They can generate de-normalized results.
- They should have the ability to take special values and know how to handle them.
- They should expand the operands in the proper way to do the calculation in infinite precision.
- They must produce a defined size of the result for the round unit.
- Execution units are composed of Add/Subtract, Multiply, Divide, Compare, square root, and some transcendental units (TBD).
Round, Normalize Unit
- This unit is used for all execution units.
- It performs the pre- and post-normalization.
- Rounding must be done from the internal precision to the standard precision according to the rounding mode.
- Supported rounding modes are nearest even, to +∞ (round up), to -∞ (round down) and to ZERO.
- This unit should consider both normalized and de-normalized results.
Compare Unit
- This block runs in parallel with the execution units.
- It is responsible for generating all condition signals.
- It will detect larger than, smaller than and equal conditions, NaN, +∞ (infinity) and zero.
Encode Unit
- It converts the rounded results into IEEE format and removes the hidden bit.
- It should set special numbers based on the operands and the operation.
Exceptions Unit
- It should generate all 5 exceptions.
- It should get status from other blocks if it needs to perform its operation.
Integer to floating and floating to integer Unit TBD
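
The four rounding modes listed for the Round, Normalize Unit can be sketched as a decision on three extra low-order bits (guard, round and sticky). The Python model below is an illustration only, not the VHDL design; the names RoundMode and round_grs are hypothetical, the mode values follow the rounding mode encoding in the FPU interface table, and a carry out of the rounded significand is left to the caller to renormalize.

```python
from enum import IntEnum


class RoundMode(IntEnum):
    NEAREST_EVEN = 0
    UP = 1      # toward +infinity
    DOWN = 2    # toward -infinity
    ZERO = 3


def round_grs(sig: int, sign: int, mode: RoundMode):
    """Round a significand that carries 3 extra low bits (G, R, S).

    Returns (rounded_significand, inexact). sign is 0 for positive values
    and 1 for negative values.
    """
    kept = sig >> 3
    grs = sig & 0b111
    inexact = grs != 0
    if mode == RoundMode.NEAREST_EVEN:
        # Above half always rounds up; exactly half rounds to the even value.
        increment = grs > 0b100 or (grs == 0b100 and (kept & 1) == 1)
    elif mode == RoundMode.UP:
        increment = inexact and sign == 0
    elif mode == RoundMode.DOWN:
        increment = inexact and sign == 1
    else:
        increment = False  # toward zero: truncate the magnitude
    return kept + int(increment), inexact
```

The inexact value returned here corresponds to the INE exception output: it is set whenever any of the discarded bits is non-zero, independent of the rounding mode.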

5 Design description

5.1 Decode Unit

5.1.1 Design notes

It recovers the hidden bit by checking the exponent bits in parallel.
It extracts the number type (SNAN, QNAN, ZERO, INF or normal) by checking all bits of the exponent and fraction.
It expands the fraction part and the exponent by one bit each.
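
The decode steps above can be modelled in software for single precision. The following Python sketch is an illustration under assumptions, not the VHDL decode unit: the names Decoded and decode are hypothetical, a DENORMAL kind is added for clarity, and the quiet/signaling distinction assumes the common convention that the most significant fraction bit is set for a quiet NaN.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    sign: int         # 1 bit
    exponent: int     # 8-bit biased exponent as stored in the word
    significand: int  # 24 bits: recovered hidden bit plus 23 fraction bits
    kind: str         # "ZERO", "DENORMAL", "NORMAL", "INF", "QNAN" or "SNAN"


def decode(word: int) -> Decoded:
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    # The hidden bit is the OR of all exponent bits: it is 0 only for
    # zero and de-normalized numbers.
    hidden = 1 if exponent != 0 else 0
    if exponent == 0xFF:
        if fraction == 0:
            kind = "INF"
        elif fraction & 0x400000:
            kind = "QNAN"  # assumption: fraction MSB set means quiet
        else:
            kind = "SNAN"
    elif exponent == 0:
        kind = "ZERO" if fraction == 0 else "DENORMAL"
    else:
        kind = "NORMAL"
    return Decoded(sign, exponent, (hidden << 23) | fraction, kind)
```

For example, decode(0x3F800000), the encoding of 1.0, yields a biased exponent of 127 and a significand of 0x800000, that is, the hidden bit alone.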

5.1.2 Timing and flow charts

TBD

5.2 Adder/Subtracter Unit

Three extra bits are kept at the right-most end of the number (guard, round and sticky).

One complementer can be used together with an operand swap. Alternatively, two exponent subtractions, e1-e2 and e2-e1, can be done so that no complement is needed to handle a negative difference.
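
The exponent comparison, operand swap and the three extra bits can be sketched together as an alignment step. This Python model is an assumption-laden illustration, not the actual datapath: the function name align is hypothetical, and the significands are taken to be the 24-bit values (hidden bit included) produced by the decode unit.

```python
def align(sig_a: int, exp_a: int, sig_b: int, exp_b: int):
    """Align two significands to the larger exponent.

    Returns (larger_sig, smaller_sig, exponent). Both significands are
    extended on the right by three bits: guard, round and sticky.
    """
    # Swap so that operand a always holds the larger exponent; the
    # exponent difference is then never negative.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    diff = exp_a - exp_b
    larger = sig_a << 3
    smaller = sig_b << 3
    # Bits shifted out on the right are ORed into the sticky (lowest) bit.
    lost = smaller & ((1 << diff) - 1)
    shifted = (smaller >> diff) | (1 if lost else 0)
    return larger, shifted, exp_a
```

The sticky bit records that something non-zero was shifted out, which is what later allows the round unit to tell an exact halfway case from a value slightly above it.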

5.2.1 Design notes

Invert the sign of the second operand in case of subtraction.
Calculate the difference between the two operands' exponents.
The difference is calculated using normal sign-magnitude subtraction. "This can be optimized by using a single complementer, swapping the operands when the difference is negative, and checking which operand is the smaller one."
Check whether one of the operands is a special number (NaN or infinity).
If NaN is one of the operands, the result is NaN.
If one of the operands is ∞, check whether the operation is ∞-∞; if so, the result is NaN and the Invalid exception is generated.
The operands, and also the results, can be non-normalized.
The unit needs two cycles to produce the result.
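
The special-number checks in these notes can be written as a small decision function. The Python sketch below is illustrative only: the name add_sub_special and the kind strings ("QNAN", "SNAN", "INF", anything else for ordinary numbers) are hypothetical, the sign given to a NaN result is an arbitrary choice, and raising Invalid for a signaling NaN operand follows IEEE-754 and the "SNAN: Invalid operation" output rather than an explicit statement in these notes.

```python
def add_sub_special(kind_a: str, sign_a: int, kind_b: str, sign_b: int,
                    subtract: bool):
    """Resolve add/sub when an operand is NaN or infinity.

    Returns (result_kind, result_sign, invalid), or None when neither
    operand is special and the normal datapath should be used.
    """
    if subtract:
        sign_b ^= 1  # subtraction inverts the sign of the second operand
    nan_kinds = ("QNAN", "SNAN")
    if kind_a in nan_kinds or kind_b in nan_kinds:
        invalid = "SNAN" in (kind_a, kind_b)
        return ("QNAN", 0, invalid)
    if kind_a == "INF" and kind_b == "INF":
        if sign_a != sign_b:
            return ("QNAN", 0, True)  # infinity minus infinity
        return ("INF", sign_a, False)
    if kind_a == "INF":
        return ("INF", sign_a, False)
    if kind_b == "INF":
        return ("INF", sign_b, False)
    return None
```

Because the sign inversion happens first, (+∞) - (+∞) and (+∞) + (-∞) take the same path and both report an invalid operation.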

5.2.2 Timing and flow charts

5.3 Round/Normalize Unit

5.3.1 Design Notes

5.3.2 Timing and flow charts

5.4 Exceptions generation

5.4.1 Design Notes

5.4.2 Timing and flow charts

5.5 Compare Unit

5.5.1 Design notes

TBD

5.5.2 Timing and flow charts

TBD

5.6 Scripts, files and any other information

The system has been implemented using the VHDL language.

5.7 Design conventions and coding styles

5.8 Design Modeling

5.9 Integration notes

6 Testing and verifications

Requirement       Test method                              Validation method

Interface timing  The external interface must drive the    The test bench should check the
                  Load signal with each new operation.     sequence of signals according to
                  It should check the Done signal, which   the number of clocks.
                  must be driven when a valid result is
                  stable.

Functionality

6.1 Simulation and Test benches

A test bench should be provided in order to check the validity of both the timing and the functionality of the FPU core.

6.2 Specifications of test benches

The system test bench should be composed of two parts: the data injection and checking part (called the client later on) and the timing checking part (called the server later on). This approach is selected in order to have the ability to change the interface of the FPU and check it without the need to check the functionality of the system.

6.3 Verification techniques and algorithms

The functional verification will be made in two steps:

Through SW verification, using a set of test vectors and checking the results against IEEE-754 compliant software. The SoftFloat library can be used.
Through prototyping on an FPGA and testing it on real operations and calculations. TBD

6.3.1 Verification Software

The software should generate test vectors in a readable format that the VHDL test bench can accept.
It should generate numbers and operations randomly for the test bench.
It should be able to read mathematical formulae and generate a sequence of numbers and operations for the test bench.
There should be a method to compare the results of the real calculations done on the CPU with those from the FPU core.
All results from the test bench and the software should be in a readable format with the ability to show all bit representations.
The software should have the ability to generate operations that give special numbers (infinity, zeros and NaN) and all kinds of exceptions.
The software should know when special numbers are going to be generated and should not calculate them using the normal software flow; otherwise it will fail because the results will be handled by host traps.
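
As an illustration of the first requirements, random vector generation with a readable bit-level dump might look like the Python sketch below. Everything here is an assumption made for the example: the function names, the "OP A B" text format with hexadecimal operands, and the restriction to ADD and SUB are hypothetical and are not defined by this specification.

```python
import random
import struct


def random_vectors(count: int, seed: int = 0):
    """Yield (operation, operand_a, operand_b) with random 32-bit operands."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    for _ in range(count):
        yield rng.choice(("ADD", "SUB")), rng.getrandbits(32), rng.getrandbits(32)


def format_vector(op: str, a: int, b: int) -> str:
    """One line per vector, operands as 8-digit hexadecimal words."""
    return f"{op} {a:08X} {b:08X}"


def fields(word: int) -> str:
    """Show the sign, exponent and fraction bits of a single-precision word."""
    return (f"{word:08X} s={word >> 31} "
            f"e={(word >> 23) & 0xFF:08b} f={word & 0x7FFFFF:023b}")


def bits_to_float(word: int) -> float:
    """Interpret a 32-bit word as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]
```

Purely random 32-bit words rarely hit special numbers, so vectors for infinity, zero and NaN would still have to be generated deliberately, as the later requirements state.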

6.4 Test plans

Testing should take care of complex operations and well-known bugs caused by sequences of operations.

Test plans, equations and operations are TBD

6.4.1 Adder/Subtractor unit

Tests to be run on the add/sub unit:

The same exponent
1. Addition of operands having the same sign
2. Addition of operands having different signs, covering these cases:
  - Operand1 is -ve and op2 is +ve and the result is +ve
  - Operand1 is +ve and op2 is -ve and the result is +ve
  - Operand1 is -ve and op2 is +ve and the result is -ve
  - Operand1 is +ve and op2 is -ve and the result is -ve
3. Different sets of numbers should be applied
Different exponents
1. Do the same tests as above (all tests in 1)
2. Let op1 have the larger exponent
3. Let op2 have the larger exponent
Denormalized numbers
1. Op1 should be denormalized and op2 normalized
2. Op2 should be denormalized and op1 normalized
3. Both op1 and op2 should be denormalized
4. Calculations as in 1 and 2 should be made, but the result should be denormalized
Special numbers
1. NaN should be one of the operands
2. Both operands should be NaN
3. ±∞ should be one of the operands
4. Both operands should be +∞ or -∞
5. Operands should be NaN and ∞
6. Op1 should be +∞ and op2 should be -∞
7. Op2 should be +∞ and op1 should be -∞
Do the same tests as above (in 1 & 2) on the subtraction operation.
Do the same tests as in 1, 2 and 3 and let the result be very large, to overflow, or very small, to underflow.
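
The operand combinations of the "Special numbers" group can be enumerated mechanically. The Python sketch below is one hypothetical way to do it: the names PATTERNS and special_pairs are invented for the example, a single quiet NaN encoding and the value 1.0 stand in for "NaN" and "an ordinary number", and real test plans would likely use several encodings of each.

```python
from itertools import product

# Single-precision bit patterns chosen for illustration only.
PATTERNS = {
    "NAN": 0x7FC00000,
    "+INF": 0x7F800000,
    "-INF": 0xFF800000,
    "ONE": 0x3F800000,  # stands in for any ordinary number
}


def special_pairs():
    """Yield (name_a, name_b, word_a, word_b) with at least one special operand."""
    for (name_a, a), (name_b, b) in product(PATTERNS.items(), repeat=2):
        if name_a == "ONE" and name_b == "ONE":
            continue  # no special operand involved
        yield name_a, name_b, a, b
```

Running every pair through both addition and subtraction covers cases 1 to 7 of the group, including the +∞ with -∞ combinations in both operand orders.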
