Jamil Khatib
Floating Point Unit project
(C) Copyright 2000 Jamil Khatib.
Contents
1 List of authors and changes
2 Project Definition
2.1 Introduction
2.2 Objectives
3 Specifications
3.1 System Specification
3.2 External Interfaces
3.3 Hardware specification
3.4 Software specification
3.5 Interface between SW and HW
4 Internal Blocks
5 Design description
5.1 Decode Unit
5.1.1 Design notes
5.1.2 Timing and flow charts
5.2 Adder/Subtracter Unit
5.2.1 Design notes
5.2.2 Timing and flow charts
5.3 Round/Normalize Unit
5.3.1 Design Notes
5.3.2 Timing and flow charts
5.4 Exceptions generation
5.4.1 Design Notes
5.4.2 Timing and flow charts
5.5 Compare Unit
5.5.1 Design notes
5.5.2 Timing and flow charts
5.6 Scripts, files and any other information
5.7 Design conventions and coding styles
5.8 Design Modeling
5.9 Integration notes
6 Testing and verifications
6.1 Simulation and Test benches
6.2 Specifications of test benches
6.3 verification techniques and algorithms
6.3.1 Verification Software
6.4 Test plans
6.4.1 Adder/Subtractor unit
7 Implementations
8 Reviews and comments
9 References
1 List of authors and changes
Name | Changes | Date | Contact address |
Jamil Khatib | Initial release | 17-7-2000 | khatib@opencores.org |
Jamil Khatib | Tests and Interfaces added | 13-8-2000 | khatib@opencores.org |
Jamil Khatib | Design spec changed to generic CPU interface | 23-10-2000 | khatib@opencores.org |
Jamil Khatib | Major changes and architecture enhancement | 3-11-2000 | khatib@opencores.org |
Jamil Khatib | General Review | 17-12-2000 | khatib@opencores.org |
Jamil Khatib | Verification section is added | 25-12-2000 | khatib@opencores.org |
2 Project Definition
2.1 Introduction
Floating-point numbers and calculations are widely used in everyday applications such as bank transactions, scientific calculations, graphics, engineering drawings and even games. These calculations are very slow if they are done without any hardware support. This floating-point unit core should speed up these calculations. Both the hardware core and the software libraries are going to be provided.
2.2 Objectives
The main objective of this project is to build an IEEE-754 compatible floating-point unit core and its software. The core should give high performance and be able to interface to any CPU core. The project should provide a set of small calculation units. The final step is to use the FPU core in a stand-alone floating-point processor.
3 Specifications
3.1 System Specification
- The core should be compliant with the IEEE-754 standard.
- The core is going to have several phases, one phase for each precision.
- The first phase should provide the functionality; optimization for speed and area should be considered afterwards.
- The software part should provide the compiler, assembler, trap handling, higher precisions and extra calculations done in compiler libraries.
- Supported instructions are TBD. The web site has some suggestions.
- A stand-alone co-processor has to be built later.
- The functional units have to be able to operate as stand-alone calculation units.
- The FPU core assumes that the CPU issues the instructions and knows when the result should come out. (not valid in the stand-alone mode)
- The FPU has no storage registers; all values and results should be stored in the CPU. (not valid in the stand-alone mode)
- The CPU should decide whether it can issue FP instructions or not (detect hazards etc.). (not valid in the stand-alone mode)
- One instruction is executed at a time (in the future, more instructions should be executed in parallel).
- This document describes only the execution core unit. The stand-alone mode will be described in the future.
3.2 External Interfaces
- The core should act as an execution unit of the CPU or as a co-processor.
- The core should report exceptions to the CPU, which stores them.
- The CPU must be able to mask and unmask the exceptions as described by the IEEE-754 standard. The FPU, on the other hand, should always generate the exceptions.
- The FPU should provide a signal when the result is ready.
- The CPU should provide a signal to load a new instruction and operands.
- See the FPU interface table below.
FPU interface (for single precision)
Direction | Size | Name |
INPUT | [31:0] | operand A |
INPUT | [31:0] | operand B |
INPUT | [TBD:0] | operation type |
INPUT | [1:0] | rounding mode: |
| | 0 - To nearest even (DEFAULT) |
| | 1 - To +∞ (round up) |
| | 2 - To -∞ (round down) |
| | 3 - To ZERO |
OUTPUT | [31:0] | result |
| | Compare Result (Conditions and special numbers) |
OUTPUT | 1 | larger than ( A > B ) |
OUTPUT | 1 | smaller than ( A < B ) |
OUTPUT | 1 | equal ( A == B ) |
OUTPUT | 1 | unordered ( A == NAN | B == NAN ) |
OUTPUT | 1 | infinity ( A == INF | B == INF ) |
OUTPUT | 1 | Zero (A == zero) |
| | Exceptions |
OUTPUT | 1 | overflow |
OUTPUT | 1 | underflow |
OUTPUT | 1 | divide by zero |
OUTPUT | 1 | SNAN: Invalid operation |
OUTPUT | 1 | INE: inexact |
| | Control |
INPUT | 1 | Load |
OUTPUT | 1 | ready |
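The compare and special-number outputs of the table can be modelled in software. The following is only a sketch: the function names and dictionary keys are illustrative assumptions, not signal names from the design, and the operands are raw 32-bit single-precision patterns.

```python
def fields(word):
    """Split a 32-bit pattern into (sign, exponent, fraction)."""
    return (word >> 31) & 1, (word >> 23) & 0xFF, word & 0x7FFFFF


def is_nan(word):
    _, exponent, fraction = fields(word)
    return exponent == 0xFF and fraction != 0


def is_inf(word):
    _, exponent, fraction = fields(word)
    return exponent == 0xFF and fraction == 0


def is_zero(word):
    # +0 and -0 differ only in the sign bit
    return (word & 0x7FFFFFFF) == 0


def ordered_key(word):
    """Map a non-NaN pattern to an integer that sorts like its real value."""
    magnitude = word & 0x7FFFFFFF
    return -magnitude if (word >> 31) & 1 else magnitude


def compare_flags(a, b):
    """Return the compare outputs for operands a and b (hypothetical names)."""
    unordered = is_nan(a) or is_nan(b)
    key_a, key_b = ordered_key(a), ordered_key(b)
    return {
        "larger": not unordered and key_a > key_b,
        "smaller": not unordered and key_a < key_b,
        "equal": not unordered and key_a == key_b,
        "unordered": unordered,
        "infinity": is_inf(a) or is_inf(b),
        "zero": is_zero(a),
    }
```

Treating the pattern as sign and magnitude is enough for ordering because IEEE-754 encodings with a larger exponent or fraction always represent a larger magnitude; +0 and -0 map to the same key and therefore compare equal.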
3.3 Hardware specification
- Refer to the system specification section.
3.4 Software specification
- The user should be able to read and mask the flags.
- The flags should remain set until the software clears them.
- The CPU should be interrupted whenever an unmasked exception is generated.
- The software should manage the flags to prevent frequent generation of interrupts. It should also trap and handle all exceptions and modify the results according to the source of the exception.
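The flag behaviour described in this list (sticky flags, masking, interrupt on an unmasked exception) can be sketched as a small software model. The bit positions, the class name `FlagRegister` and its method names are assumptions made for illustration; the document does not define a register layout.

```python
# Hypothetical bit assignment for the five exceptions
INEXACT, UNDERFLOW, OVERFLOW, DIV_BY_ZERO, INVALID = 1, 2, 4, 8, 16
ALL_FLAGS = 0x1F


class FlagRegister:
    """Sticky exception flags with a software-controlled mask."""

    def __init__(self):
        self.flags = 0  # latched exceptions, cleared only by software
        self.mask = 0   # a set bit means the exception is masked

    def record(self, raised):
        """Latch the raised exceptions; return True if an interrupt is due."""
        raised &= ALL_FLAGS
        self.flags |= raised
        return (raised & ~self.mask) != 0

    def clear(self, which):
        """Software clears the selected flags."""
        self.flags &= ~which & ALL_FLAGS
```

In this model the FPU always reports every exception through `record`, and only the mask decides whether the CPU is interrupted, which matches the split of responsibilities given in the external interface section.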
3.5 Interface between SW and HW
The interface between SW and HW depends on the CPU and software platform used. Instruction definitions should be easily customizable. All FPU exceptions can drive an external interrupt controller that provides interrupts to the CPU. The system should not be platform dependent.
4 Internal Blocks
The hardware can be divided into the following main blocks:
- Decode unit
  - It decodes the numbers and recovers the hidden bit.
  - It should detect special numbers and generate the appropriate special-number signals.
  - It detects NaNs, infinity and zero.
- Execution units
  - All blocks should take the operands in decoded format.
  - They should be able to take de-normalized inputs.
  - They can generate de-normalized results.
  - They should be able to take special values and know how to handle them.
  - They should expand the operands as needed so that the calculation is done as if in infinite precision.
  - They must produce a result of a defined size for the round unit.
  - The execution units are Add/Subtract, Multiply, Divide, Compare, Square root and some transcendental units (TBD).
- Round/Normalize unit
  - This unit is used by all execution units.
  - It performs the pre- and post-normalization.
  - Rounding must be done from the internal precision to the standard precision according to the rounding mode.
  - The supported rounding modes are to nearest even, to +∞ (round up), to -∞ (round down) and to zero.
  - This unit should consider both normalized and de-normalized results.
- Compare unit
  - This block runs in parallel with the execution units.
  - It is responsible for generating all condition signals.
  - It detects the larger than, smaller than and equal conditions, NaN, +∞ (infinity) and zero.
- Encode unit
  - It converts the rounded results into IEEE format and removes the hidden bit.
  - It should set the special-number encoding based on the operands and the operation.
- Exceptions unit
  - It should generate all 5 exceptions.
  - It should get status from the other blocks when it needs it to perform its operation.
- Integer to floating point and floating point to integer unit
  - TBD
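The four rounding modes of the Round/Normalize unit can be illustrated with a short sketch. It assumes that the internal result carries three extra bits below the least significant bit (guard, round and sticky, as mentioned in the adder notes) and uses the mode encoding of the interface table; the function name `round_grs` is hypothetical.

```python
NEAREST_EVEN, ROUND_UP, ROUND_DOWN, TO_ZERO = 0, 1, 2, 3


def round_grs(sign, sig_grs, mode):
    """Round a magnitude that carries 3 extra low bits (guard, round, sticky).

    Returns (rounded_significand, inexact). If the increment carries out of
    the significand, the caller is expected to renormalize.
    """
    significand = sig_grs >> 3
    grs = sig_grs & 0b111
    inexact = grs != 0
    if mode == NEAREST_EVEN:
        # Above one half, or exactly one half with an odd significand
        increment = grs > 0b100 or (grs == 0b100 and significand & 1 == 1)
    elif mode == ROUND_UP:
        # Toward +infinity: only positive values grow in magnitude
        increment = inexact and sign == 0
    elif mode == ROUND_DOWN:
        # Toward -infinity: only negative values grow in magnitude
        increment = inexact and sign == 1
    elif mode == TO_ZERO:
        increment = False
    else:
        raise ValueError("unknown rounding mode")
    return significand + (1 if increment else 0), inexact
```

The `inexact` output corresponds to the INE exception of the interface table: it is set whenever any of the three extra bits is non-zero, regardless of the rounding direction.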
5 Design description
5.1 Decode Unit
5.1.1 Design notes
- It recovers the hidden bit by checking the exponent bits in parallel.
- It extracts the number type (SNAN, QNAN, ZERO, INF or normal) by checking all bits of the exponent and fraction.
- It expands the fraction part and the exponent by one bit each.
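The decode steps above can be sketched for single precision as follows. This is a software illustration, not the VHDL design: the function name `decode_single` is hypothetical, de-normalized numbers are reported as "NORMAL" with a hidden bit of 0, and a NaN is taken to be quiet when the most significant fraction bit is set, which is a common convention assumed here.

```python
def decode_single(word):
    """Decode a 32-bit pattern into (sign, exponent, significand, kind).

    The significand is 24 bits wide: the 23 fraction bits plus the
    recovered hidden bit. The hardware would also widen the exponent by
    one bit; a Python integer needs no explicit widening.
    """
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF

    if exponent == 0xFF:
        if fraction == 0:
            kind = "INF"
        elif fraction & 0x400000:
            kind = "QNAN"
        else:
            kind = "SNAN"
    elif exponent == 0 and fraction == 0:
        kind = "ZERO"
    else:
        kind = "NORMAL"  # also covers de-normalized numbers

    # The hidden bit is the OR of all exponent bits: 0 only when exponent == 0
    hidden = 1 if exponent != 0 else 0
    significand = (hidden << 23) | fraction
    return sign, exponent, significand, kind
```

For example, `decode_single(0x3F800000)` (the value 1.0) gives a significand of `0x800000`, that is, the hidden bit alone.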
5.1.2 Timing and flow charts
TBD
5.2 Adder/Subtracter Unit
Three extra bits are kept at the right-hand (least significant) end of the number: guard, round and sticky.
One complementer can be used together with an operand swap.
Alternatively, two exponent subtractions (e1-e2 and e2-e1) can be done, so that no complementer is needed and a negative value is avoided.
5.2.1 Design notes
- Invert the sign of the second operand in case of subtraction.
- Calculate the difference between the two operands' exponents.
- The difference is calculated using normal sign-magnitude subtraction. This can be optimized by using a single complementer, swapping the operands when the difference is negative, and in this way identifying the smaller operand.
- Check whether one of the operands is a special number (NaN or infinity).
  - If one of the operands is NaN, the result is NaN.
  - If one of the operands is ∞, check whether the operation is ∞-∞; in that case the result is NaN and the Invalid exception is generated.
- The operands can be non-normalized, and so can the results.
- The unit needs two cycles to produce the result.
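The special-number checks and the exponent alignment described in these notes can be sketched as below. The function names are hypothetical, the operand kinds are assumed to be the strings "SNAN", "QNAN", "INF", "ZERO" and "NORMAL", and the exponents passed to `align` are assumed to be effective exponents (a de-normalized operand is passed with exponent 1, not 0). Raising Invalid for a signaling NaN follows IEEE-754 and the interface table rather than these notes.

```python
def add_special(kind_a, sign_a, kind_b, sign_b, subtract):
    """Return (kind, sign, invalid) for special operands, else None."""
    if subtract:
        sign_b ^= 1  # subtraction inverts the sign of the second operand
    nan_kinds = ("QNAN", "SNAN")
    if kind_a in nan_kinds or kind_b in nan_kinds:
        invalid = "SNAN" in (kind_a, kind_b)
        return "QNAN", 0, invalid
    if kind_a == "INF" and kind_b == "INF":
        if sign_a != sign_b:
            return "QNAN", 0, True  # infinity - infinity
        return "INF", sign_a, False
    if kind_a == "INF":
        return "INF", sign_a, False
    if kind_b == "INF":
        return "INF", sign_b, False
    return None  # no special handling needed


def align(exp_a, sig_a, exp_b, sig_b):
    """Align two significands; returns (exponent, larger, smaller).

    Both returned significands carry 3 extra low bits (guard, round,
    sticky); "larger" refers to the operand with the larger exponent.
    """
    if exp_b > exp_a:
        # Swap so that the difference is never negative
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a
    diff = exp_a - exp_b
    larger = sig_a << 3
    smaller = sig_b << 3
    shifted = smaller >> diff
    if smaller & ((1 << diff) - 1):
        shifted |= 1  # sticky: some non-zero bit was shifted out
    return exp_a, larger, shifted
```

The swap in `align` plays the role of the single complementer mentioned above: after it, only a non-negative exponent difference has to be handled.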
5.2.2 Timing and flow charts
5.3 Round/Normalize Unit
5.3.1 Design Notes
5.3.2 Timing and flow charts
5.4 Exceptions generation
5.4.1 Design Notes
5.4.2 Timing and flow charts
5.5 Compare Unit
5.5.1 Design notes
TBD
5.5.2 Timing and flow charts
TBD
5.6 Scripts, files and any other information
The system has been implemented in VHDL.
5.7 Design conventions and coding styles
5.8 Design Modeling
5.9 Integration notes
6 Testing and verifications
Requirement | Test method | Validation method |
Interface timing | The external interface must drive the Load signal with each new operation. It should check the Done signal, which must be driven when a valid result is stable. | The test bench should check the sequence of signals according to the number of clocks. |
Functionality | | |
6.1 Simulation and Test benches
A test bench should be provided in order to check the validity of both the timing and the functionality of the FPU core.
6.2 Specifications of test benches
The system test bench should be composed of two parts: the data injection and checking part (called the client later on) and the timing checking part (called the server later on). This approach is selected so that the interface of the FPU can be changed and checked without the need to re-check the functionality of the system.
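The timing checking part can be illustrated with a small software model. It is a sketch under stated assumptions: the trace is a list of (load, ready) samples, one per clock, and every operation is assumed to have the same fixed latency in clocks. The function name `check_handshake` is hypothetical.

```python
def check_handshake(trace, latency):
    """Check a per-clock trace of (load, ready) samples.

    Returns a list of error messages; an empty list means the sequence of
    signals matches the expected number of clocks.
    """
    errors = []
    pending = None  # clock at which ready is expected, if any
    for clock, (load, ready) in enumerate(trace):
        if ready and pending != clock:
            errors.append(f"clock {clock}: unexpected ready")
        if pending == clock:
            if not ready:
                errors.append(f"clock {clock}: ready missing")
            pending = None
        if load:
            if pending is not None:
                errors.append(f"clock {clock}: load while busy")
            pending = clock + latency
    if pending is not None:
        errors.append("trace ended while an operation was pending")
    return errors
```

Because this check looks only at the control signals, it can be rerun after an interface change without involving the data checking part.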
6.3 verification techniques and algorithms
The functional verifications will be made in two steps
- Through SW verification, by applying a set of test vectors and checking the results against IEEE-754 compliant software. The SoftFloat library can be used.
- Through prototyping on an FPGA and testing with real operations and calculations. TBD
6.3.1 Verification Software
- The software should generate test vectors for the VHDL test bench in a format that the test bench can read.
- It should generate numbers and operations randomly for the test bench.
- It should be able to read mathematical formulae and generate a sequence of numbers and operations for the test bench.
- There should be a method to compare the results of the real calculations done on the CPU with those from the FPU core.
- All results from the test bench and the software should be in a readable format, with the ability to show all bit representations.
- The software should be able to generate operations that give special numbers (infinity, zeros and NaN) and all kinds of exceptions.
- The software should know when special numbers are going to be generated and should not calculate them using the normal software flow; otherwise it will fail, because the results will be handled by host traps.
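A minimal sketch of such a generator for single-precision addition is given below. The hexadecimal line format ("A B RESULT") and all function names are assumptions; the expected result is computed on the host in double precision and then rounded to single precision in round-to-nearest mode only. Operands that are infinity or NaN are skipped here so that they can be covered by directed tests instead of the normal host arithmetic.

```python
import random
import struct


def bits_to_float(word):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def float_to_bits(value):
    """Round a Python float to single precision and return its bit pattern."""
    try:
        return struct.unpack(">I", struct.pack(">f", value))[0]
    except OverflowError:
        # The rounded result does not fit: report a signed infinity
        return 0xFF800000 if value < 0 else 0x7F800000


def is_special(word):
    """True for infinity and NaN (exponent field all ones)."""
    return ((word >> 23) & 0xFF) == 0xFF


def expected_add(a, b):
    """Expected bits of a + b, or None when an operand is special."""
    if is_special(a) or is_special(b):
        return None
    return float_to_bits(bits_to_float(a) + bits_to_float(b))


def make_vectors(count, seed=0):
    """Return `count` text lines of the form 'AAAAAAAA BBBBBBBB RRRRRRRR'."""
    rng = random.Random(seed)
    lines = []
    while len(lines) < count:
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        result = expected_add(a, b)
        if result is None:
            continue
        lines.append(f"{a:08X} {b:08X} {result:08X}")
    return lines
```

A fixed seed makes the vector file reproducible, so a failing line can be regenerated and inspected bit by bit.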
6.4 Test plans
Testing should cover complex operations and well-known bugs caused by sequences of operations.
Test plans, equations and operations are TBD
6.4.1 Adder/Subtractor unit
Tests to be run on the add/sub unit:
1. The same exponent
   - Addition of operands having the same sign
   - Addition of operands having different signs; follow these tests:
     - Operand1 is -ve and op2 is +ve and the result is +ve
     - Operand1 is +ve and op2 is -ve and the result is +ve
     - Operand1 is -ve and op2 is +ve and the result is -ve
     - Operand1 is +ve and op2 is -ve and the result is -ve
   - Different sets of numbers should be applied
2. Different exponents
   - Do the same tests as above (all tests in 1)
   - Let op1 have the larger exponent
   - Let op2 have the larger exponent
3. Denormalized numbers
   - Op1 should be denormalized and op2 normalized
   - Op2 should be denormalized and op1 normalized
   - Both op1 and op2 should be denormalized
   - Calculations as in 1 and 2 should be made, but the result should be denormalized
4. Special numbers
   - NaN should be one of the operands
   - Both operands should be NaN
   - ±∞ should be one of the operands
   - Both operands should be +∞ or -∞
   - The operands should be NaN and ∞
   - Op1 should be +∞ and op2 should be -∞
   - Op2 should be +∞ and op1 should be -∞
5. Do the same tests as above (in 1 and 2) on the subtraction operation.
6. Do the same tests as in 1, 2 and 3 and let the result be large enough to overflow or small enough to underflow.
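Part of this test list can be turned into directed operand pairs by assembling the bit fields directly. The sketch below covers only the operand categories (same and different exponents, denormalized and special numbers); the chosen exponent and fraction values, the labels and the function names are arbitrary illustrations.

```python
def pack(sign, exponent, fraction):
    """Assemble a single-precision word from its three fields."""
    return (sign << 31) | (exponent << 23) | fraction


POS_INF, NEG_INF = pack(0, 0xFF, 0), pack(1, 0xFF, 0)
QNAN = pack(0, 0xFF, 0x400000)


def directed_cases():
    """Return a list of (label, op1, op2) tuples for the add/sub unit."""
    cases = []
    for s1 in (0, 1):
        for s2 in (0, 1):
            signs = f"signs {s1}/{s2}"
            # Same exponent, each operand in turn having the larger magnitude
            cases.append((f"same exponent, |op1|<|op2|, {signs}",
                          pack(s1, 127, 0x100000), pack(s2, 127, 0x200000)))
            cases.append((f"same exponent, |op1|>|op2|, {signs}",
                          pack(s1, 127, 0x200000), pack(s2, 127, 0x100000)))
            # Different exponents
            cases.append((f"op1 larger exponent, {signs}",
                          pack(s1, 130, 0x100000), pack(s2, 127, 0x100000)))
            cases.append((f"op2 larger exponent, {signs}",
                          pack(s1, 127, 0x100000), pack(s2, 130, 0x100000)))
    cases += [
        ("op1 denormalized", pack(0, 0, 1), pack(0, 1, 0)),
        ("op2 denormalized", pack(0, 1, 0), pack(0, 0, 1)),
        ("both denormalized", pack(0, 0, 1), pack(0, 0, 2)),
        ("NaN and number", QNAN, pack(0, 127, 0)),
        ("both NaN", QNAN, QNAN),
        ("infinity and number", POS_INF, pack(0, 127, 0)),
        ("both +infinity", POS_INF, POS_INF),
        ("NaN and infinity", QNAN, POS_INF),
        ("+infinity and -infinity", POS_INF, NEG_INF),
        ("-infinity and +infinity", NEG_INF, POS_INF),
    ]
    return cases
```

Each pair would be applied once with the add operation and once with the subtract operation; the overflow and underflow cases need exponents near the ends of the range and are not generated here.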
7 Implementations
8 Reviews and comments
9 References
- Introduction to Floating Point arithmetic by Jamil Khatib
- http://www.oocities.org/SiliconValley/Pines/6639/docs/fp_summary.html
- OpenCores FPU core