The floating-point unit (FPU) in the synergistic processor element
(SPE) of a CELL processor is a fully pipelined 4-way
single-instruction multiple-data (SIMD) unit designed to accelerate
media and data streaming with 128-bit operands. It supports 32-bit
single-precision floating-point and 16-bit integer operands with two
different latencies, six and seven cycles, with 11 FO4 delays per
stage. The FPU is optimized for performance-critical single-precision
multiply-add operations. Because exact rounding, exceptions, and
denormal-number handling are not important to multimedia applications,
full IEEE compliance for single-precision floating-point numbers is
sacrificed for performance and a simpler design. It employs fine-grained
clock gating to save power. The design has 768K transistors in 1.3
mm², fabricated in 90-nm SOI technology. Correct operation has
been observed up to 5.6 GHz at 1.4 V and 56 °C, delivering
44.8 GFlops. Architecture, logic, circuits, and integration are
codesigned to meet the performance, power, and area goals.
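The 44.8 GFlops figure can be checked from the 4-way SIMD width and the 5.6 GHz clock, assuming each lane of the fully pipelined unit retires one multiply-add per cycle and that a multiply-add counts as two floating-point operations. The sketch below also includes a hypothetical `flush_to_zero` helper to illustrate one common way of dropping denormal handling in single precision: denormal values are replaced by a signed zero. Both functions are an assumed software model for illustration, not the SPE's actual datapath logic.

```python
import struct

# Assumptions: 4 single-precision lanes per 128-bit operand, one
# multiply-add issued per lane per cycle, and 2 flops per multiply-add.
SIMD_LANES = 4
FLOPS_PER_FMA = 2


def peak_gflops(clock_ghz: float, lanes: int = SIMD_LANES,
                flops_per_op: int = FLOPS_PER_FMA) -> float:
    """Peak throughput of a fully pipelined SIMD multiply-add unit."""
    return lanes * flops_per_op * clock_ghz


def flush_to_zero(x: float) -> float:
    """Hypothetical model: round x to float32, then flush denormals to signed zero."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent field
    if exponent == 0:                   # zero or denormal
        bits &= 0x80000000              # keep only the sign bit
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(peak_gflops(5.6))        # 4 lanes * 2 flops * 5.6 GHz
    print(flush_to_zero(1e-40))    # a float32 denormal becomes 0.0
    print(flush_to_zero(-1e-40))   # sign is preserved: -0.0
    print(flush_to_zero(1.5))      # normal values pass through unchanged
```

Flushing to a signed zero rather than an unsigned one keeps the sign information that later operations, such as a division, may still observe.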

    Source: geocities.com/de/christian_jacobi/publications