Mr.
Raza Ur Raheem
M. Omer Sheikh: 2003-02-0129
Thursday
Quantitative Study of the Impact of Design and Synthesis
Options on Processor Core Performance.
By Tomás Bautista and Antonio Núnez
Summary:
The purpose of this research was to study, within the digital domain, the impact on real estate (silicon area), clock rate, power consumption and other physical implementation characteristics of the following variables and parameters:
1- Microarchitectural and design decisions for a given instruction set architecture
2- ISA instructions including MAC, FP, register types and special operators (e.g. vector processing elements as accelerators)
3- Register file structure
4- Control of physical design by tool management, with emphasis on megacells and datapath granularity decisions at synthesis time and at placement and routing time
5- Technological scaling.
The open SPARC architecture was chosen for this aim. The design and implementation of the processor were carried out in two stages:
1- Development of VHDL versions of the SPARC v.8 ISA with integer unit (IU), floating point unit (FPU) and visual instruction set (
2- Development of a VHDL-based synthesis and compilation technique with easily configurable VHDL options, allowing flexible synthesis of experimental versions of each design and its parameterized variations.
The paper describes in detail the relevant microarchitectural features and design decisions adopted in implementing a full set of versions of the SPARC v.8 IU as processor cores for embedded systems. It also presents a quantitative study of the impact of synthesis and layout techniques and tools on the performance and design quality achieved.
The processor core used for the study was the well-known SPARC v.8. SPARC has many implementations certified by SPARC International Inc.; a SPARC-compliant processor is a fairly complex design, which makes it appropriate as a reference design for synthesis experiments.
Many relevant decisions had to be taken in implementing the microarchitecture design. These decisions relate to:
- Branches
- Bypass: needed to avoid read-after-write (RAW) hazards. The mechanism can be set in two ways:
o By taking data from the end of the execution stage of the previous instruction to the decode stage
o By taking data from the beginning of the write-back stage of the previous instruction to the execution stage
- TADDccTV and TSUBccTV: these SPARC tagged add and subtract instructions cause a trap when the operation on their two operands overflows
- Auxiliary instructions: these assist instructions that require additional execution stages for completion
- CPI figures of the microarchitecture
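The bypass item above exists to handle one specific situation: an instruction reads a register that the immediately preceding instruction writes. A minimal sketch of detecting that situation follows. It assumes a simplified in-order pipeline in which an instruction is described only by one destination register and its source registers; the `Instr` record, the sample program and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, some sources."""
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def needs_bypass(prev: Instr, curr: Instr) -> bool:
    """Return True if curr reads a register that prev writes (RAW hazard)."""
    # %g0 is hard-wired to zero on SPARC, so a write to it creates no dependence.
    if prev.dest is None or prev.dest == "%g0":
        return False
    return prev.dest in curr.srcs


def raw_hazards(program: List[Instr]) -> List[Tuple[int, str]]:
    """List (index, register) pairs where an instruction depends on its predecessor."""
    hazards = []
    for i in range(1, len(program)):
        if needs_bypass(program[i - 1], program[i]):
            hazards.append((i, program[i - 1].dest))
    return hazards


program = [
    Instr("add", "%l1", ("%l2", "%l3")),
    Instr("sub", "%l4", ("%l1", "%l5")),  # reads %l1 written by the add
    Instr("or", "%l6", ("%l2", "%l3")),   # independent of the sub
]
print(raw_hazards(program))
```

Either of the two bypass settings would have to supply the value of `%l1` to the `sub` here; the sketch does not model which pipeline stage the value is taken from.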
The defined architecture can be used to build a number of processors, and the impact of a specific feature can be evaluated easily by comparing two versions, one with the feature and one without it, as long as all other features remain the same.
Different design alternatives are introduced and a set of versions is developed with the following microarchitectural parameters:
1- Number of windows in the register file;
2- Branch Prediction:
a) predict “always taken”
b) predict “taken only if backward”
3- Checking of possible modification of condition codes by the previous instruction on branches
4- Calculation of the address for the next fetch:
a) with just one addition operation in the cycle
b) with two additions.
5- Bypass mechanism set as described earlier
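The two static branch prediction options in the list above can be compared with a small sketch. The branch trace below is invented purely for illustration (it is not data from the paper), and the function names are hypothetical. A branch is treated as "backward" when its target address is lower than its own address.

```python
def predict_always_taken(pc: int, target: int) -> bool:
    """Option a): every branch is predicted taken."""
    return True


def predict_backward_taken(pc: int, target: int) -> bool:
    """Option b): predict taken only if the branch jumps backward."""
    return target < pc


def mispredictions(trace, predictor) -> int:
    """Count branches whose actual outcome differs from the prediction."""
    return sum(1 for pc, target, taken in trace if predictor(pc, target) != taken)


# Hypothetical trace: (branch address, target address, actually taken?)
trace = [
    (0x40, 0x20, True),   # loop back-edge, taken
    (0x40, 0x20, True),   # loop back-edge, taken
    (0x40, 0x20, False),  # loop exit
    (0x60, 0x80, False),  # forward branch, not taken
    (0x64, 0x88, False),  # forward branch, not taken
    (0x70, 0x90, True),   # forward branch, taken
]

print(mispredictions(trace, predict_always_taken))    # 3
print(mispredictions(trace, predict_backward_taken))  # 2
```

Which option wins depends entirely on the mix of branches in the workload; the counts here reflect only the invented trace.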
Then the layout synthesis is performed based on a set of logical and physical parameters:
1- Logic granularity (ratio of datapath megacells versus standard cells)
2- Physical granularity (options for guiding the physical synthesis)
a) allowing groups of standard cells to be defined for placement
b) allowing physical planning in megacells
3- Plasticity in shape and connectivity of cores
4- Technology scaling from 0.5 to 0.35 micrometers, with the same metal layers and VDD = 3.3 V.
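As a point of reference for the technology scaling parameter, the first-order geometric effect of moving from 0.5 to 0.35 micrometers can be worked out directly. The sketch assumes an ideal linear shrink in which every dimension scales by the same factor; this is a textbook simplification and not a result from the paper, whose measured figures need not follow it. The function name is hypothetical.

```python
def ideal_scaling(old_feature_um: float, new_feature_um: float) -> dict:
    """Return first-order scaling factors for an ideal linear shrink (assumption)."""
    s = new_feature_um / old_feature_um
    return {
        "linear": s,    # each length scales by s
        "area": s * s,  # area scales by s squared
    }


factors = ideal_scaling(0.5, 0.35)
print(round(factors["linear"], 2), round(factors["area"], 2))  # 0.7 0.49
```

Under this assumption a layout would occupy roughly half the original area. Comparing that ideal figure with measured core areas is one way to see how far routing and megacell constraints move a real design away from a pure shrink.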
For the design and microarchitecture options established, together with the two technologies (0.5 and 0.35 micrometer), more than 100 implementations were carried out. Results for 0.35 micrometer are summarized in Tables II to IV, and Table V summarizes similar data for cores with 2 to 4 windows.
The experiments were carried out by changing one parameter at a time, and each experiment was given an alphanumeric reference label based on the criteria on page 467 of the paper.
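The one-parameter-at-a-time approach can be sketched as follows. The baseline configuration, the parameter names and their values are hypothetical placeholders loosely modeled on the parameters listed earlier, and the labels printed here are not the labeling scheme defined in the paper.

```python
def one_at_a_time(baseline: dict, options: dict) -> list:
    """Build variants that each differ from the baseline in exactly one parameter."""
    variants = []
    for param, values in options.items():
        for value in values:
            if value == baseline[param]:
                continue  # skip the baseline's own value
            variant = dict(baseline)  # copy so the baseline stays unchanged
            variant[param] = value
            variants.append((param, variant))
    return variants


# Hypothetical baseline and option space (assumptions for illustration only).
baseline = {
    "windows": 8,
    "branch_prediction": "always_taken",
    "bypass": "execute_to_decode",
    "megacells": True,
}
options = {
    "windows": [2, 4, 8],
    "branch_prediction": ["always_taken", "backward_taken"],
    "bypass": ["execute_to_decode", "writeback_to_execute"],
    "megacells": [True, False],
}

variants = one_at_a_time(baseline, options)
for param, variant in variants:
    print(f"{param}={variant[param]}")
```

Because every variant shares all other settings with the baseline, any difference in area, power or clock rate between the two can be attributed to the single parameter that changed.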
Figure 6 shows sample bare layouts that give a visual reference for the impact of microarchitectural, logic and physical synthesis options on the area, shape, routing and fragmentation of the design.
Many qualitative conclusions can be drawn from the quantitative data, since they show significant variations in area/shape, power and clock rate. The first noticeable fact is that the synthesis and design tools produce a large spread of performance values.
To bound the scope of the analysis, the paper presents the results corresponding to plain processor cores.
Different factors that influenced the results were:
- Use of megacells
- Definitions of groups
- Planning sequencing strategy
- Number of windows
- Insertion of SIMD Structures
- Coexistence with external elements
- Technology
The effect of these factors can be seen in the graphs of Figs. 7-15.
Only the most relevant effects observed in experiments obtained from modeling, design and implementation of a complete set of versions of SPARC v.8 IU as a processor core, its
The impact of microarchitecture and design features, as well as of the use of custom and semi-custom megacells, may be decisive for the performance obtained from an embedded processor design.