Although RTR applications provide many enhancements over static designs, they are still largely a research topic. RTR suits many applications, especially those that are computation-intensive and rely on parallel methods, such as DSP applications, image processing, neural networks, encryption, compression, pattern matching, motion and target tracking, and even general-purpose processors.
The hardware of the DISC processor is divided into two parts: a general ``weak'' processor core that has a small number of instructions and addressing modes, and a custom-instruction area where new instructions are loaded. The FPGA hardware space is organized in rows, and each custom instruction circuit must occupy an integer number of rows to speed up hardware relocation. The communication lines extend horizontally and reach all rows, which eases communication between each instruction and the processor core without interfering with other instructions. Although the row organization has clear advantages, it reduces resource utilization, because a circuit that does not fill its last row still reserves the whole row; see figure 2.
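The cost of the row constraint can be illustrated with a minimal sketch. The names `rows_needed`, `utilization`, and `first_fit`, the cell counts, and the first-fit placement policy are all assumptions made for illustration; they are not taken from the DISC design.

```python
import math
from typing import List, Optional


def rows_needed(logic_cells: int, cells_per_row: int) -> int:
    """Smallest whole number of rows that can hold the circuit."""
    return math.ceil(logic_cells / cells_per_row)


def utilization(logic_cells: int, cells_per_row: int) -> float:
    """Fraction of the reserved rows that the circuit actually uses."""
    reserved = rows_needed(logic_cells, cells_per_row) * cells_per_row
    return logic_cells / reserved


def first_fit(free_rows: List[bool], rows: int) -> Optional[int]:
    """Index of the first run of `rows` contiguous free rows, or None.

    First-fit is a hypothetical placement policy chosen for brevity.
    """
    run_start, run_length = 0, 0
    for index, is_free in enumerate(free_rows):
        if is_free:
            if run_length == 0:
                run_start = index
            run_length += 1
            if run_length == rows:
                return run_start
        else:
            run_length = 0
    return None
```

For example, a hypothetical 50-cell instruction on a device with 16 cells per row reserves 4 rows (64 cells), so 14 cells are left idle. Because every module is a whole number of rows, relocating it only requires choosing a new starting row.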
The DISC processor uses instruction caching to reduce the configuration overhead by preloading instructions. When a new instruction is requested, the core checks whether it is already loaded. If it is, no action is needed; otherwise the core reconfigures the hardware and loads the new instruction. The efficiency of the caching depends on the flow of the program and on the hardware module removal algorithm. For more information about the DISC project, refer to [17,18,16].
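The check-then-load behaviour described above can be sketched as follows. This is a simplified model, not the DISC implementation: the class `InstructionCache` is hypothetical, least-recently-used removal is an assumed policy (the text does not name the removal algorithm), and fragmentation of free rows is ignored.

```python
from collections import OrderedDict


class InstructionCache:
    """Toy model of a row-based custom-instruction area."""

    def __init__(self, total_rows: int) -> None:
        self.total_rows = total_rows
        # Maps instruction name -> rows occupied, least recently used first.
        self.loaded = OrderedDict()
        self.reconfigurations = 0

    def request(self, name: str, rows: int) -> bool:
        """Return True on a hit; on a miss, load the instruction and return False."""
        if rows > self.total_rows:
            raise ValueError("instruction does not fit in the device")
        if name in self.loaded:
            # Hit: no reconfiguration needed, just mark as recently used.
            self.loaded.move_to_end(name)
            return True
        # Miss: remove least recently used modules until the new one fits
        # (assumed policy), then count one reconfiguration for the load.
        while sum(self.loaded.values()) + rows > self.total_rows:
            self.loaded.popitem(last=False)
        self.loaded[name] = rows
        self.reconfigurations += 1
        return False
```

In this model a program that keeps reusing a small set of instructions causes few reconfigurations, while a program that cycles through more instructions than fit in the device reconfigures on almost every request, which is why the removal policy matters.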
The challenge in ATR is the rapid comparison of an input image against thousands of templates, each with a large number of pixels [14]. General-purpose computing devices offer flexibility and allow multiple templates to be checked, but with low performance. In contrast, dedicated hardware for specific templates provides higher performance but without the flexibility to change the templates. RTR logic can combine the advantages of both approaches by building custom circuits optimized for each template and reconfiguring them on the fly for different templates.
Tree adders are used for each pixel to perform the correlation between the input image and the template. To further improve performance, the correlation between templates can be computed at design time to find the pixels they have in common, so that the same adder tree serves the shared pixels of correlated templates. It is therefore important to partition the templates at design time and group the correlated pixels into sets that use the same circuits. This minimizes hardware use and improves performance while keeping the circuits flexible. For more details refer to [13].
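The sharing idea can be illustrated with a small software sketch. It assumes binary templates represented as sets of "on" pixel coordinates and a group of templates that has already been chosen; the functions `adder_tree` and `correlate_group` are hypothetical, and the actual partitioning method of [13] is not reproduced here.

```python
from typing import Dict, Iterable, Set, Tuple

Pixel = Tuple[int, int]


def adder_tree(values: Iterable[int]) -> int:
    """Sum values by pairwise reduction, mirroring the levels of an adder tree."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            next_level.append(level[-1])  # odd element passes to the next level
        level = next_level
    return level[0]


def correlate_group(image: Dict[Pixel, int],
                    templates: Dict[str, Set[Pixel]]) -> Dict[str, int]:
    """Correlate one image against a group of templates, summing shared pixels once."""
    # Pixels common to every template in the group (found at design time).
    shared = set.intersection(*templates.values())
    shared_sum = adder_tree(image[p] for p in sorted(shared))
    results = {}
    for name, pixels in templates.items():
        # Only the pixels unique to this template need their own adders.
        own_sum = adder_tree(image[p] for p in sorted(pixels - shared))
        results[name] = shared_sum + own_sum
    return results
```

The more pixels the templates in a group have in common, the larger the shared tree and the fewer adders each individual template needs, which is the motivation for grouping correlated templates at design time.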