Digital Signal Processing on MMX Technology
Abstract:
Algorithmic-level optimization and programming-level optimization are tightly coupled. Many programmers can optimize the implementation of a specific algorithm using MMX technology; without algorithmic-level optimization, however, the achievable speed-up is limited. Conversely, many algorithm developers can optimize a DSP algorithm in terms of the number of operations (multiplications or additions), but without implementation details the operation count cannot be translated directly into the number of clock cycles spent on the CPU. In addition, several different algorithms can often accomplish the same task. For the best performance of DSP/multimedia applications on personal computers, we should therefore consider the co-optimization of the algorithm and its MMX technology implementation.
One way to increase the performance of digital signal processing algorithms is to execute several computations in parallel. MMX technology is one such technique: it speeds up software by performing the same operation on multiple data elements in parallel using a single instruction. However, both MMX programming and the design of DSP algorithms for MMX technology are full of twists and turns, and implementing digital signal processing algorithms with MMX technology is a mix of art and science. Matching the algorithms to the capabilities of the MMX instructions is the key to extracting the best performance. This chapter covers algorithm design and algorithmic-level optimization for MMX technology. Besides showing you how to optimize your code and algorithms from a scientific point of view, we will also show you how we go about optimizing ours from an artistic perspective.
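To make the idea of one instruction operating on several data elements concrete, here is a minimal sketch in portable C that models the MMX packed multiply-add instruction (PMADDWD) inside a 16-bit dot product, the core of many DSP kernels. It is an illustration under stated assumptions, not this chapter's own code: the names pmaddwd_model and dot_q15 are hypothetical, the input length is assumed to be a multiple of 4, and the values are assumed small enough that the 32-bit sums do not overflow. Real MMX code would perform each modelled step with a single instruction on 64-bit registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of one PMADDWD step (hypothetical helper name).
 * The instruction multiplies four pairs of signed 16-bit words and adds
 * adjacent 32-bit products, yielding two 32-bit results. This model
 * ignores the single wrap-around case where all four inputs of a pair
 * are -32768. */
static void pmaddwd_model(const int16_t a[4], const int16_t b[4],
                          int32_t out[2])
{
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* Dot product of n 16-bit samples, processed four at a time.
 * Assumption: n is a multiple of 4 and the sums fit in 32 bits. */
int32_t dot_q15(const int16_t *x, const int16_t *h, size_t n)
{
    int32_t acc[2] = {0, 0};   /* models two 32-bit lanes of a register */

    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        int32_t pair[2];
        pmaddwd_model(x + i, h + i, pair);
        acc[0] += pair[0];     /* models a packed 32-bit add (PADDD) */
        acc[1] += pair[1];
    }
    /* Combine the two lanes once, after the loop. */
    return acc[0] + acc[1];
}
```

Calling dot_q15 on two 4-element arrays takes one trip through the loop, where a scalar formulation would need four multiplies and four additions. Keeping two partial sums until the end is the kind of restructuring that fits the algorithm to the instruction, because the lanes are only combined once outside the loop.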