A More Generalized Generic Instruction Stream
To solve this problem, a more generalized generic instruction stream, one that carries the solutions of all the different equations within it, must be used. In addition, each compute unit requires a table of parameters and coefficient sets covering the four possible combinations of the ap and aq conditions for each value of the filter strength parameter Bs = 1, 2, 3, and 4. These coefficients adapt the generic instruction stream to the local solution, as shown in Figure 3.

Figure 3: Coefficient Sets as a Function of Bs, ap and aq
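As a hedged illustration of what such coefficient sets can encode, assume the standard H.264 luma equations for P0 and 8-bit samples. All Bs cases can then be folded into one expression in which each candidate result is multiplied by a per-unit weight. The names w_N, w_S, w_W and c_t are hypothetical and are not the notation of Figure 3, whose layout may differ. Here a_p = |p_2 - p_0|, a_q = |q_2 - q_0|, [·] equals 1 when its condition holds and 0 otherwise, and t_{C0} comes from the standard's tC0 lookup. Note that for Bs = 4 the standard also tests |p_0 - q_0| before choosing the strong filter, so this sketch folds that test into the Bs = 4 rows.

```latex
p_0' = w_N \,\mathrm{Clip1}(p_0 + \Delta)
     + w_S \,\big((p_2 + 2p_1 + 2p_0 + 2q_0 + q_1 + 4) \gg 3\big)
     + w_W \,\big((2p_1 + p_0 + q_1 + 2) \gg 2\big)

\Delta = \mathrm{Clip3}\big(-t_C,\; t_C,\; (4(q_0 - p_0) + (p_1 - q_1) + 4) \gg 3\big),
\qquad t_C = t_{C0} + c_t

(w_N, w_S, w_W, c_t) =
\begin{cases}
(1, 0, 0, [a_p < \beta] + [a_q < \beta]) & B_s \in \{1, 2, 3\} \\
(0, 1, 0, 0) & B_s = 4,\ a_p < \beta,\ |p_0 - q_0| < (\alpha \gg 2) + 2 \\
(0, 0, 1, 0) & B_s = 4,\ \text{otherwise}
\end{cases}
```

Exactly one weight is 1 in every set, so each compute unit evaluates the same expression and the table row alone decides which equation survives.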
Chart 1 and Figure 4 show how to calculate the filtered P0 value using the generalized generic instruction stream method on a SIMD signal processor whose compute units operate with different Bs values.

Chart 1: Calculating the P0 Value

Figure 4: Filtering P0 for Bs = 1..4 flow chart
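The following C sketch emulates the method lane by lane; it is not the author's code, and Figure 4 may order the steps differently. The loop body stands in for one SIMD instruction stream: every lane computes all candidate P0 equations, then keeps one through weights fetched from a 16-entry table indexed by (Bs, ap flag, aq flag). Assumptions: 8-bit luma samples, Bs in 1..4, lanes that have already passed the standard's edge-activity test, and hypothetical names (CoeffSet, coeff_table, Lane, filter_p0). On a real DSP the comparisons and clips would map to compare and min/max instructions rather than branches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-lane coefficient set (names are illustrative). */
typedef struct {
    int w_normal; /* keeps the Bs = 1..3 clipped-delta result       */
    int w_strong; /* keeps the Bs = 4 five-tap result               */
    int w_weak4;  /* keeps the Bs = 4 three-tap result              */
    int tc_add;   /* added to tc0: luma adds 1 per true ap/aq flag  */
} CoeffSet;

/* One row per (Bs, ap flag, aq flag): 4 x 2 x 2 = 16 sets. */
static CoeffSet coeff_table[16];

/* Built once, offline; branches here do not affect the lanes. */
static void build_coeff_table(void) {
    for (int bs = 1; bs <= 4; ++bs)
        for (int ap = 0; ap <= 1; ++ap)
            for (int aq = 0; aq <= 1; ++aq) {
                CoeffSet c = {0, 0, 0, 0};
                if (bs < 4) {
                    c.w_normal = 1;
                    c.tc_add = ap + aq;
                } else {
                    c.w_strong = ap;
                    c.w_weak4 = 1 - ap;
                }
                coeff_table[(bs - 1) * 4 + ap * 2 + aq] = c;
            }
}

static int clip3(int lo, int hi, int x) { return x < lo ? lo : (x > hi ? hi : x); }

/* Floor (arithmetic) right shift, portable for small negative x. */
static int asr(int x, int n) { return x >= 0 ? x >> n : -((-x - 1) >> n) - 1; }

/* Samples across one edge for one compute unit (lane). */
typedef struct { int p2, p1, p0, q0, q1, q2, bs, tc0; } Lane;

/* Every lane runs the same statements; only the table row differs. */
static void filter_p0(const Lane *lanes, int n, int alpha, int beta, int *out) {
    for (int i = 0; i < n; ++i) {
        const Lane *l = &lanes[i];
        assert(l->bs >= 1 && l->bs <= 4);
        int ap = abs(l->p2 - l->p0) < beta;
        int aq = abs(l->q2 - l->q0) < beta;
        int p0q0_small = abs(l->p0 - l->q0) < (alpha >> 2) + 2;
        /* For Bs = 4 the strong filter also needs |p0 - q0| to be small. */
        int ap_sel = ap & (p0q0_small | (l->bs != 4));
        const CoeffSet *c = &coeff_table[(l->bs - 1) * 4 + ap_sel * 2 + aq];

        /* Evaluate all candidate equations in every lane. */
        int tc = l->tc0 + c->tc_add;
        int delta = clip3(-tc, tc, asr(4 * (l->q0 - l->p0) + (l->p1 - l->q1) + 4, 3));
        int normal = clip3(0, 255, l->p0 + delta);
        int strong = (l->p2 + 2 * l->p1 + 2 * l->p0 + 2 * l->q0 + l->q1 + 4) >> 3;
        int weak4 = (2 * l->p1 + l->p0 + l->q1 + 2) >> 2;

        /* Exactly one weight is 1, so this keeps one candidate. */
        out[i] = c->w_normal * normal + c->w_strong * strong + c->w_weak4 * weak4;
    }
}
```

Each lane pays for all three candidate equations, one table lookup and one weighted sum; that is the kind of extra work the method trades for keeping every compute unit on the same instruction stream.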
In summary, the generalized generic instruction stream method makes it possible to approach a speed-up factor of n in processing time, where n is the number of compute units. The small exception is a few extra steps that would not normally be necessary. However, that is a small price to pay for the nearly n-fold increase in speed gained by using a wide parallel SIMD signal processor to solve one of the most complex blocks of the H.264 decoder.
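A simple timing model shows why the speed-up only approaches n. This model is an assumption for illustration, not a measurement. Suppose one compute unit needs time T to filter one edge along its own path, and the generic stream adds an overhead ΔT for evaluating the extra candidates and the table-driven selection. Processing n edges one at a time then takes nT, while n lanes running the generic stream take T + ΔT:

```latex
S = \frac{n\,T}{T + \Delta T} = \frac{n}{1 + \Delta T / T}
\;\longrightarrow\; n \quad \text{as } \Delta T / T \to 0
```

For instance, with the hypothetical values n = 8 and ΔT = 0.25T, S = 8 / 1.25 = 6.4.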
About the author:
Yosi Stein serves as DSP Principal System Architect/Advanced Technologies Manager in the Digital Media Technology Center, where he works on the development of the broadband communication and image compression enhanced instruction set for the Analog Devices fixed-point DSP family.
Yosi holds a B.Sc. in Electrical Engineering from the Technion – Israel Institute of Technology. He can be reached at yosi.stein@analog.com.



