StarCore SC140 Application Development Tutorial
Freescale Semiconductor, Inc. Multisample Programming Techniques

On the first iteration of the kernel, quad data values are loaded, starting from a double even address. This does not create an alignment problem. However, at the end of the first iteration, the pointer is backed up by one to delete the oldest sample. On the next iteration, the pointer is no longer at a double even address, so the quad data load is not aligned.

One solution to the alignment problem is to reduce the number of operands moved on each data bus, which eases the alignment requirement. However, to maintain the same operand bandwidth, each loaded operand must then be used multiple times.

This is a situation in which multisample processing is useful. As the number of samples per iteration increases, more operands are reused and the number of moves per sample is reduced. With fewer moves per sample, fewer memory loads are needed, so each bus can carry fewer operands and the data can be loaded with fewer alignment restrictions.

5.1.1 Computing Memory Bandwidth and Computation Time

Determining memory bandwidth and computation time (instructions) is not obvious because kernels may compute multiple samples simultaneously. The number of instructions per sample (ins/sample) is computed as follows:

$$\frac{\text{Instructions}}{\text{Sample}} = \frac{\text{InstructionsInABasicKernel} \times \text{LoopPassesInAnIteration}}{\text{NumberOfSamplesProcessedInAnIteration}}$$

The number of instructions per sample is a direct measure of computation time. The lower this number, the fewer instructions the kernel requires and, consequently, the faster the algorithm executes. Using the common FIR filter implementation with a single MAC and two parallel moves as an example, the number of instructions per sample is (1) × (N) / 1 = N, where N is the number of taps in the filter.
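The instructions-per-sample formula above can be sketched as a small C helper. This is only an illustration of the arithmetic: the function and parameter names are hypothetical and do not come from the SC140 tools or from the tutorial's own listings.

```c
#include <assert.h>

/*
 * Instructions per sample, following the formula in the text:
 *   (instructions in a basic kernel * loop passes in an iteration)
 *   / (number of samples processed in an iteration)
 * All names here are hypothetical and chosen for illustration only.
 */
double instructions_per_sample(unsigned kernel_instructions,
                               unsigned loop_passes,
                               unsigned samples_per_iteration)
{
    /* At least one sample must be produced per iteration. */
    assert(samples_per_iteration > 0);

    return ((double)kernel_instructions * (double)loop_passes)
           / (double)samples_per_iteration;
}
```

For the single-MAC FIR example in the text, with an assumed N = 8 taps, the call `instructions_per_sample(1, 8, 1)` evaluates (1) × (8) / 1 and returns 8.0.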
The number of moves per sample (moves/sample) is computed as shown in Equation 1.

$$\frac{\text{MemoryMoves}}{\text{Sample}} = \frac{\text{MemoryMovesInABasicKernel} \times \text{LoopPassesInAnIteration}}{\text{NumberOfSamplesProcessedInAnIteration}} \qquad \text{(Eq. 1)}$$

The number of memory moves per sample is an indication of the bus bandwidth. For example, the most common FIR filter implementation uses a single MAC and two parallel moves. This gives (2) × (N) / 1 = 2N memory moves for each sample processed. In the context of this chapter, memory bandwidth is the number of moves rather than the number of bytes. The number of memory moves relates to the number of address generations required by the algorithm.

5.2 Assumptions

This chapter makes the following assumptions:
• The DSP kernels are highly optimized.
• The supporting set-up code is not fully optimized and is written to be illustrative.
• The number of samples processed and the number of coefficients in the filters are selected to keep the examples consistent. For filters of different sizes, well-known techniques such as loop unrolling, zero padding, special passes, and others can be used but are not covered in this chapter.
• C programs are of two types. The first is illustrative: it describes in C, as clearly as possible, the assembly code to be shown. The second is C code that demonstrates how the algorithm should be written if the SC140 C compiler is to be used. Generating such code is an iterative process: start with a multisample version of the algorithm, then change it; if the result is satisfactory, stop; if not, change it again, and so on.

For More Information On This Product, Go to: www.freescale.com
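To make Equation 1 and the operand-reuse idea more concrete, the following is a minimal sketch in plain C of an FIR kernel that computes two output samples per iteration. It is not the tutorial's SC140 code: the function name, the argument layout, and the explicit move counter are assumptions made for illustration, and a real compiler or hand-written kernel may schedule loads differently. Each loop pass loads one coefficient and one input sample and reuses the input sample held over from the previous pass.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical two-sample FIR sketch: computes y[n] and y[n+1] together,
 * where y[m] = sum over k of h[k] * x[m - k].
 * Returns the number of memory moves (loads) counted inside the loop.
 * Requires n + 1 >= ntaps, and x[n + 1] must be a valid element.
 */
unsigned fir_two_samples(const short *h, size_t ntaps,
                         const short *x, size_t n,
                         long *y0, long *y1)
{
    assert(ntaps > 0 && n + 1 >= ntaps);

    unsigned kernel_moves = 0;
    long acc0 = 0;              /* accumulates y[n]     */
    long acc1 = 0;              /* accumulates y[n + 1] */
    short x_newer = x[n + 1];   /* priming load, done once outside the loop */

    for (size_t k = 0; k < ntaps; k++) {
        short coeff = h[k];         /* move 1: coefficient           */
        kernel_moves++;
        short x_older = x[n - k];   /* move 2: one new input sample  */
        kernel_moves++;

        acc0 += (long)coeff * x_older;  /* h[k] * x[n - k]     */
        acc1 += (long)coeff * x_newer;  /* h[k] * x[n + 1 - k] */

        x_newer = x_older;  /* reuse this load in the next pass */
    }

    *y0 = acc0;
    *y1 = acc1;
    return kernel_moves;
}
```

Under these assumptions the loop performs 2 moves per pass over N passes and produces 2 samples, so Equation 1 gives (2) × (N) / 2 = N moves per sample, compared with 2N for the single-sample kernel described in the text. The single priming load outside the loop is not counted as part of the basic kernel.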