StarCore SC140 Application Development Tutorial

Freescale Semiconductor, Inc.
Multisample Programming Techniques
On the first iteration of the kernel, quad data values are loaded, starting from a double even address. This
does not create an alignment problem. However, at the end of the first iteration, the pointer is backed up
by one sample to delete the oldest sample. On the next iteration, the pointer is no longer at a double even
address, so the quad data load is not aligned.
A solution to the alignment problem is to reduce the number of operands moved on each data bus. This
eases the alignment issue. However, to maintain the same operand bandwidth, each loaded operand must
be used multiple times.
This is a situation in which multisample processing is useful. As the number of samples per iteration
increases, more operands are reused and the number of moves per sample is reduced. With fewer moves
per sample, fewer memory loads are needed, so each bus can carry fewer operands and the data can be
loaded with fewer restrictions on alignment.
5.1.1 Computing Memory Bandwidth and Computation Time
Determining memory bandwidth and computation time (instructions) is not obvious because kernels may
compute multiple samples simultaneously. The number of instructions per sample (ins/sample) is
computed, as shown below:
$$
\frac{\text{Instructions}}{\text{Sample}} = \frac{\text{InstructionsInABasicKernel} \times \text{LoopPassesInAnIteration}}{\text{NumberOfSamplesProcessedInAnIteration}}
$$
The number of instructions per sample is a direct measure of computation time. The lower this number,
the fewer instructions the kernel requires and, consequently, the faster the algorithm executes. Using
the common FIR filter implementation with a single MAC and two parallel moves as an example, the
number of instructions per sample is (1) × (N) / 1 = N, where N is the number of taps in the filter. The
number of moves per sample (moves/sample) is computed, as shown in Equation 1.
$$
\frac{\text{MemoryMoves}}{\text{Sample}} = \frac{\text{MemoryMovesInABasicKernel} \times \text{LoopPassesInAnIteration}}{\text{NumberOfSamplesProcessedInAnIteration}} \tag{Eq. 1}
$$
The number of memory moves per sample is an indication of the bus bandwidth. For example, the most
common FIR filter implementation uses a single MAC and two parallel moves. This gives
(2) × (N) / 1 = 2N memory moves for each sample processed. In the context of this chapter, memory
bandwidth is the number of moves rather than the number of bytes. The number of memory moves relates
to the number of address generations required by the algorithm.
5.2 Assumptions
This chapter makes the following assumptions:
• The DSP kernels are highly optimized.
• The supporting set-up code is not fully optimized and is written to be illustrative.
• The number of samples processed and the number of coefficients in the filters are selected to keep the
examples consistent. For filters of other sizes, well-known techniques such as loop unrolling, zero
padding, and special passes can be used, but they are not covered in this chapter.
• C programs are of two types. The first is illustrative: it describes in C, as clearly as possible, the
assembly code to be shown. The second demonstrates how the algorithm should be written if the
SC140 C compiler is to be used. Generating such code is an iterative process: start with a multisample
version of the algorithm and change it; if the result is satisfactory, stop; if not, change it again, and
so on.
For More Information On This Product,
Go to: www.freescale.com