manner in which rounded values are accumulated can change the final value of sum.
The compiler performs this transformation only if you specifically give permission
for it to do so.
Speedups
If the compiler does not parallelize a portion of a program where a significant
amount of time is spent, then no speedup occurs. This is basically a consequence of
Amdahl's Law. For example, if a loop that accounts for five percent of the execution
time of a program is parallelized, then the overall run time can shrink by at most
five percent, no matter how many processors are used. Even that improvement may
not materialize, depending on the size of the workload and the parallel execution
overheads.
As a general rule, the larger the fraction of program execution that is parallelized,
the greater the likelihood of a speedup.
Each parallel loop incurs a small overhead during start-up and shutdown. The start-up
overhead includes the cost of work distribution, and the shutdown overhead
includes the cost of the barrier synchronization. If the total amount of work
performed by the loop is not large enough, no speedup occurs; in fact, the loop
might even run more slowly. So if a large share of program execution time is
accounted for by a large number of short parallel loops, then the whole program
may slow down instead of speeding up.
The compiler performs several loop transformations, such as loop interchange and
loop fusion, that try to increase the granularity of the loops. Even so, if the amount
of parallelism in a program is small or is fragmented among small parallel regions,
the speedup is generally smaller.
Scaling up the problem size often increases the fraction of parallelism in a program.
For example, consider a problem that consists of two parts: a quadratic part that is
sequential, and a cubic part that is parallelizable. For this problem, the parallel part
of the workload grows faster than the sequential part, so beyond some problem size
the program will speed up nicely, unless it runs into resource limitations.
To get the most benefit from parallel C, it is worth experimenting with tuning,
directives, problem sizes, and program restructuring.
Chapter 4
Parallelizing Sun ANSI/ISO C Code